Relational Database
Contents
Introduction
The relational database is a simple concept with powerful results. Indeed, all data can be easily stored in a relational database.
Core Concepts
Database, Tables, Columns, Rows
The relational database is defined as simply a universe of tables. Each table has columns and rows. Each column in a table has a name and a type, describing what data will appear in each row. Each row has data for each column.
Operations
With this simple system, four basic operations are identified:
- SELECT: Gather data from one or more tables. The SELECT identifies which tables are used, how they are combined, which rows are filtered out or ignored, and how the data is manipulated. SELECT yields a new table with columns and rows derived from the underlying tables.
- INSERT: Add rows to a table.
- UPDATE: Modify rows in a table. (Really, a DELETE and INSERT.)
- DELETE: Remove rows from a table.
Keys
Keys are one or more columns that can be used to identify unique rows in a table. For instance, a table might have a column "id" which is supposed to contain a unique number for each row. This column is a key since it and it alone can be used to find individual rows.
Multiple column keys work in a similar way. For instance, there might be a key based on "city" and "state". In this way, "Bellevue" "Washington" is identified as unique to "Seattle" "Washington" even though they have the same state.
Primary Keys
Primary keys are the preferred and natural key to use for identifying rows. The ideal characteristics of the primary key are:
- That it is a single column. Multiple-column keys are harder to specify than single-column ones.
- That it contains a perfectly meaningless value, other than as a key. For instance, using a name which may change would make a poor primary key.
- That it is of the simplest type possible for the fastest lookup speed and minimal storage requirements.
Foreign Keys
Foreign Keys are not keys. They are columns that refer to keys. For instance, we may have a table "cities" that lists all the cities. It may have two columns, "name" and "state" that are a key on that table. If another table had columns "city_name" and "city_state" that referred to a row in the "cities" table, then those columns would be the foreign key to that table's columns.
Foreign keys can refer back to the same table in a recursive relationship.
NULL
NULL is a concept that all implementations have in common, although it strictly isn't part of relational algebra. (It would simply be considered another value.)
NULL is intended to be the "I don't know" value. Doing any operation on it should yield NULL since "one" plus "I don't know" is "I don't know".
Constraints
Relational databases, at least the implementations, allow you to specify constraints on what type of data is allowed in each row. These may be unique constraints (allowing you to enforce that certain columns are indeed keys) to NOT NULL constraints ensuring that NULL is not allowed in a column. It may also be any kind of constraint imaginable, such as what specific values are allowed.
Indexes
Indexes are tables that have organized their data to make finding particular rows in the source table trivial.
Ideas
Most of my ideas come from the success of Python.
Untyped Columns?
I believe that typing on columns, that is, specifying whether a column is a number or a string or raw data, is an implementation detail. I don't believe it is necessary anymore, although modifying a relational database implementation to drop typing on columns would be tremendously difficult to implement.
Untyped Rows?
What if tables didn't specify the columns on the table? Already, we make some columns optional by allowing "null" values. I believe that the columns themselves are really a constraint on the data.
No constraints?
What if we dropped all constraints altogether? The applications that use the database are expected to behave appropriately on the honor system. Data that doesn't compute is simply ignored or noted.