Data Modelling relationships between tables
The thing that got me writing this article was a question posted on Stack Overflow about data design, asking whether it’s better to use one big table or to arrange the data using multiple related tables. Having spent many years working with relational databases this seems obvious, it’s easy to forget this is not common knowledge and assume everyone knows this stuff.
The examples below are from a database that contains information about portraits belonging to the University of Oxford. These portraits are referred to as Works in the database.
The database also holds details of the people connected to the portraits. This includes the subject of the portrait (the sitter), the artists who produced the portrait, and others such as donors who have bequeathed portraits to the university.
A portrait may be connected to many people and a person may be connected to many portraits, in other words a many-to-many relationship exists between portraits and people.
There are various ways in which this can be modelled:
- Store the portraits and people in a single table. When several people are linked to a portrait can either have multiple sets of columns to hold the extra people or have multiple rows where the portrait data is repeated
- Store the portraits and people in two separate tables with columns in the portrait tables that hold the links to the various people
- Store the portraits and people in separate tables with a third table to hold the relationships
The problem with the first two solutions above is that data is either repeated or is left empty in case it is required as we can’t know up front how many relationships there will be for each portrait and person. They are also hard to change when new requirements are identified, e.g. a new type of person needs to be added.
The third solution doesn’t require repeating (redundant) data or booking spaces for relationships that may or may not be required, using data modelling terms we can say it has been normalised.
The extra table by convention is named from concatenating the names of the two tables in the many-to-many relationship, hence the new table in our example is called Work_People as it links the Work and People tables.
The new table is constructed by taking the Primary Key of each of the parent tables. These become the primary key of the new table. In our example both tables have numeric primary keys that are incremented for each new row. Using these abstract keys allows the data in the tables to be changed without having to update the relationships, e.g. the names in the People table can be changed without having to update the Work_People or Work tables.
Note the relationships in the diagram between the three tables. The connectors that link the tables have a single terminator at the One end of the relationship and have a crows foot at the Many end of the relationship.
There is a One at the One end of the relationship
There is a Zero at the Many end of the relationship.
This is the cardinality of the relationship which means the connector between the Work and Work_People tables can be read as Exactly One row in the Work table can be related to Zero or More Work_People rows.
In other words, new rows can be added to the Work and People tables without having to add relationships to the Work_People table. This allows new works and new people to be added independently of each other, then the relationships are added as and when required. Note that this isn’t always the case, cardinality will vary and in some cases may specify that the relationships must be added at the same time as the rows in the main tables.