An important first step for developing a data model is taking the time to learn about the data and how the data relates.
Remember, not all data is useful. While you may wish to include data that does not directly relate to the problem, doing so can quickly become problematic, as your data model can grow into an unmanageable mess.
Keep in mind, a data model should:
- Work for a set of assumptions
- Function within the limitations of the problem scope
Let's start with an example of a soccer (football for all my non-US friends) club data model. Here is an example of a class you could build for holding team data:
The attributes are used to capture data about each team.
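As a rough sketch, a team class like this could be written in Python. The attribute names (`name`, `age_group`, `captain`) and the sample captain name are my own assumptions for illustration; your class may carry different attributes.

```python
from dataclasses import dataclass


@dataclass
class Team:
    # Hypothetical attributes; adjust them to whatever your club tracks.
    name: str
    age_group: str  # stored as free text for now, e.g. "U-12"
    captain: str    # captain's name stored as plain text


# "Alex" is made-up sample data.
rage = Team(name="Rage", age_group="U-12", captain="Alex")
print(rage.name, rage.age_group)
```

Each attribute holds one fact about one team, which is fine until the same fact gets typed in more than one way.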
But let’s look at a new problem we haven’t really discussed yet:
Look at the AgeGroup attribute above. If I asked you which teams are in the U-12 age group, as a human you could look at the table above and tell me that Rage and Hurricanes are. However, a query for U-12 would return only Rage, because a computer treats U – 12 and U-12 as completely different terms.
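Here is a small Python sketch of that mismatch. It assumes only the two teams named above, with the age group stored as free text; the dictionary keys are illustrative.

```python
# Age group stored as free text on each team record.
teams = [
    {"name": "Rage", "age_group": "U-12"},
    {"name": "Hurricanes", "age_group": "U – 12"},  # same group, typed differently
]

# An exact-match query only finds the spelling it was given.
u12_teams = [t["name"] for t in teams if t["age_group"] == "U-12"]
print(u12_teams)  # ['Rage']
```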
To prevent this, one approach is to create a new class for age groups.
Now, U-12 will appear only once in the AgeGroup table, which removes the risk of someone typing it in differently, as happened in the table before. Data integrity and accuracy are something to consider when deciding how many tables to create.
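A minimal sketch of that idea in Python, assuming a hypothetical `AgeGroup` class with a single `label` attribute: every team points at a shared age group object instead of retyping the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgeGroup:
    label: str  # the label is stored once, here


@dataclass
class Team:
    name: str
    age_group: AgeGroup  # a reference to an AgeGroup, not free text


u12 = AgeGroup("U-12")
teams = [Team("Rage", u12), Team("Hurricanes", u12)]

# Selecting by the shared object finds every team in the group.
in_u12 = [t.name for t in teams if t.age_group == u12]
print(in_u12)  # ['Rage', 'Hurricanes']
```

Because nobody types the label per team anymore, there is no second spelling to drift out of sync.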
Now let's look at the issue of team captains, considering that a team captain is also a player. The diagram below shows that there are 2 relationships between the Player and Team classes. This is perfectly okay.
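In code, the two relationships could look something like the sketch below. The attribute names and the sample player are assumptions of mine; the point is only that `Team` refers to `Player` in two different roles.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Player:
    name: str
    email: str


@dataclass
class Team:
    name: str
    # Relationship 1: the players who play for this team.
    players: list[Player] = field(default_factory=list)
    # Relationship 2: the one player who captains this team.
    captain: Optional[Player] = None


# Made-up sample data.
sam = Player("Sam", "sam@example.com")
rage = Team("Rage", players=[sam], captain=sam)

# The captain's contact details come from the Player object itself.
print(rage.captain.email)
```

Since the captain is just another `Player`, there is no need to copy the captain's details onto the team.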
•Do you want to select objects based on the value of an attribute? Then you may want to introduce a class for that information. (Ex: you want to see all teams that are in age group “U-12”)
•Do you need to store other data about this information? Then you may want to introduce a class for that information. (Ex: We are storing the team captain’s name but also want his/her email and phone number)
•Are you already storing similar information? Then you may want to use a relationship among existing classes. (Ex: the information about team captain is the same as the information about the players, so use that class with a new relationship)
Multiple Companies in One Building
Now consider the example of a building housing multiple companies. While the diagram below is not completely incorrect, I will argue against the relationship between Employee and Room. In this example, it appears that you can infer the Employee location through the Company-Room relationship. Having multiple routes between classes isn't wrong, but make sure each route conveys different information.
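To make the argument concrete, here is a sketch that assumes an Employee is linked to a Company and a Company is linked to its Rooms. Those links, the names, and the room numbers are my assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Room:
    number: str


@dataclass
class Company:
    name: str
    rooms: list = field(default_factory=list)  # Company-Room relationship


@dataclass
class Employee:
    name: str
    company: Company  # Employee-Company relationship


def rooms_for(employee):
    """Infer an employee's location by following Employee -> Company -> Room."""
    return [room.number for room in employee.company.rooms]


# Made-up sample data.
acme = Company("Acme", rooms=[Room("101"), Room("102")])
pat = Employee("Pat", acme)
print(rooms_for(pat))  # ['101', '102']
```

If a direct Employee-Room relationship only repeated this same answer, it would be redundant. It earns its place only when it says something this route cannot.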
•Each employee belongs to one division
•Divisions are made up of many Groups
•The problem arises if you try to infer something that was not intended
•You know it’s a fan trap when you have 2 relationships with a cardinality of many at the outside ends
The way the data model is written, a division can have many different employees and can also contain many different groups. So trying to determine which group an employee belongs to via their division is impossible in this data model.
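A quick sketch of the fan trap, with made-up division, employee, and group names: the division knows its employees and its groups, but nothing ties one employee to one group.

```python
from dataclasses import dataclass, field


@dataclass
class Division:
    name: str
    employees: list = field(default_factory=list)  # many employees
    groups: list = field(default_factory=list)     # many groups


def candidate_groups(employee, divisions):
    """Best we can do: list every group in the employee's division."""
    for division in divisions:
        if employee in division.employees:
            return division.groups
    return []


# Made-up sample data.
sales = Division("Sales", employees=["Ann", "Bob"], groups=["East", "West"])

# Two candidates come back, and the model cannot say which one is right.
print(candidate_groups("Ann", [sales]))  # ['East', 'West']
```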
•Each employee can belong to at most 1 group
•Each group belongs to 1 and only 1 division
•Divisions are made up of many Groups
•Can you answer the question, “What division does each employee belong to?”
•You know it’s a chasm trap when the connection is not always there or there is a gap in a route between classes
•Ann doesn’t belong to a group since the optionality is 0
•We can only determine division based on group assigned
•So we have no idea what division Ann is in
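The chasm trap can be sketched the same way. The class layout and every name other than Ann are assumptions for illustration: an employee's division is reachable only through a group, and the group is optional.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Division:
    name: str


@dataclass
class Group:
    name: str
    division: Division  # each group belongs to exactly one division


@dataclass
class Employee:
    name: str
    group: Optional[Group] = None  # optionality of 0: a group is not required


def division_of(employee):
    """Follow Employee -> Group -> Division; the route breaks without a group."""
    if employee.group is None:
        return None
    return employee.group.division


# Made-up sample data, apart from Ann.
sales = Division("Sales")
east = Group("East", sales)
bob = Employee("Bob", east)
ann = Employee("Ann")  # not assigned to any group

print(division_of(bob))  # Division(name='Sales')
print(division_of(ann))  # None
```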
Multiple Routes Between Classes:
•Whenever there is a closed loop, check to see if the same information is being stored more than once (don’t be redundant).
•Make sure you are not inferring more than you should from a route. Always look out for the case when a class is related to two other classes with a cardinality of many at both outer ends.
•Ensure that a path is available for all objects. Are there optional relationships along the route?
You Can Even Have a Self Relationship
•Say that a club requires an existing member to sponsor any new members
•You wouldn’t have a class for member and a class for sponsor because they have the same data
•You can represent this type of situation with a self relationship because objects of a class can be related to each other
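A self relationship is easy to sketch in Python: the class simply refers to itself. The names here are made up, and I am assuming the very first member has no sponsor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    name: str
    # Self relationship: a member is sponsored by another Member.
    sponsor: Optional["Member"] = None


# Made-up sample data.
dana = Member("Dana")               # assumed founding member, no sponsor
lee = Member("Lee", sponsor=dana)   # new member sponsored by an existing one

print(lee.sponsor.name)  # Dana
```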