Database Design: The Model

When discussing database design, the Model refers to the mock up we create for handling our data. This can be just a simple diagram you draw our or in larger databases, multiple pages of inter-related ERD (Entity – Relationship – Diagrams)

Often times, the model is developed in response to a use case or real world problem. So we are going to look at couple of examples here and see how a model is made to solve a real world problem:

First one, let’s look at a hotel geared towards groups of high school students. They want a way to keep track of which rooms the guests are staying in, as well as what group the guests belong to. They also want to have in the database a way to track which adults are associated with each group

So we create 3 classes, Group, Guest, and Room. Both Room and Group can only exist once. Room 2 is a Room 2, you can’t have 2 Room 2s.

Multiple Guests can be part of a Group though

One quick to identify problem with this model is, what happens if one person from a group leaves early. This model can not handle that exception

Optionality and Cardinality

Optionality and Cardinality are related to how many instances of an item can exist in Class (or table once its in the database)

Optionality is the smallest number of items that can exist. It represented by the first number in the grouping.

Cardinality is based on how many total objects the other side can associate with.

In the case below, A student can enroll in multiple courses and a course can have multiple students into. At the same time, a student could be enrolled in 0 courses, and it is possible to have a instance where there are no students in the school at all – so the Optionality and Cardinality will look like this.

What we have created here, is a many to many relationship. That is due to the fact that both students can be in more than one course and courses can have more than one student.

To get around the problem of many to many relationships, database designers need to implement intermediate classes or (lookup tables). This creates Course and Student classes that will not have duplicates, and an intermediate class that allows you to keep track of all the iterations of student and course combinations

With a Start and End date added to the Roster class, you now have an historical class that can track when a student took a particular class, even solving the issue of what happens if a student has to take the class again, or if the student drops the class.

MS Access: Query two related tables

The entire purpose of related tables is that they allow you to query information from multiple tables at once. In this example we will be creating a query that looks at the Class and Teacher tables we built in the intro to MS Access Lesson: link to lesson

To start, we will select Create from the Menu bar and Query Design from the Ribbon

Next drag the Teacher and Class table over to the blank space for Query1

down below you can select the columns (fields) you want to bring in from the tables.

Next, put all the all the available columns in the Fields below

You’ll notice the Fields all have their table names in front of them in the drop down. This is common SQL notation for <table>.<field>

Right click on the query tab and select datasheet view.

Note the two tables are matched up by the teacherID in two different tables. That is the relationship. That is how relational database allow tables to interact

Lets remove the teacherID from the query

Go back to design view and uncheck the columns to hide them from the query

To see the results, click on the Query1 tab and select Datasheet view

So you can see, our query returned information found in 2 different tables

Now, let’s right click on Query1 tab and select SQL View

This is how you would write this query using SQL

Select Class.ClassID, Class.Class_NM, teacher.teacher_name — This means that we want to see these three columns. Note the table name is in front, followed by the column or field name, separated by a ‘.’ This is common practice in SQL. It tells the database which table the field is in. And in situations like the teacherID column that is found in both tables, it clarifies which one you want.

from teacher INNER JOIN Class ON teacher.teacherID= Class.TeacherID; — this is a typical join statement. It says use both teacher and Class tables, and match the records up using the TeacherID field.

For more information in Joins, check my SQL Intro to Joins and SQL 4 Types of Joins

Back to Main Course Page: Course

Excel Data Cleaning: Split name column into First and Last at comma

So my wife works at a high school and had an issue trying to upload an Excel file to a database. She asked me if I could look at the file and fix any errors.

**Note the names below have been completely made up, these are not real students. I have not and would not share any private information on my site

You can download the file below to play along:

Okay, so the database she is trying to upload the file too has request the following columns:

ID > Last_Name > First_Name > Grade > Email

**For my non US readers, in the United States we commonly refer to the year you are in school as the grade, so Grade 12 is the final year for high school students.

First thing first, lets split our Student column into first and last name

  1. Let’s make some space, right click grade column and click insert to add a new column

Now, click on student column, select Data from the menu bar, and Text to Columns from the Ribbon bar

Note all the names are different lengths and some students have two last names, while others have a first and middle name. So our best bet is to split this by the delimiter (in this case the comma)

Select comma from the choices, you will see your data preview window what the results will look like

I don’t need to convert data types, so I just ignore this window > Hit Finish

Now all you have to do is change the column names

Next, lets check for bad email addresses. One good thing about email addresses in a single organization is they are often all the same after the @ sign, so we can use the RIGHT() function to count backwards

A quick count of characters (make sure to include the period and @) tells me that starting from the end of the email address, there are 11 characters up to and including the @

So my formula would be =right(E2,11)

With this nice short file, it is easy to see that the forth student has a mistake in his email address.

In the real world, where you have 100’s or even 1000’s of students in a school, it might be easier to use an IF statement to more easily tease out bad emails. Using the code below, we have nested our Right() function inside an IF statement. And we will return 0 for all good emails, 1 for a bad one

=IF(Right(E2,11)=”@school.edu”,0,1)

Even with long lists, the 1 usually jumps off the page against the rows and rows of 0s

Finally, files can some times have spaces before or after data in the cells, and this can mess up an upload to a database sometimes.

We can simply use the Trim function to get rid of extra whitespace.

Run Trim in a new column > Then right click and copy the whole column

Go back to your original column > Paste > Paste Special…>Paste Values (the blue arrow). That will paste just the data trimmed of whitespace, not the formula.

Once you are done, delete the added columns you were working with for the IF() and Trim() statements, make sure your columns are properly named, and you are good to go.

MS Access: Create a Calculated Field

Calculated fields are exactly what they sound like, fields created by a calculation. To make up, let’s open up the Kmeans Table. Go to the Table Fields tab and click on More Fields > Calculated Field > Number

Select AVG Labor and Labor Cost from the Expression Categories. Let’s multiply the two values, put a * in between them

A new field will appear, you can rename the field if you want.

If you look at the table in Design view, you can examine the calculated field another way

Back to Main Course Page: Course

MS Access: Intro to Queries

Queries are how you search and return data from a database. In this example we will be using data loaded into Access from my first two Access Lessons:

If you have the tables loaded, lets move onto the Query Wizard. Click on Create then Query Wizard

Select Simple Query Wizard and Okay

Select Kmeans as your table

Select the >> to move all the fields over to Selected Fields Window

Hit Next > Leave Detail selected and hit Next

You can title your query if you want. Then hit finish

Your results with now appear

Right click on your query in the side bar and select Design View

Filtering Query

With Design View open, lets uncheck the ID column and in the Criteria Spot under Model, type in “IE33” with quote marks

Now click on your query in the left bar and your results with appear

Right click query and go back to designer. Change “IE33” under Model to “A*” (* are wildcards meaning return anything that starts with A

Double click Kmeans Query and view results

Back to Main Course Page: Course

Database Design: Initial Requirements and Use Cases

The goal of good database design should be to provide a solution to real world problem. In order to create a solution though, you first need to to understand the problem at a granular level.

One approach that I like to take:

  • Break down tasks users perform
  • Ensure you have the data needed for those tasks

Keep in mind:

•Understanding a problem involves looking at both the input and output tasks

•The data model goes hand in hand with the use cases

•Continue to re-evaluate whether the data model can handle the input and output use cases

•Most of the time, users won’t have this readily available and it needs to come through discussion

•Users can be very narrowly focused, it’s your job as the designer not to be

Let’s imagine an example of a meal delivery service where people call in to a receptionist who takes the order and dispatches drivers to delivery the meals

We need to figure out:

1.Who are the users?

2.What tasks do they do?

3.What data is needed?


First make sure you understand the process:

Users and Tasks:

•Visitor calls the company

•Receptionist takes the order

•Receptionist selects/assigns a driver

•Receptionist gives order to the driver

•Driver picks up meals

•Driver delivers meals

•Visitor pays the driver

•Driver informs the company

•Receptionist records the completed order

•Driver maintains timesheet


Then decide what tasks our system will be involved with?

Users and Tasks:

Visitor calls the company (probably not)

•Receptionist takes the order (yes – need to record some details)

•Receptionist selects/assigns a driver (yes – need to see who is available)

•Receptionist gives order to the driver (yes – need to be able to see the order details)

•Driver picks up meals (maybe – do we need to keep track of time of pickup?)

•Driver delivers meals (maybe – do we need to keep track of delivery time?)

Visitor pays the driver (probably not)

Driver informs the company (maybe – is the driver entering details himself or calling?)

•Receptionist records the completed order (yes)

•Driver maintains timesheet (yes)


Then we can create some basic tables and relationships based on this imformation above

The Driver and Order interact as the driver needs to get an order to deliver, and the driver and timesheet interact as the driver needs to fill in the time sheet, but the Timesheet and Order do not interact though, so there is no relationship there.

Next, we can fill in some attributes

This basic model above should meet the needs of the use case described

Sample of steps to take when working through the analysis phase:

•What does the user do?

•What data is involved in the user’s jobs?

•What is the main objective of the system?

•What data is needed to satisfy the main objective?

•What are the input use cases?

•What is the data model? 

•Can it handle everything so far?

•What are the output use cases?

•Can the data model provide everything so far?

Back to Main Course Page: Course

MS Access: Import Excel File, Sort and Filter Data

In this lesson, we will be importing an Excel file into MS Access and learning to use the sort and filter functionality. If you want to follow along, download the Excel file below:

I am using the same database I build in the previous lesson. You can start a new one if you want, but since most of these early lessons will be sandboxing( a programming term for playing around with a software platform) in Access it doesn’t really matter if you just use the same database for everything.

From the Home screen in Access, click on the External Data tab and open up the External Data ribbon

Select New Data Source > From File > Excel

Browse for your excel file, select Import the source data into a new table and click ok (note, if you want to use a different Excel file, you can)

My excel file has column names in the first row, so I make sure that is check and I click next

Up top, you can change column names, and data types, but in this case, Access did a good job of assigning datatypes for me. Next>

I check Let Access add primary key, you could just select your own from the drop down though. Next >

Name your table whatever you want and hit Finish

If you spent a lot of time fixing data types and column names or if you are going to upload a file like this on a regular basis, you can save import steps, but I am going to skip this for now, just hit close

Click on your new table to open it up

Sorting

Filters and sorting work like they do in Excel, here I select my column and Sort Largest to Smallest

Filter

Here is select Model and filter down to a single model (Accuvix A30)

Back to Main Course Page: Course

Database Development and Design

Week 1

  1. Databases: What are they, and why do we need them?

Week 2

  1. Database Design: First Things to Consider
  2. Database Design: The Development Process
  3. SQL: Intro to SQL Select, From, Where, Order by
  4. Lab 1 Walkthrough

Week 3

  1. Database Design: Initial Requirements and Use Cases
  2. SQL: Intro to Joins

Week 4

  1. Database Design: The Model
  2. Data Jobs: Careers in Data

Week 5

  1. Database Design: Developing a Data Model
  2. Database Design: Lab 4 Walkthrough

Week 6

  1. Database Design: Generalization and Specialization
  2. Database Design: SQL Sub Languages
  3. SQL: Create tables and insert data
  4. Lab 5 Walkthrough

MS Access: Intro to Access

MS Access is a all in one database solution provided as part of MS Office. Access was much more popular years ago when more powerful solutions such as Oracle an SQL Server couldn’t be effectively run on desktops. While Access’s time in the sun is definitely in decline, for someone new to concepts of databases, Access can be a great introduction.

Let’s start by creating a simple database. Open up Access and click on Blank Database

Give your blank database a name in the pop up window

By default, you’re new database will start with a Table1 with an ID column. Click the drop-down on the next column space to add a new column. For this example, let us select Short Text as our data type. Short text can handle any text up to 255 characters

After you set the datatype, you can click on the column name and rename it

Let’s add another column, set the datatype to number

Now click the X next to table1 and you will be prompted to name your table

Another way to build a table is through the table design feature

You can name your columns and select data types: Note AutoNumber is an auto incrementing datatype that works to provide you with an auto populating ID number

Access requires primary keys for all tables (something not required, but recommended in other systems like MySQL or SQL Server). To create a Primary Key, right click on ClassID and select Primary Key

Finally, let’s just use SQL to create a table. Select Create an then Query Design from the ribbon menu

Select SQL View

Put in the SQL below

create table teacher (
teacherID int primary key,
teacher_name varchar(255))

Click on the new teacher table in the left and you can fill in some data

Now let’s create a relationship between the Class and Teacher tables. Click on Database Tools > Relationships

Drag the Class and Teacher table into the blank sheet and click Edit Relationships

Click create new, select your tables and columns from the drop down, and select Enforce Referential Integrity and select the Create button

A Relationship line will appear, showing you your connection

Back to Main Course Page: Course

Database Design: The Development Process

When start on the design of a database there are few questions you need to ask right away:

  • Not to be dismissive, but do not simply build a database to the specs of a request
  • Take stock of the data at hand, ask around if there is related data that also might be included
  • Think about what users will be doing with the data
  • As best you can, try to anticipate other uses for the data that come up
  • Focus on the type of data you are dealing with, and how it can be best used/store
  • Don’t only focus on the current use of the data, consider the data may find other uses as well

Now, let’s look at database development through the lens of a software development cycle

Keep in mind, this process is not simply linear. This process, in the real world, goes through many iterations. Change management is good idea to put in place. It helps you to deal with issues like:

  1. New data sources
  2. New requirements from business users
  3. Technology platform changes in your organization

To get started, we need to come up with a problem statement, something that can describe the problem we are trying to solve. Use cases are commonly used when developing problem statements. Use cases help to demonstrate how a user will be interacting with your database:

  • Transactional – think like a cash register, adding new transactions to database, updating inventor
  • Reporting base – data doesn’t change often, people want to look more at aggregate numbers over the details of each transaction. (End of the day reporting – what is the final sales total from the cash register)

You can use Unified Modeling Language (UML) to create your use case. UML uses a series of diagramming techniques to help visualize system design. Below, the stick figure represents users and each oval will show one of the tasks the user hopes to be able to use the database for. UML is great for mock-ups, but you should also include a detailed document that provides deeper insight into the tasks below.

Now imagine we want to create a database to store plant data:

Using this data, we can come up with a few basic use cases

Use case 1: Enter (or edit) all the data we have about each plant; that is, plant ID, genus, species, common name, and uses.

Use case 2: Find or report information about a plant (or every plant) and see what it is useful for.

Use case 3: Specify a use and find the appropriate plants (or report for all uses). 

How does our existing table design handle our use cases:

1.Can we maintain data?  Yes

2.Given a plant can we return the uses?  Yes

3.Given a use can we return the plant?  Not really

We need to re-examine this data from a class point of view

We are dealing with 2 classes in the Plant data:

1.Plant

2.Use

•One plant can have many uses

•This is an example of a relationship between classes

•Relationships between classes are represented with a line in UML

•We need more details besides just a line to know how classes relate

•The pair of numbers at each end of the line indicates how many objects of one class can be associated with a particular object of the other class 

•The first number is the minimum number (0 or 1 to indicate whether there must be a related object)

•The second number is the greatest number of related objects (1 or n to represent many)

Design Phase

•You spend a lot of time in the top 2 quadrants

•It is an iterative process, not simply linear

•Keep reviewing the model to ensure it satisfies the problem

•Once you have a model that works, move on to design

•For the database world, design involves:

Converting our Class Diagram into Database Objects: Tables, Keys, and Relationships

Application Phase

•Now that you have a database foundation in place, the Application phase builds upon it by adding additional functionality for the users

•The Application phase satisfies the Use Cases:

Input Use Cases will most likely be satisfied with Forms

Output Use Cases will most likely be satisfied with Reports

Back to Main Course Page: Course