Database Design: Lab 4 Walkthrough

I made these video walkthroughs as an alternative to following the lab in the textbook. I know some people (myself included) learn better from watching videos.

This is a walkthrough for Lab 1 for my course on Database Development and Design. Feel free to watch the video, but I will not be sharing any files, as they were not created by me and I do not have permission to share them.

Database Design: Generalization and Specialization

•How do we handle the situation when we have multiple classes but realize the classes have a significant amount of information in common?

•How do we handle the situation when we have a single class but realize that there are differences among the different objects in that class that may drive us to break up the class into two classes?

Specialization

Specialization applies when most of the information about two types of objects is the same, but each type also has some specialized data of its own.

•Let’s say this class already exists for a small startup company

•All employees have an ID, firstName, lastName, and salary

•The startup has grown enough that they now want to hire consultants

•Instead of salary, consultants have an hourly rate

•Subclasses (or inherited classes) contain the specialized information

•Permanent and Consultant are both subclasses of Employee

•Superclasses (top classes) should be as general as possible

•Future changes to the superclass would affect all subclasses

•Easy to add additional subclasses

Generalization

Generalization applies when you have 2 or more existing classes and realize they have some information in common.

You add a superclass and pull the common information out of the 2 subclasses into the superclass (the same as specialization, only in the opposite direction).

Inheritance

•Generalization and Specialization are examples of inheritance

•SubClassA and SubClassB are both specialized types of SuperClass

•SubClassA and SubClassB will have all the attributes of SuperClass in addition to their own attributes
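
The Employee example above can be sketched in Python to show how subclasses pick up the superclass attributes. This is only an illustration, not how a database stores the data. The attribute names follow the slides, while `hourly_rate` and the sample people are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    """Superclass: holds only what every employee has in common."""
    id: int
    first_name: str
    last_name: str


@dataclass
class Permanent(Employee):
    """Subclass: adds the salary that only permanent staff have."""
    salary: float


@dataclass
class Consultant(Employee):
    """Subclass: adds an hourly rate instead of a salary (name is an assumption)."""
    hourly_rate: float


# Made-up sample objects.
ann = Permanent(id=1, first_name="Ann", last_name="Lee", salary=52000.0)
raj = Consultant(id=2, first_name="Raj", last_name="Patel", hourly_rate=85.0)

# Both objects carry the inherited Employee attributes.
for person in (ann, raj):
    print(person.id, person.first_name, person.last_name)
```

Adding a third kind of employee later would just mean adding another subclass, without touching Permanent or Consultant.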

Database Design: Developing a Data Model

An important first step for developing a data model is taking the time to learn about the data and how the data relates.

Remember not all data is useful. While you may wish to include data that does not directly relate to the problem, this can quickly become problematic as your data model can grow into an unmanageable mess.

Keep in mind, a data model should:

  • Work for a stated set of assumptions
  • Function within the limitations of the problem scope

Let’s start with an example of a soccer (football for all my non-US friends) club data model. Here is an example of a class you could build for holding team data:

The attributes are used to capture data about each team.

But let’s look at a new problem we haven’t really discussed yet:

Look at the AgeGroup above. If I asked you what teams are in the U-12 age group, as a human you could look at the table above and tell me that Rage and Hurricanes are. However, if you tried running a query for U-12, it would only return Rage, as U – 12 and U-12 are viewed as completely different terms by a computer.

To prevent this, one approach could be to create a new class

Now, U-12 will only appear once in the AgeGroup table. This removes the risk of someone typing it in differently, like in the table before. Data integrity and accuracy are things to consider when deciding how many tables to create.
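
To see why the free-text version fails, here is a small Python sketch. The rows are hypothetical stand-ins for the Team table, and the `age_group_id` key is an assumed name, not something from the lab files.

```python
# Hypothetical rows: the age group is typed as free text for each team.
teams_free_text = [
    {"team": "Rage", "age_group": "U-12"},
    {"team": "Hurricanes", "age_group": "U – 12"},  # same group, typed differently
]

free_text_matches = [
    t["team"] for t in teams_free_text if t["age_group"] == "U-12"
]
print(free_text_matches)  # ['Rage']

# With a separate AgeGroup class, each group is stored once
# and teams refer to it by key instead of retyping the text.
age_groups = {1: "U-12"}
teams = [
    {"team": "Rage", "age_group_id": 1},
    {"team": "Hurricanes", "age_group_id": 1},
]

wanted = next(key for key, name in age_groups.items() if name == "U-12")
keyed_matches = [t["team"] for t in teams if t["age_group_id"] == wanted]
print(keyed_matches)  # ['Rage', 'Hurricanes']
```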

Now let’s look at the issue of team captains, considering that a team captain is also a player. The diagram below shows there are 2 relationships between the Player and Team classes. This is perfectly okay.
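
As a rough sketch of two relationships between the same pair of classes, the Python below gives Team both a list of players and a captain. The field names, player names, and email addresses are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Player:
    name: str
    email: str


@dataclass
class Team:
    name: str
    # Relationship 1: the players who play for the team.
    players: List[Player] = field(default_factory=list)
    # Relationship 2: the one player who captains the team.
    captain: Optional[Player] = None


sam = Player("Sam", "sam@example.com")
kit = Player("Kit", "kit@example.com")
rage = Team("Rage", players=[sam, kit], captain=sam)

# The captain is a Player object, so contact details are not stored twice.
print(rage.captain.email)
```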

•Do you want to select objects based on the value of an attribute?  Then you may want to introduce a class for that information.  (Ex: you want to see all teams that are in age group “U-12”)

•Do you need to store other data about this information?  Then you may want to introduce a class for that information.  (Ex: We are storing the team captain’s name but also want his/her email and phone number)

•Are you already storing similar information?  Then you may want to use a relationship among existing classes.  (Ex: the information about team captain is the same as the information about the players, so use that class with a new relationship)

Multiple Companies in One Building

Now consider the example of a building housing multiple companies. While the diagram below is not completely incorrect, I will argue against the relationship between Employee and Room. In this example, it appears that you can infer the Employee location through the Company-Room relationship. While having multiple routes for data isn’t wrong, make sure they convey different information.

Fan Trap:

•Each employee belongs to one division

•Divisions are made up of many Groups

•The problem here is if you try to infer something that was not intended

•You know it’s a fan trap when you have 2 relationships with many cardinality on the outside ends

The way the data model is written, a division can have many different employees and a division can also be made up of many different groups. So trying to determine what group an employee belongs to via their division is impossible in this data model.
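
A small Python sketch of the fan trap, using made-up employees, groups, and a single division. The only route from an employee to a group runs through the division, so the best the model can do is list every group in that division.

```python
# Hypothetical data matching the fan trap: both relationships hang off Division.
employee_division = {"Ann": "Sales", "Bob": "Sales"}
group_division = {"Inside Sales": "Sales", "Field Sales": "Sales"}


def candidate_groups(employee):
    """Return every group in the employee's division (all the model can tell us)."""
    division = employee_division[employee]
    return sorted(g for g, d in group_division.items() if d == division)


print(candidate_groups("Ann"))  # ['Field Sales', 'Inside Sales']
```

More than one group comes back, and nothing in the data says which one Ann actually belongs to.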

Chasm Trap

•Each employee can belong to at most 1 group

•Each group belongs to 1 and only 1 division

•Divisions are made up of many Groups

•Can you answer the question, “What division does each employee belong to?”

•You know it’s a chasm trap when the connection is not always there or there is a gap in a route between classes

•Ann doesn’t belong to a group since the optionality is 0

•We can only determine division based on group assigned

•So we have no idea what division Ann is in
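
The same gap can be sketched in Python. The group and division names are invented. Ann has no group, so the route from Employee to Division is broken for her.

```python
# Hypothetical data: Group -> Division is mandatory, Employee -> Group is optional.
group_division = {"Inside Sales": "Sales"}
employee_group = {"Bob": "Inside Sales", "Ann": None}  # Ann has no group


def division_of(employee):
    """Follow Employee -> Group -> Division; return None when the route is broken."""
    group = employee_group[employee]
    if group is None:
        return None
    return group_division[group]


print(division_of("Bob"))  # Sales
print(division_of("Ann"))  # None
```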

Multiple Routes Between Classes:

•Whenever there is a closed loop, check to see if the same information is being stored more than once (don’t be redundant).

•Make sure you are not inferring more than you should from a route.  Always look out for the case when a class is related to two other classes with a cardinality of many at both outer ends.

•Ensure that a path is available for all objects.  Are there optional relationships along the route?

You can even have a Self Relationship

•Say that a club requires an existing member to sponsor any new members

•You wouldn’t have a class for member and a class for sponsor because they have the same data

•You can represent this type of situation with a self relationship because objects of a class can be related to each other
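
Here is a minimal Python sketch of a self relationship, with made-up member names. The `sponsor` field points at another object of the same class. It is left optional here on the assumption that the very first member had nobody to sponsor them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    name: str
    # Refers to another Member, so no separate Sponsor class is needed.
    sponsor: Optional["Member"] = None


founder = Member("Dana")                      # assumed: a founding member has no sponsor
new_member = Member("Eli", sponsor=founder)   # sponsored by an existing member

print(new_member.sponsor.name)  # Dana
```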

MS Access: Use comparison operators and dates in queries

This tutorial was created as supplemental material for my undergrad course in database design. You can find the full course here: Course

For this example, I want to create a new table. I have attached an Excel file below that you can download.

From Access: External Data > New Data Source > From File > Excel

Check First Row Contains Column Headings and click Next

You can change the data types of the columns, but I am just leaving them as is. Click Next

Let Access add primary key > click next

Name your table and hit finish

Now if you click on the Employee table in the table list on the left you will see the results

Comparison Operators

Comparison operators are the symbols that let us check if something is equal to, greater than, less than, etc.

Let’s create a query using comparison operators

Click on Create > Query Design

Drag the Employee table into the query workspace

Add all the fields below and in the Criteria spot for Age, put >40

Right click the Query Tab and click Datasheet View

You can now see the results with only employees over the age of 40

Play around with it: try <40, >=, or <= and run some different queries

Also remember, you can right-click the Query1 tab and select SQL View to see the SQL code that runs the query

You can also use Between in the Criteria row. The query below will return everyone aged between 30 and 45 (inclusive)
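
If you want to try the same comparisons outside Access, here is a rough Python sketch using the built-in sqlite3 module. SQLite stands in for Access here, so the SQL you see in Access's SQL View will look somewhat different. The names and ages are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("Ava", 28), ("Ben", 30), ("Cy", 41), ("Di", 45), ("Ed", 52)],  # made-up rows
)

# Same idea as putting >40 in the Criteria row for Age.
over_40 = conn.execute(
    "SELECT Name FROM Employee WHERE Age > 40 ORDER BY Age"
).fetchall()

# BETWEEN includes both end values, so ages 30 and 45 are returned.
between_30_45 = conn.execute(
    "SELECT Name FROM Employee WHERE Age BETWEEN 30 AND 45 ORDER BY Age"
).fetchall()

print(over_40)        # [('Cy',), ('Di',), ('Ed',)]
print(between_30_45)  # [('Ben',), ('Cy',), ('Di',)]
conn.close()
```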

Dates

Now let’s try querying dates

When working with dates, you need to put a # before and after the date. If your Access is set to US settings, dates are written MM/DD/YYYY; European settings (and most of the rest of the world) use DD/MM/YYYY

The query below will return employees hired after Jan 1 2010

And here are the results
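
The date order matters more than it might seem. This short Python sketch (not Access code) reads the same text both ways, then filters some made-up hire dates the way the query above does.

```python
from datetime import date, datetime

text = "03/04/2010"

us = datetime.strptime(text, "%m/%d/%Y").date()        # read as March 4, 2010
european = datetime.strptime(text, "%d/%m/%Y").date()  # read as April 3, 2010
print(us, european)  # 2010-03-04 2010-04-03

# Made-up hire dates, filtered to those after Jan 1 2010.
hire_dates = {
    "Ava": date(2009, 6, 15),
    "Ben": date(2010, 1, 1),
    "Cy": date(2012, 9, 3),
}
cutoff = date(2010, 1, 1)
hired_after = [name for name, hired in hire_dates.items() if hired > cutoff]
print(hired_after)  # ['Cy']
```

Note that someone hired exactly on Jan 1 2010 is not returned, because the comparison is "after" rather than "on or after".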

MS Access: Import Excel File, Sort and Filter Data

In this lesson, we will be importing an Excel file into MS Access and learning to use the sort and filter functionality. If you want to follow along, download the Excel file below:

I am using the same database I built in the previous lesson. You can start a new one if you want, but since most of these early lessons will be sandboxing (a programming term for playing around with a software platform) in Access, it doesn’t really matter if you just use the same database for everything.

From the Home screen in Access, click on the External Data tab and open up the External Data ribbon

Select New Data Source > From File > Excel

Browse for your Excel file, select Import the source data into a new table and click OK (note, if you want to use a different Excel file, you can)

My Excel file has column names in the first row, so I make sure that is checked and I click Next

Up top, you can change column names and data types, but in this case, Access did a good job of assigning data types for me. Next >

I check Let Access add primary key, but you could select your own from the drop down instead. Next >

Name your table whatever you want and hit Finish

If you spent a lot of time fixing data types and column names, or if you are going to upload a file like this on a regular basis, you can save the import steps. I am going to skip this for now, so just hit Close

Click on your new table to open it up

Sorting

Filters and sorting work like they do in Excel. Here I select my column and Sort Largest to Smallest

Filter

Here I select Model and filter down to a single model (Accuvix A30)


Database Development and Design

Week 1

  1. Databases: What are they, and why do we need them?

Week 2

  1. Database Design: First Things to Consider
  2. Database Design: The Development Process
  3. Intro to MS Access
  4. MS Access: Import Excel, sort and filter table
  5. Database Design: Lab 1 Walkthrough

Week 3

  1. Database Design: Initial Requirements and Use Cases
  2. MS Access: Intro to Queries
  3. MS Access: Calculated Fields
  4. MS Access: Query two related tables
  5. MS Access: Use Comparison Operators and Dates in Queries
  6. Database Design: Lab 2 Walkthrough

Week 4

  1. Database Design: The Model
  2. Database Design: Lab 3 Walkthrough

Week 5

  1. Database Design: Developing a Data Model
  2. Database Design: Lab 4 Walkthrough

Week 6

  1. Database Design: Generalization and Specialization

Python Project 2.2: DataFrames

For this second project in module 2, I want you to type the code block below and run it in Python

x = {'Name': ['Bill', 'Carol', 'Jared', 'Rozmeen', 'Jeff', 'Juan', 'Lori'],
     'Age': [25, 33, 42, 32, 24, 48, 33],
     'JobTitle': ['Tech', 'Analyst', 'Manager', 'Consultant', 'DBA', 'Engineer', 'Analyst'],
     'YearsService': [2, 8, 12, 3, 1, 23, 6]
     }

Now, turn the dictionary above into a DataFrame

Next, sort the DataFrame by YearsService in ascending order

Copy and paste this code below into Python

Loc = ['London', 'NYC', 'Los Angeles', 'New Orleans', 'Paris','Miami', 'NYC']

Create a new column in the DataFrame called Location, using the list provided above

Next, filter the DataFrame to show only the Analysts

Now finally filter the DataFrame down to the 3 columns below.

Python Project 2.1 – Create a Module

In this project, I want you to create your own module. The module needs to have 3 methods (functions):

  • Method called Intro > 1 arg > Name > Prints “My name is <name>”
  • Method called Job > 2 args > Name, Job > Prints “Hi, I am <name>, and I am a <job>”
  • Method called Pets > 2 args > Number, Pets > Prints “I have <number> <pets>”

Open a new Python script, import your module and test the three methods.

Free Courses

Database Development and Design

Introduction to Database Development and Design using MS Access

Python for Data Science

  • Fundamentals (Available now)
  • Data Handling / Management (coming soon)
  • Statistics and Visualizations (coming soon)
  • Machine Learning (coming soon)

R for Data Science (coming soon)

Excel for Data Science (coming soon)

SQL for Data Science (coming soon)