Project 3 – Cash Register

For your third project in the course, I am giving you the code below. The code includes three lists of varying size. These lists represent the line items on a bill.

bill = [18, 9, 6, 7.5]
bill2 = [102, 44.4]
bill3 = [66, 1, 90, 5, 2, 5, 77, 84, 2, 2, 4, 2]

Your challenge, should you choose to accept it, will be to run this code on your machine or in the test Python browser (click on the blue arrow below).

Try your Python code in the free console

I then want you to create a cash register. Write a Python script that:

1) Puts the three lists into one list

2) Creates a function that adds up the items in a list

3) Prints out the bill total of each of the three bills (one possible solution is sketched below)
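
Here is one possible solution sketch. This is just one approach; the answer notebook linked below may do it differently.

bill = [18, 9, 6, 7.5]
bill2 = [102, 44.4]
bill3 = [66, 1, 90, 5, 2, 5, 77, 84, 2, 2, 4, 2]

# 1) Put the three lists into one list
bills = [bill, bill2, bill3]

# 2) A function that adds up the items in a list
def bill_total(line_items):
    total = 0
    for item in line_items:
        total += item
    return total

# 3) Print the total of each of the three bills
for i, b in enumerate(bills, start=1):
    print("Bill", i, "total:", bill_total(b))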


The project answer notebook is available for download on the course page.

Solution Video:

Python Project 2 – Lists and conditional logic

For your second project in the course, I am giving you the code below. You should recognize this as a list.

x = [102, 81, 79, 44, 27]

Your challenge, should you choose to accept it, will be to run this code on your machine or in the test Python browser (click on the blue arrow below).

Try your Python code in the free console

I then want you to:

  1. Using a loop and conditional logic, print out each number from the list
  2. Print out whether each number is divisible by 2, divisible by 3, or not divisible by either (one possible approach is sketched below)
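
Here is one possible way to write it. Note that 102 is divisible by both 2 and 3; the check for 2 comes first below, so 102 prints as divisible by 2. A sketch, not the only valid answer.

x = [102, 81, 79, 44, 27]

for number in x:
    print(number)
    if number % 2 == 0:
        print("Divisible by 2")           # 102 and 44 (102 is also divisible by 3)
    elif number % 3 == 0:
        print("Divisible by 3")           # 81 and 27
    else:
        print("Not divisible by 2 or 3")  # 79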

The project answer notebook is available for download on the course page.

Solution Video

Python Testing Shell

You can use the shell below to try Python code you learn in the lessons.

To run a Python script, type it out in the code editor and press the Run button to execute it.

To run console commands (executing each line from a command prompt), click the down arrow and choose >_ Console.

Data Jobs: What does a Data Architect do?

If experience has taught me anything, it is that while companies and organizations have gotten much better at collecting data, most of this data is still just stored away in unstructured data stores. So while the data is “technically” there, it is not of much use to anyone trying to build a report or create a machine learning model. To get actual use out of all the data being stored, it needs to be organized into some sort of usable structure: enter the Data Architect.

Data Architect is a job title I have held on multiple occasions throughout my career. My best description when explaining the job to other people is that I was kind of like a data janitor. You see, everyone takes their data and dumps it into the storage closet. So you find some billing data on the second shelf, tucked away behind employee HR records. The floor is cluttered with server logs and six years’ worth of expense reports stored as PDFs with completely useless names like er123.pdf.

As a data architect, it was my job to try to organize this mess and put the data into some sort of structure that lends itself to reporting or modeling. So data architects have to be well versed in data modeling, data storage, and data governance.

Data Modeling

ERD diagram showing an HR database design

Data modeling is basically taking raw data dumps and organizing them into structures that fit the needs of the company. It could involve creating an HR database like the one above, or creating a series of aggregated tables designed for reporting or dashboarding. It is the job of the data architect to best fit business needs to a data platform: be it a transactional database, a data warehouse, or perhaps a data lake.

Data Storage

Data architects also need to address data storage. While I often defer to the server crew in the IT department, as a data architect I do advise on how and where to store data. Cloud infrastructure and cheaper, faster disk storage have made a lot of data storage decisions easier, but it is good to have a working understanding of storage platforms.

Data Governance

Data governance is all about how the data is managed from a regulatory and security standpoint. It is the practice of deciding who can have access to what data: can some data be “outside facing,” or should it sit behind multiple firewalls in a DMZ?

You will often work hand in hand with Legal and Security departments when figuring out how data governance practices will be implemented.

SSRS: Drill down

Let’s make the report a little more readable and interactive.

We are going to do a drill down on the names in our report. This means that when the report first opens, you will only see one row for each name. If you want to see the details for a person, you can click on a + by their name and the details for that person will pop down like an accordion.

Below your report design window, you should see another window called Row Groups.

Go to the area marked Details (green arrow), right click, and choose Group Properties…

Click Visibility, set the display option to Hide, and check “Display can be toggled by this report item,” selecting Name1 from the drop down.

(Note I am selecting Name1, not Name. If you remember, Name was the column we hid; Name1 is the column created when we created the Parent Group.)

Now when you open your report, you will see it is in drill down format.

Blockchain: Immutable Ledger

The next concept we need to cover to properly understand blockchain is the Immutable Ledger. To translate this term into something closer to plain English, an immutable ledger is simply a record that cannot be changed.

The idea behind all of this is data security and proof that the data has not been altered. Why are we so concerned here? In a blockchain application like Bitcoin, we are tracking transactions of money. Imagine if you sent me an electronic funds transfer for $100. How would you feel if I hacked into your bank and changed that $100 to $100,000? (For the record, if you tried that with my account, the computer would just laugh at you. The only time you’ll see a number that big associated with me is when you are looking at my student loans LOL.)

Anyway, back to the point: you want to make sure that when you set up a transfer for $100, no one can alter it. With blockchain, that holds true even if you want it to be altered (say you made a mistake). If you want to fix an error, you have to add another transaction to the blockchain to correct the issue. This is actually good accounting practice, as once an entry is made in a ledger, it should never be removed or altered.

Think of this like purchasing a car. Say you go to your neighbor and buy his used car for $2,000. You give him the money, and he signs the title over to you. The title is proof of ownership. To ensure that your neighbor cannot just take the car back, you take the title down to the Department of Motor Vehicles and have it registered in your name. Now you have a record of your transaction should ownership of the vehicle ever come into question.

So how does blockchain ensure the immutability of the ledger? It all resides in the concept of the hash. If a hacker tries to alter anything in a block, that block’s hash changes. The hash then no longer matches the previous hash stored in the next block, so the hacker would have to change the next block, and the block after that, and so on.
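
To make the chaining idea concrete, here is a toy sketch in Python. This is not how Bitcoin actually stores blocks; it only illustrates how hashing each block together with the previous block’s hash makes tampering detectable. The transactions and helper names are made up for the example.

import hashlib
import json

def block_hash(block):
    # Hash the block's contents, including the previous block's hash
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

# Build a tiny three-block chain
chain = []
prev_hash = "0" * 64  # placeholder hash for the first (genesis) block
for tx in ["A pays B $100", "B pays C $40", "C pays D $25"]:
    block = {"transaction": tx, "prev_hash": prev_hash}
    chain.append(block)
    prev_hash = block_hash(block)

def chain_is_valid(chain):
    # Each block must reference the hash of the block before it
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

print(chain_is_valid(chain))  # True

# A hacker alters the first transaction...
chain[0]["transaction"] = "A pays B $100,000"
print(chain_is_valid(chain))  # False: the next block's prev_hash no longer matches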

And even if they were able to pull that off, remember that the blockchain resides on multiple computers. The hacker would need to make all of these changes simultaneously. When you consider the many thousands of nodes that make up a blockchain environment like Bitcoin, you can see why that is practically impossible.

In the next lesson, we will be looking at peer-to-peer distributed networks.

ETL: Extract, Transform, Load

If you ever actually intend to work with data in the real world, you will quickly come face to face with two truths:

  1. Data is messy
  2. The data you need is never in one place

While creating predictive models and showing off your brilliant math skills is infinitely more sexy, you cannot even begin to build a simple linear regression until you have all your data together and in the correct format.

If you haven’t heard it before, it is a pretty common estimate that a data scientist spends 60 to 70 percent of their time working as a data “janitor”. On a small scale, most of this data manipulation and merging (often referred to as munging) can be done manually, but in a professional environment you are going to want to automate as much of the process as possible.

ETL Tools

This is where ETL Tools come into play. Some of the more popular ones on the market are:

  • Oracle Warehouse Builder (OWB)
  • SAS Data Management
  • Informatica PowerCenter
  • SQL Server Integration Services (SSIS)

SSIS is the one I am most familiar with (also the one I use professionally) so it will be the one I will be focusing on in future lessons.

What these ETL Tools do is help you automate the data munging process.

Extract:

When discussing ETL, extract means pulling data from a source (or many sources). Consider this: you have a Customer table residing in a SQL Server database, a Sales Person table sitting in a Teradata database, and a list of purchases kept in an Excel spreadsheet.

Using an ETL tool like SSIS, you can connect to these three different data sources and query the data (extract it) into memory for you to work with.
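
SSIS is a visual tool, but the extract step can be sketched in Python to make the idea concrete. Everything below (the connection string, file name, and table names) is a placeholder for illustration, not something from the lesson.

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string -- substitute your own server and credentials
engine = create_engine(
    "mssql+pyodbc://user:password@MyServer/SalesDB?driver=ODBC+Driver+17+for+SQL+Server"
)

# Extract: pull each source into memory as a DataFrame
customers = pd.read_sql("SELECT Cust_ID, Cust_NM FROM Customer", engine)  # SQL Server
purchases = pd.read_excel("purchases.xlsx")  # Excel spreadsheet
# The Sales Person table on Teradata would be read the same way, via a Teradata driver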

Transform:

Transform in ETL refers to manipulating the data.

Pulling from the example above, in the SQL Server table you have

Cust_ID         Cust_NM
9081            Bob Smith

The Excel sheet says

Cust_ID         Cust_NM
CN9081          Bob Smith

In the Excel spreadsheet, someone has put a CN in front of the Cust_ID. This is a common problem you will run across when working with real world data. Using an ETL tool, you can set up steps that will either remove the CN or add it (your prerogative), as sketched below.
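
Continuing the Python stand-in from the extract step (the rows below are the hypothetical Bob Smith example, with a made-up purchase amount), the transform might look like this:

import pandas as pd

# Stand-ins for the extracted data
customers = pd.DataFrame({"Cust_ID": ["9081"], "Cust_NM": ["Bob Smith"]})
purchases = pd.DataFrame({"Cust_ID": ["CN9081"], "Amount": [49.99]})

# Transform: strip the leading "CN" so the keys match (adding it instead works too)
purchases["Cust_ID"] = purchases["Cust_ID"].str.replace(r"^CN", "", regex=True)

# The two sources now join cleanly on Cust_ID
merged = purchases.merge(customers, on="Cust_ID", how="left")
print(merged)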

Load:

Finally, once you have extracted the data from the different sources and transformed it so that differences in the data are corrected, you come to the final step: loading it into a location of your choice, usually a data warehouse table.
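
Continuing the sketch, the load step is a single call in pandas. The warehouse connection string and table name below are placeholders.

from sqlalchemy import create_engine

# Placeholder warehouse connection -- substitute your own
warehouse_engine = create_engine("postgresql://user:password@warehouse-host/dw")

# Load: append the cleaned, merged rows to a warehouse table
merged.to_sql("fact_purchases", warehouse_engine, if_exists="append", index=False)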

Splunk: Introduction to Real Time Data Analysis – Setting Alerts

Splunk really shows its power in the realm of real time analysis of unstructured data. A professional implementation of Splunk involves some sort of machine-produced data stream being fed into Splunk. This could be web clicks, social media feeds, sensor readings from mechanical devices, log files, etc.

In our example, we are going to be working with computer log files I pulled from a server.

Download Data File: event

Log into Splunk and select Add Data from the home screen

Select Upload

Select File

Select the event.csv file I provided

Select Next

Select Next (since we are on a local machine, the default Input settings will do fine)

Finally, hit Submit

Now that we have the full log file loaded, let’s try filtering the data down a little.

Put error* in the search bar (* is a wildcard in Splunk).

Now only Errors are showing up in our data.

Now try the following in the search bar:

error* terminal*

Now my log files are filtered down to TerminalServices Errors only.

Notice the top bar graph. It shows an abnormal increase in these particular errors in October.

Setting Alerts

It looks like there has been an abnormal increase in these particular errors. Wouldn’t it be nice to know if these errors started firing off again?

Splunk lets us set Alerts to do just that. Above the bar graph, you will find a drop down menu, Save As. Click on it and then select Alert.

Give the Alert a name.

I don’t want to run this on a schedule, so instead I clicked Real-time.

I set the Trigger Conditions to go off when more than 50 of these errors appear.

Under Add Trigger Actions, I selected Add to Triggered Alerts.

Select your Severity

Now the Alert is saved.

If you select Alerts in the top menu, you can see the newly saved alert too.
