Python: Accessing a MySQL Database

Here is a code block to create a database if you want to play along

create database sandbox;

use sandbox;
CREATE TABLE employee_id (
        emp_nm varchar(30) not null,
        emp_id varchar(8),
        b_emp_id varchar(8),
        PRIMARY KEY(emp_nm) );   

While loading data from Excel files and CVS files is pretty straightforward in Python, getting database from a SQL database is really a skill you should learn. SQL databases can store much more data than an Excel file and are used pretty much everywhere now.

This lesson will focus on MySQL, I have another lesson for SQL Server as well. The syntax does vary a little and if you are working with Oracle or Teradata, your syntax may change a bit as well. Luckily the Internet has plenty of resources to point you in the right direction.

Now, for MySQL, lets get started. First you are going to want to install mysql.connector (if you have the Anaconda distribution that I recommend, it comes with this pre-installed.

If you don’t have it installed, you can get it using pip or conda – I have a video showing you how to install a module here: video link — skip ahead to the 7 minute mark where I explain how to install modules

Once install, import it into your notebook

Now lets make a connection to our server: syntax should be self explanatory – localhost simply means its installed on my computer — otherwise you would provide a network path to the server there.

Let’s start by looking around- First I want a name off all the databases in my instance

Now in this example we will be working with the Sandbox database — I provided code at the top of the lesson you can paste and run in your MySQL instance to create a Sandbox database

Now lets add a new element to our connection string – database

And query the table names in the Sandbox database

Using the .execute() method, we can pass SQL commands to the MySQL instance

Below I am creating a table -> passing data to the table -> and committing the data

Without the commit() command, the data will not be saved into the table.

To add multiple rows to the table, I can use the executemany() method

Now let’s query our table. Note the .fetchall() method — this brings in all the rows from the query. I can iterate through list by running a simple for loop

I can also use the command .column_names after my cursor to return the column names of the last table I queried

Finally, we can use Pandas to put this database table into a dataframe we can work with

Python: Working with Rows, Pandas DataFrames

Here is a quick lesson on working with rows in DataFrames

Let’s start by building a DateFrame

Lets start by deleting (or dropping) a row:


Let’s try dropping a couple of rows:

[-1] allows us to drop the last row:

Side note:

Here is another method from pandas you should know. This is the shape method. See it returns (rows, columns)

We can filter rows out using conditional logic

The index method returns the row indexes

We can use the .index as well as conditional logic to add a new column to our dataframe

Using append(), we can add a dictionary as a new row to the DataFrame

Finally, using .iloc we can iterate through the values of a row

Last Lesson: Pandas Renaming Columns

Next Lesson: Python: Working with Rows in DataFrames

Return to: Python for Data Science Course

Follow this link for more Python content: Python

Python: Closures

This is a more advanced programming topic, and honestly not one I make much use of myself. However, I got a request to make a lesson on Closures and Decorators. I am going to make it into two lessons to try to make it a bit clearer.

Now to understand closures and decorators, the first thing we need to discuss is nested functions. In programming, nested functions means wrapping a function inside of another function. Here is a very simple example of a nested function:

def outerFunction(x):
    innerFunction():
        print(x)
    innerFunction()

Now if I just call this with outerFunction(‘Hello World’), the argument x gets passed into the innerFunction which is in turn called by the outerFunction. The result is shown below:

Now why would anyone do this? Outside of just being able to show you that you can? I don’t know. There are reasons for nested functions that involve hiding variables from the code outside of the functions. That is a bit beyond the scope of this lesson. I will try to tackle it when I create my Object Oriented Programming lessons.

For now, let us get on to Closures. Let’s look at the code below:

It is very close to the first example, except notice 2 little changes. First, instead of just calling the innerFunction at the end of the function, we are returning it. Notice there is no () when we return innerFunction

Also notice we are passing the outerFunction to a variable and when we call the var we add () to the end: var()

What we have done now is created an official Closure, let me show you why that matters below.

First, notice I delete the outerFunction. When I try calling it returns an error saying outerFunction is not defined (meaning it doesn’t exist).

But, look what happens when I called var(). It is still there. The closure commits the value to memory when you create it. That is not something that would happen from a normal function or even nesting function.

SO WHY THE HELL SHOULD I CARE????

So why this is cool, and why you might use this in the future, is imagine a function that does processer heavy multi-step calculations. If you just keep calling the function as usual, each time you call it, it has to run the whole chain of calculations again. However, once you’ve called it as closure, you don’t have to run the calculations again, you just get the value.

So, how about a more concrete example

Here I created a closure that uses two functions which each take 1 argument, the arguments are then multiplied together.

Notice, I created a variable called double where I sent the outer function mult_by() a value of 2.

When I call double() I can pass it a value for (y)

I can create as many of these instances as I want. Look below, created one for Triple, and notice double is still functional

I’ll continue this in the Decorators lesson.

Just the main thing to keep in mind about closures, is that they improve program performance by no requiring the function to be run each time.

If you are still a little confused after reading this, don’t feel bad. It took me a few tries to finally understand closures myself.

Data Jobs: What does a Data Analyst Do?

Data Analysts get a bad wrap. With the advent of the Data Scientist, Data Analysts are often viewed as Data Scientists lite, however I feel that is not the honest case. Truth is, there is a lot of overlap between the two fields. I will dive deeper into what a Data Scientist is in a future article, but just know my opinion is the definition of Data Scientist as a job is still a bit fuzzy and I think the job title may eventually be broken into a few different titles to better define the differences.

Data Analyst

So what does a Data Analyst do?

A lot actually. You could put 10 data analysts into a room and you would get ten different answers to this question. So the best I can do here is make sweeping generalities. As the old saying goes “Your results may vary”

In general, data analysts perform statistical analysis, create reporting, run ad-hoc queries from data warehouses, create data visualizations, create and maintain dashboards, perform data mining, and create machine learning models (yes, ML is not only for data scientists). Their assignments are business driven. A data analysts is either embedded with a business unit (financial planning, fraud, risk management, cyber security, etc.) or working in a centralized reporting/analytics team. They use their skills to provide reporting and analytics for the business.

Tools used by Data Analysts

  • SQL – MySql, SQL Server, Oracle, Teradata, Postgres – whether simply querying a data warehouse or creating and managing a local data mart, data analysts need to be advanced SQL programmers
  • Visualization tools – Tableau, Qlik, Power BI, Excel, analysts use these tools to create visualizations and dashboards
  • Python/R – Data analysts should be familiar with languages like Python or R to help manage data and perform statistical analysis or build machine learning models
  • Spreadsheets – Excel, Google Sheets, Smart Sheets are used to create reports, and pivot tables used to analyze the data
  • ETL tools – SSIS, Alteryx, Talend, Knime, these tools are design to move data to and from databases, CSV files, and spreadsheets. Until the data is in a usable format, analysis cannot be performed.

Educational Requirements

Typically a data analyst position will ask for a bachelors degrees, preferably in computer science, statistics, database management or even business. While the barrier to entry for a data analyst job is generally not as high as a data scientist, that does not mean you cannot make a meaningful and well paid career as a data analyst. Also, the demand for data professionals seems to keep going up and up and it most likely will for the foreseeable future.

Python Project 1: Lists and Dictionaries

For your first project in the course, I am giving you the code below. You will notice I am placing 2 dictionaries in a list:

d = {'Name': 'Ben', 'Age': 35}
e = {'Name': 'Christine', 'Age': 30}
a = []
a.append(d)
a.append(e)
print(a)

Your challenge, should you choose to accept it, will be to run this code on your machine or in the test python browser (click on the blue arrow below) 

Try your Python code in the free console

I then want you to

1) Print out the second dictionary from the list

2) Print out the name from the first dictionary

3) Print out both ages

Note: I have not shown how to work with dictionaries inside a list. So consider this a stretch project. Don’t be afraid to use Google to help find an answer.


Project Answer Notebook Download Available at Course Page below:

Back to Python Course: Course

Solution Video

Python Web Scraping / Automation: Connecting to Firefox with Selenium

Selenium is a Python package that allows you to control web browsers through Python. In this tutorial (and the following tutorials), we will be connecting to Googles Chrome browser, Selenium does work with other browsers as well.

First you will need to download Selenium, you can use the following commands depending on your Python distribution

c:\> Pip install selenium

c:\> Conda install selenium

If you are on a work computer or dealing with a restrictive VPN, the offline install option may help you: Selenium_Install_Offline

Next you need to download the driver that let’s you manage Firefox through Python.

Start by determining what version of Firefox you have on your computer

Click the three horizontal lines in the upper right corner > Help >About Firefox

Search for geckodriver to download the file that matches your Firefox version. (note, this is something you will need to do every time Firefox is updated, so get used to it.)

Open up the zipfile you downloaded, you will find a file called geckodriver.exe

Put it somewhere you can find, put in the following code to let Python know where to find it.

from selenium import webdriver
opts = webdriver.FirefoxOptions()
dr = webdriver.Firefox('C:/Users/larsobe/Desktop/geckodriver.exe',chrome_options=opts)

Now to see if this works, use the following line, (you can try another website if you choose)   

Note the message Firefoxis being controlled by automated test software.

You are now running a web browser via Python.

SQL: Load (Insert) data into a table

Here are the steps to add data to existing table:

Let’s use the employee table created on the create table page: Create_table

To insert some data into this table, first you need to know what the data types of each of the columns in the table are. In this example I am using my MySQL. An easy way to see column data types is to Right Click on Table name> Send to SQL Editor > Create Statement

The results will appear in the query window. In this table we can see the columns are a integer (INT), string (varchar), DOB (date)

The syntax for inserting data into the table is as follows

insert into <table name> (column1, column2,...)
values (value1, value2,...)

In the example below, we are loading 3 rows in, separating each row by a comma:

If we run this, we can then check on the results with a Select statement

SQL: Common Data Types in MySQL

Here is a table of the most commonly used data types in MySQL

Data TypeDescription
Char()Fixed length string, unused spaces get padded and eat up memory: size 0-255
Varchar()Variable length string, unused spaces don’t use memory: size 0 to 65535
MediumText()A string up to 16,777,215 characters long
LongText()A string up to 4,294,967,295 characters long
INTinteger (whole number, no decimals)
Double(x, d)floating point decimal number, x is size, d is number of places after the decimal
Bool or BooleanBinary choice, 0 = False and 1 = True
DateDate data type “YYYY-MM-DD” (if set to US settings)
DATETIMEdatetime data type “YYYY-MM-DD HH:MM:SS” (if set to US settings)
TIMETime “HH:MM:SS”
YEARyear in for digit representation (ex 1908,1965,2011)

R: Installing Packages with Dependencies

Usually installing packages in R is as simple as

install.packages("package name")

However sometimes you will run into errors. This could be due to the fact that the package you are trying to install has what is known as a dependency. What this means is that in order for the package to properly install and run, it requires another package to  already be installed.

You can think of this like trying to install an add-on for Excel like PowerQuery without having Excel installed in the first place. Clearly it would not work.

Now if you are lucky enough to know exactly what package(s) needs to be installed first, then you can simply install it and be on your way. However, most of the time we are not that lucky. And in the case of some packages, you may need to install up to a dozen packages up front to get it to work.

The easier way, just add the following syntax to your command

install.packages("package name", dependencies = TRUE)

Remember in R, Boolean (TRUE and FALSE) must be all capital letters or R will not recognize them as Boolean.

If you are using RStudio, you can install the package using the GUI

At the top, got to Tools and select Install Packages from the drop down.

2018-07-12_20-23-11

Start typing the package you want in the box, it will pop up in the window

2018-07-12_20-23-40

Finally, make sure install dependencies and checked and click install.

2018-07-12_20-23-54

R: Importing Excel Files

To work with Excel files in R, you can use the readxl library

install.packages(“readxl”)

Use the read_excel() function to read and Excel workbook

> library(readxl)
>
> ## read excel file
> df1 <- read_excel("r_excel.xlsx")
> head(df1)
# A tibble: 6 x 4
  Name   Job              Hours Complete
  <chr>  <chr>            <dbl> <chr>  
1 Sally  Predictive Model    1. n      
2 Philip Maintanence        10. n      
3 Tom    Ad-hoc Report      12. y      
4 Bob    SSRS                3. y      
5 Philip Tableau             7. n      
6 Tom    Tableau             9. n

By default, read_excel() reads only the first sheet in the Excel file. To read other sheets using the sheet key word.

> ## read  sheet 2
> df2 <- read_excel("r_excel.xlsx", sheet =2)
> head(df2)

# A tibble: 6 x 2
  Animal Num_Legs
  <chr>     <dbl>
1 Dog          4.
2 Duck         2.
3 Snake        0.
4 Horse        4.
5 Spider       8.
6 Human        2.

The next example, we are reading range B2 – C6 on sheet 1 (same as Excel’s range function)

> ## read sheet 1, range B2 - C6
> df3 <- read_excel("r_excel.xlsx", sheet =1, range = "B2:C6")
> df3
# A tibble: 4 x 2
  `Predictive Model`   `1`
  <chr>              <dbl>
1 Maintanence          10.
2 Ad-hoc Report        12.
3 SSRS                  3.
4 Tableau               7.

In this last example, we are importing only the first 4 rows on sheet 2

> ## read sheet 2, first 4 rows only
> df4 <- read_excel("r_excel.xlsx", sheet = 2, n_max = 4)
> df4
# A tibble: 4 x 2
  Animal Num_Legs
  <chr>     <dbl>
1 Dog          4.
2 Duck         2.
3 Snake        0.
4 Horse        4.