Python: Object Oriented Programming

In the argument between R and Python, the fact that Python is a full-blown Object Oriented Programming (OOP) language gives it a solid advantage in my book. Why? OOP gives you the ability to create and use objects in your programs.

Now I remember my first OOP class was in Java. It was a college course and started with reading 4 dense chapters on what made a language OOP. It talked about Objects and methods and data encapsulation. In the end, I just had a headache, and I really didn’t understand anything until I built my first Object.

So, let’s just jump right in and build our first object.


The first step in building an object is to build a class. Now, I am going to move quickly here, and then work my way back to better explain what we are doing.

So, I am building a two-function calculator here.

  • class myCalc:  – First I create my class and name it
  • def __init__(self):  – this is called a constructor. Python doesn’t require one, but I included it here. Notice I have the word pass on the next line. That means just move on without doing anything. Don’t worry about constructors for now. We will cover them in a later lesson.
  • def myAdd(self, x, y): – this is basically a function inside my class, but inside a class we call functions methods. The first argument (self) is a reference to the Instance (the object) the method is called on. Again, don’t worry about it yet, just know it has to be there or you will get an error.
  • def mySub(self, x, y): – same as above
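The original code for this step was a screenshot, so here is a minimal reconstruction of the class the bullets describe. I'm assuming myAdd and mySub simply return x + y and x - y; the instance name calc is my own.

```python
class myCalc:
    def __init__(self):
        # Constructor: nothing to set up yet, so pass just moves on
        pass

    def myAdd(self, x, y):
        # self is the instance; x and y are the two numbers to add
        return x + y

    def mySub(self, x, y):
        # Same pattern as myAdd, but subtracts y from x
        return x - y


# Create an instance, then call its methods with only x and y
calc = myCalc()
print(calc.myAdd(5, 3))  # 8
print(calc.mySub(5, 3))  # 2
```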


Create our Object

By calling our class and assigning the result to a variable name, what we are doing is creating an Instance.


Now that I have created an Instance, I can call my Methods.

Notice when I call myAdd and mySub, I only provide 2 arguments. Python passes self for us automatically (it is the Instance the method was called on), so we only need to supply x and y.


Okay, so what is the big deal you are asking? I mean, all I did was show you a more complicated way to make a function.

Why Bother

So I am going to attempt to show you a practical use for a class without resorting to totally impractical examples or overly complicated ones.

myExp works by asking you for a number when you create an Instance. Then, when you call the exp1() method, it raises whatever value you give exp1() to the power of the number you started with.


  • ms = myExp(2) – self.x is set to 2 in the __init__ method
  • ms.exp1(3) – 3 ** 2
  • ms.exp1(4) – 4 ** 2
  • ms1 = myExp(3) – self.x is set to 3
  • ms1.exp1(3) – 3 ** 3
  • ms1.exp1(4) – 4 ** 3
  • ms.exp1(3) – 3 ** 2. This is the cool part: self.x for my ms object is still 2, and self.x for ms1 is 3


Here it is in running code:
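The running code itself was an image, so this is a sketch that follows the bullets above. It assumes __init__ stores the number in self.x and that exp1() returns value ** self.x rather than printing it.

```python
class myExp:
    def __init__(self, x):
        # Each Instance remembers its own exponent
        self.x = x

    def exp1(self, value):
        # Raise the value passed in to the power stored on this Instance
        return value ** self.x


ms = myExp(2)       # self.x is 2 for ms
print(ms.exp1(3))   # 3 ** 2 = 9
print(ms.exp1(4))   # 4 ** 2 = 16

ms1 = myExp(3)      # self.x is 3 for ms1
print(ms1.exp1(3))  # 3 ** 3 = 27
print(ms1.exp1(4))  # 4 ** 3 = 64

print(ms.exp1(3))   # still 9: ms kept its own self.x of 2
```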


I can create as many Instances as I want, giving each Instance whatever value of self.x I want. And I can reuse them in any fashion I can imagine.

Here I pass one Instance as an argument to another.
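The code for this step wasn't preserved either. One reading, sketched below, is feeding the value one Instance produces straight into another Instance's method. myExp is redefined here, as described earlier, so the snippet runs on its own.

```python
class myExp:
    def __init__(self, x):
        self.x = x

    def exp1(self, value):
        return value ** self.x


ms = myExp(2)
ms1 = myExp(3)

# ms1 cubes 2, then ms squares that result: (2 ** 3) ** 2
print(ms.exp1(ms1.exp1(2)))  # 64
```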


It is the reusability and the ability to assign different values to objects that make them so useful.

This is just an intro

Don’t worry if you are a little confused. This is just an intro, and I will create more OOP lessons to hopefully help clear everything up.



Python: Create a Box whisker plot

Box whisker plots are used in stats to graphically view the spread of a data set, as well as to compare data sets.

If you would like to follow along with this example, here is the data set: sensors

Using pandas, let’s load the data set

%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt

# r"..." is a raw string, so the backslashes in a Windows path aren't read as escape codes
sensorDF = pd.read_excel(r"C:\Users\Benjamin\Documents\sensors.xlsx")

Our data set represents monthly readings taken from 4 sensors over the span of a year


We need to convert the dataframe to a list of values for our box plot function.

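To do this, first we need to flatten() our dataframe. A dataframe doesn’t have a flatten() method itself, so we call it on the underlying NumPy array (sensorDF.values.flatten()), which places all the values from the dataframe into 1 list.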


Now let’s chop the list into the four sensors represented by the rows in our dataframe.


Finally, we need to make a list of these lists
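The three prep steps above can be sketched like this. Since I can't reproduce the sensors file here, the dataframe below is a stand-in filled with placeholder numbers: 4 rows (one per sensor) by 12 columns (one per month).

```python
import numpy as np
import pandas as pd

# Placeholder stand-in for the sensors file: 4 rows (sensors) x 12 columns (months)
sensorDF = pd.DataFrame(np.arange(48).reshape(4, 12))

# Step 1: flatten every value into one long 1-D array, row by row
flat = sensorDF.values.flatten()

# Step 2: chop the flat array back into 12 readings per sensor
n = sensorDF.shape[1]  # number of months per sensor
sensor1 = list(flat[0:n])
sensor2 = list(flat[n:2 * n])
sensor3 = list(flat[2 * n:3 * n])
sensor4 = list(flat[3 * n:4 * n])

# Step 3: a list of these lists, one inner list per box
data = [sensor1, sensor2, sensor3, sensor4]
print(len(data), len(data[0]))  # 4 12
```

Since the rows already are the sensors, sensorDF.values.tolist() produces the same list of lists in one step; the long way just makes each step visible.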


I know that seemed like a lot, but you will spend more time cleaning and prepping data than any other task. It is just the nature of the job.

Let’s Plot

The code for creating a boxplot is now easy.
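With the data in a list of lists, a minimal box plot is a single plt.boxplot() call. The data here is placeholder values standing in for the four sensors. The Agg backend line is only there so the snippet also runs outside Jupyter; leave it out in a notebook, where %matplotlib inline handles display.

```python
import matplotlib
matplotlib.use("Agg")  # only needed outside Jupyter; omit in a notebook
import matplotlib.pyplot as plt

# Placeholder readings: one inner list of 12 values per sensor
data = [list(range(start, start + 12)) for start in range(0, 48, 12)]

# One box per inner list; boxplot returns a dict of the drawn pieces
result = plt.boxplot(data)
```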


Let’s label our chart a little better now.
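A sketch of adding a title, axis labels, and sensor names to the ticks. The label text is my own, and the data is the same kind of placeholder as before.

```python
import matplotlib
matplotlib.use("Agg")  # only needed outside Jupyter; omit in a notebook
import matplotlib.pyplot as plt

data = [list(range(start, start + 12)) for start in range(0, 48, 12)]

fig, ax = plt.subplots()
ax.boxplot(data)
ax.set_title("Monthly Sensor Readings")
ax.set_xlabel("Sensor")
ax.set_ylabel("Reading")
# boxplot places the boxes at x = 1, 2, 3, 4; give those ticks names
ax.set_xticklabels(["Sensor 1", "Sensor 2", "Sensor 3", "Sensor 4"])
```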







Python: Hypothesis Testing (T Test)

Hypothesis testing is a first step into really understanding how to use statistics.

The purpose of the test is to tell whether the difference between the averages of two data sets is significant or could just be due to chance.

Consider the following example:

Let’s say I am trying to decide between two computers. I want to use the computer to run advanced analytics, so the only thing I am concerned with is speed.

I pick a sorting algorithm and a large data set and run it on both computers 10 times, timing each run in seconds.

Now I put the results into two lists, a and b:

a = [10,12,9,11,11,12,9,11,9,9]
b = [13,11,9,12,12,11,12,12,10,11]

A quick look at the data makes me think b is slower than a. But is it slow enough to mean something, or are these results just a matter of chance? (In other words, if I ran the test 200 more times, would the averages end up closer together or further apart?)

Hypothesis test

To find out, let’s do a hypothesis test.

Set our Hypothesis:

  • H0 (null hypothesis): mean of a = mean of b – there is no significant difference between the data sets
  • H1 (alternative hypothesis): mean of a ≠ mean of b – there is a significant difference

To test our hypothesis, let’s run a t-test

Import stats from scipy and run stats.ttest_ind(a, b).

Our output is the t-statistic and the p-value.
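Here is what that looks like with the two lists from above. stats.ttest_ind() runs an independent two-sample t-test and, by default, assumes both groups have equal variance (pass equal_var=False for Welch's version if you don't want that assumption).

```python
from scipy import stats

a = [10, 12, 9, 11, 11, 12, 9, 11, 9, 9]
b = [13, 11, 9, 12, 12, 11, 12, 12, 10, 11]

# Independent two-sample t-test; returns the t-statistic and the p-value
t_stat, p_value = stats.ttest_ind(a, b)
print(t_stat, p_value)  # t is about -1.85, p is about 0.08
```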

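Our p-value is 0.08, which is greater than the common significance level of 0.05. Since it is greater, we cannot reject H0. That doesn’t prove the two computers are the same speed; it means this test gives us no strong evidence that they differ, so for practical purposes I’d treat them as equally fast.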


Let’s try a third computer – d

d = [13,12,9,12,12,13,12,13,10,11]

Now, let’s run a second t-test, comparing a and d. This one comes back with a p-value of 0.026, which is under 0.05. This means we can reject the null hypothesis that a and d have the same mean run time: the speed difference between a and d is significant.
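The same call with list d, with the comparison against 0.05 spelled out. The alpha name is my own.

```python
from scipy import stats

a = [10, 12, 9, 11, 11, 12, 9, 11, 9, 9]
d = [13, 12, 9, 12, 12, 13, 12, 13, 10, 11]

t_stat, p_value = stats.ttest_ind(a, d)

alpha = 0.05  # the common significance level
if p_value < alpha:
    print("Reject H0: the difference between a and d is significant")
else:
    print("Cannot reject H0: no significant difference detected")
```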


Python: Printing with .format()

Another way to print variable values in Python is the .format() method. It is the newer replacement for the %d, %s, and %r codes. I still showed you those in an earlier lesson because you will still see them in use, and I wanted you to know about them.

When using .format(), {} marks a placeholder in the string.
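A small example of plain {} placeholders; the variable names and values are just for illustration.

```python
language = "Python"
lessons = 5

# Each {} is filled by the .format() arguments, left to right
message = "I have finished {} lessons on {}.".format(lessons, language)
print(message)  # I have finished 5 lessons on Python.
```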


You can also use numbers to tell .format() which order to use the values in.
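Numbered placeholders refer to positions in the .format() argument list, counting from 0, and the same number can be used more than once.

```python
# {1} is the second argument, {0} the first
s1 = "{1} comes before {0}".format("B", "A")
print(s1)  # A comes before B

# A position can be reused
s2 = "{0}, {0}, {1}!".format("Go", "team")
print(s2)  # Go, Go, team!
```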


You can name your placeholders:
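Named placeholders are filled by keyword arguments, so the order you pass them in no longer matters. The names here are only examples.

```python
# Keyword arguments fill the placeholders with matching names
s = "{city} is in {country}".format(country="France", city="Paris")
print(s)  # Paris is in France
```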


You can use user input
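A small sketch of combining input() with .format(). The greet() function and its ask parameter are my own, added so the example can be tested without someone typing at a keyboard; by default ask is the built-in input().

```python
def greet(ask=input):
    # ask defaults to input(); any function that takes a prompt string works
    name = ask("What is your name? ")
    return "Hello, {}!".format(name)


# In a notebook or terminal this waits for you to type a name:
# print(greet())
```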


And you can even use dictionaries
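Two ways to feed a dictionary to .format(): unpack it with ** so its keys fill named placeholders, or pass it as one argument and index into it. The dictionary contents are placeholders.

```python
person = {"name": "Sam", "language": "Python"}

# ** unpacks the dictionary into keyword arguments
s1 = "{name} is learning {language}".format(**person)

# Or pass the dictionary itself and look up keys inside the placeholder
s2 = "{0[name]} is learning {0[language]}".format(person)

print(s1)  # Sam is learning Python
print(s2)  # Sam is learning Python
```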


Python: Logistic Regression

This lesson will focus more on performing a Logistic Regression in Python. If you are unfamiliar with Logistic Regression, check out my earlier lesson: Logistic Regression with Gretl

If you would like to follow along, please download the exercise file here: logi2

Import the Data

You should be good at this by now: use pandas .read_excel().

df.head() gives us the first 5 rows.

What we have here is a list of students applying to a school. Each has a Score that runs from 0 to 1600, ExtraCir (extracurricular activity: 0 = no, 1 = yes), and finally Accepted (0 = no, 1 = yes).


Create Boolean Result

We are going to create a True/False column for our dataframe.

What I did was:

  • df['Accept'] – create a new column named Accept
  • df['Accepted'] == 1 – True if my Accepted column is 1, else False
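A sketch of that step on a tiny placeholder dataframe with the same column names as the exercise file (the values are made up).

```python
import pandas as pd

# Placeholder rows, not the exercise file
df = pd.DataFrame({
    "Score": [1200, 900, 1450],
    "ExtraCir": [1, 0, 1],
    "Accepted": [1, 0, 1],
})

# The comparison gives a True/False value for every row
df["Accept"] = df["Accepted"] == 1
print(df)
```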


What are we modeling?

The goal of our model is to predict an output (whether or not someone gets Accepted) based on some inputs (Score and ExtraCir).

So we feed our model 2 input (independent) variables and 1 result (dependent) variable. The model then gives us coefficients. We place these coefficients (c, c1, c2) in the following formula.

y = c + c1*Score + c2*ExtraCir

Note that the first c in our equation is by itself. If you think back to the basic linear equation (y = mx + b), the first c is b, the y-intercept. The Python package we are going to use to find our coefficients (statsmodels) does not add an intercept for us, so we need a placeholder column for it. Let’s do that real quick.
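A minimal sketch of that placeholder: a column of 1.0s added to the dataframe. The column name intercept and the small dataframe are my own stand-ins. statsmodels also ships sm.add_constant(), which adds the same kind of column (named const) for you.

```python
import pandas as pd

# Placeholder rows standing in for the exercise file
df = pd.DataFrame({
    "Score": [1200, 900, 1450],
    "ExtraCir": [1, 0, 1],
    "Accepted": [1, 0, 1],
})

# A constant column of 1.0s; the model's coefficient for it becomes c
df["intercept"] = 1.0
print(df)
```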



Let’s build our model

Let’s import statsmodels.api

From statsmodels we will use the Logit function. First giving it the dependent variable (result) and then our independent variables.

After we perform the Logit, we will perform a fit()


The summary() function gives us a nice table of our results.


If you are a stats person, you can appreciate this. But for what we need, let us focus on our coef.


Remember our formula from above: y = c + c1*Score + c2*ExtraCir

Let’s build a function that solves for it.

Now let’s see how a student with a Score of 1125 and an ExtraCir of 1 would fare.
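A sketch of that function. The real coefficients come from the coef column of your own summary() output, so they are parameters here rather than numbers I'd be guessing at; the values in the example call are made up.

```python
def solve_y(score, extra_cir, c, c1, c2):
    # y = c + c1*Score + c2*ExtraCir
    return c + c1 * score + c2 * extra_cir


# Example call with made-up coefficients (use your own from summary())
y = solve_y(1125, 1, c=-2.0, c1=0.004, c2=1.0)
print(y)
```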


Okayyyy. So does 3.7089 mean they got in?

Let’s take a quick second to think about the term logistic. What does it bring to mind?


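Okay, but our results equation was linear: y = c + c1*Score + c2*ExtraCir

So what do we do?

We need to remember that y is not a probability itself. In logistic regression, y is the log-odds of the probability of acceptance, so it can be any number, positive or negative.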


To convert y into a probability, we use the logistic function: p = 1 / (1 + e^(-y))


So let’s import numpy so we can make use of e (np.exp() in Python).


Run our results through the equation. We get .97. So we are predicting a 97% chance of acceptance.
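Here is the conversion as a small function using np.exp(). Plugging in the 3.7089 from above gives about 0.976, which the lesson reads as roughly a 97% chance. The function name is my own.

```python
import numpy as np


def to_probability(y):
    # Logistic function: turns log-odds y into a probability between 0 and 1
    return 1 / (1 + np.exp(-y))


p = to_probability(3.7089)
print(round(p, 3))  # 0.976
```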


Now notice what happens if I drop the test score down to 75. We end up with only a 45% chance of acceptance.


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Follow this link for more Python content: Python







Python: Linear Regression

Regression is still one of the most widely used predictive methods. If you are unfamiliar with Linear Regression, check out my Linear Regression using Excel lesson. It explains more of the math behind what we are doing here. This lesson is focused more on how to code it in Python.

We will be working with the following data set: Linear Regression Example File 1

Import the data

Using pandas .read_excel()


What we have is a data set representing years worked at a company and salary.

Let’s plot it

Before we go any further, let’s plot the data.

Looking at the plot, it looks like there is a possible correlation.


Linear Regression using scipy

The scipy library contains some easy-to-use math and science tools. In this case, we are importing stats from scipy.

The method stats.linregress() produces the following outputs: slope, y-intercept, r-value, p-value, and standard error (of the slope).


I set slope to m and y-intercept to b: so we match the linear formula y = mx+b
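A sketch of the regression step. I don't have the example file's numbers here, so years and salary below are made-up toy values chosen so the arithmetic is easy to check; swap in your dataframe's columns.

```python
from scipy import stats

# Made-up toy data, NOT the example file
years = [5, 10, 15, 20, 25]
salary = [40000, 52000, 58000, 71000, 79000]

slope, intercept, r_value, p_value, std_err = stats.linregress(years, salary)

# Rename to match y = mx + b
m = slope
b = intercept
print(m, b)  # about 1940 and 30900 for this toy data
```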

Using the results of our regression, we can create an easy function to predict a salary. In the example below, I want to predict the salary of a person who has been working there 10 years.
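And the prediction function, built from the fitted slope and intercept. The data is the same made-up toy set, so the prediction is only meaningful for it.

```python
from scipy import stats

# Made-up toy data, NOT the example file
years = [5, 10, 15, 20, 25]
salary = [40000, 52000, 58000, 71000, 79000]

res = stats.linregress(years, salary)
m, b = res.slope, res.intercept


def pred(x):
    # Plug x into y = mx + b using the fitted slope and intercept
    return m * x + b


print(pred(10))  # about 50300 for the toy data
```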


Our p-value is nice and low. This means the slope is very unlikely to be zero by chance alone, so years worked does appear to be related to salary.


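Our standard error is 250, but whether that is big or small depends on the scale of the values in your regression (linregress reports the standard error of the slope). An easier measure to read is r squared, which tells you how well the line fits. We find that by squaring our r output.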


R squared runs from 0 (bad) to 1 (good). Our R squared is .44, so our regression is not that great. I prefer an r squared of at least .6.
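Squaring the r-value gives r squared. With the toy data below the fit is far tighter than the lesson's .44, simply because I made the numbers up to sit close to a line.

```python
from scipy import stats

# Made-up toy data, NOT the example file
years = [5, 10, 15, 20, 25]
salary = [40000, 52000, 58000, 71000, 79000]

res = stats.linregress(years, salary)

# r squared is just the r-value multiplied by itself
r_squared = res.rvalue ** 2
print(round(r_squared, 2))  # 0.99
```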

Plot the Regression Line

We can use our pred() function to find the y-coords needed to plot our regression line.

Passing pred() an x value of 0 gives us the bottom of the line. Passing pred() an x value of 35 gives us the top.

I then redo my scatter plot just like above. Then I plot my line using plt.plot().
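A sketch of the scatter plot plus regression line, using pred() at x = 0 and x = 35 for the two ends as described above. The data is the same made-up toy set, and the axis label text is my own. Leave out the Agg line in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # only needed outside Jupyter; omit in a notebook
import matplotlib.pyplot as plt
from scipy import stats

# Made-up toy data, NOT the example file
years = [5, 10, 15, 20, 25]
salary = [40000, 52000, 58000, 71000, 79000]

res = stats.linregress(years, salary)


def pred(x):
    return res.slope * x + res.intercept


x_line = [0, 35]                   # bottom and top of the line
y_line = [pred(x) for x in x_line]

plt.scatter(years, salary)         # the data points
plt.plot(x_line, y_line)           # the regression line
plt.xlabel("Years worked")
plt.ylabel("Salary")
```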





Python: Accessing a SQL database

If you really want to do data work, you need to be able to connect to a database. In this example I will show you how to connect to and query data from MS SQL Server with the AdventureWorks2012 database installed.

This lesson assumes some very basic knowledge of SQL. If SQL is a complete mystery, head over to my SQL page: SQL. If you check out the first 4 intro lessons, you will know everything about SQL you need for this lesson.

Install pyodbc

To connect to the database, we need to install pyodbc. Go to your Anaconda terminal and type: pip install pyodbc


Now open Jupyter and start a new notebook.

Connect to Database

import pyodbc

cnxn is our variable name; it is a common shorthand for connection.

syntax: pyodbc.connect('DRIVER={SQL Server};SERVER=server name;DATABASE=database name;UID=user name;PWD=password')

Finally, cursor = cnxn.cursor() creates a cursor for us. In SQL, a cursor is used to step through your results one row at a time.


cursor.execute("place SQL query here") – this is how you pass a SQL query; note the query goes in quotes, as a string

tables = cursor.fetchall() – fetch all the rows in your query results
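pyodbc follows Python's DB-API 2.0, so the cursor(), execute(), and fetchall() pattern is the same for any compliant driver. To keep this runnable without a SQL Server instance, the sketch uses the standard library's sqlite3 as a stand-in; with SQL Server you would pass in the connection from pyodbc.connect() instead. The Department table and its rows are placeholders, and run_query() is a helper name of my own.

```python
import sqlite3


def run_query(cnxn, sql):
    # Works with any DB-API 2.0 connection (pyodbc, sqlite3, ...)
    cursor = cnxn.cursor()
    cursor.execute(sql)        # the query goes in as a string
    return cursor.fetchall()   # every row of the result


# Stand-in database so the sketch runs without SQL Server
cnxn = sqlite3.connect(":memory:")
cnxn.execute("CREATE TABLE Department (Name TEXT, GroupName TEXT)")
cnxn.executemany(
    "INSERT INTO Department VALUES (?, ?)",
    [("Engineering", "Research and Development"), ("Sales", "Sales and Marketing")],
)

tables = run_query(cnxn, "SELECT Name, GroupName FROM Department ORDER BY Name")

# Iterate through the rows
for row in tables:
    print(row)
```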


We can now iterate through the rows in tables.


I don’t like the layout of this. Also we can’t really work with the data.

Pandas Dataframe

First, import pandas.

  • d = [] – create an empty list
  • d.append({'Name': row.Name, 'Class': row.GroupName}) – inside the loop over tables, add one dictionary per row
  • df = pd.DataFrame(d) – convert the list of dictionaries to a dataframe
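A sketch of the same list-of-dictionaries approach, written so it works with any DB-API cursor. Instead of hard-coding row.Name and row.GroupName, it reads the column names from cursor.description; the GroupName AS Class alias gives the same Class column as the bullets. sqlite3 again stands in for SQL Server, the table is a placeholder, and query_to_dataframe() is a helper name of my own.

```python
import sqlite3

import pandas as pd


def query_to_dataframe(cnxn, sql):
    cursor = cnxn.cursor()
    cursor.execute(sql)
    # Column names come from cursor.description (standard DB-API)
    columns = [col[0] for col in cursor.description]
    d = []  # one dictionary per row
    for row in cursor.fetchall():
        d.append(dict(zip(columns, row)))
    return pd.DataFrame(d)


# Stand-in database so the sketch runs without SQL Server
cnxn = sqlite3.connect(":memory:")
cnxn.execute("CREATE TABLE Department (Name TEXT, GroupName TEXT)")
cnxn.executemany(
    "INSERT INTO Department VALUES (?, ?)",
    [("Engineering", "Research and Development"), ("Sales", "Sales and Marketing")],
)

df = query_to_dataframe(
    cnxn, "SELECT Name, GroupName AS Class FROM Department ORDER BY Name"
)
print(df)
```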



Python: Read CSV and Excel with Pandas

Earlier I showed you how to use the Python csv library to read and write CSV files. While the csv library does work, and I still use elements of it occasionally, you will find working with pandas to be so much easier.

Practice Files Excel: Linear Regression Example File 1

CSV: heightWeight_w_headers

Let’s start with our CSV file. Unzip it and place it in a folder where you will be able to find it.

If you want to make your life a little easier, when you open your Jupyter Notebook, type pwd

This will give you your present working directory. If you place your csv file in this directory, all you have to do is call it by name.


I am not going to do that however. I want to show you how to work with directories.

Get Full Directory Path

For Windows Users, if you hold down the Shift key while right clicking on your file, you will see an option that says: Copy as Path


Read the CSV file

First import pandas as pd

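Then assign a variable = pd.read_csv(file name) – paste the full path of your CSV file here. On Windows, put an r in front of the quoted path (r"C:\...") so Python doesn’t treat the backslashes as escape characters.

variable.head() returns the first 5 rows from your dataframe.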
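A self-contained sketch: since your path will differ, it writes a tiny placeholder CSV to a temporary folder and reads that back. I'm assuming the file has Height and Weight columns; the commented line shows the raw-string form for a pasted Windows path (the path itself is hypothetical).

```python
import os
import tempfile

import pandas as pd

# With a real file on Windows, a raw string keeps the pasted path intact:
# HgtWgt_df = pd.read_csv(r"C:\path\to\heightWeight_w_headers.csv")

# Stand-in: write a tiny placeholder CSV, then read it back
folder = tempfile.mkdtemp()
path = os.path.join(folder, "sample.csv")
with open(path, "w") as f:
    f.write("Height,Weight\n65,112\n71,136\n69,153\n")

HgtWgt_df = pd.read_csv(path)
print(HgtWgt_df.head())  # first 5 rows (only 3 exist here)
```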


Write CSV file

Okay, let’s write a CSV file. First, let’s add some rows to our current dataframe.

We will do this by first creating a new dataframe with 3 rows of data.


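Now we will append this new dataframe to the end of our existing dataframe. In older versions of pandas this was HgtWgt_df.append(df2, ignore_index=True); DataFrame.append() was deprecated and then removed in pandas 2.0, so on current versions use pd.concat([HgtWgt_df, df2], ignore_index=True).

The ignore_index=True argument renumbers the combined rows from 0, so the new rows continue the index of the main dataframe. Otherwise, the new rows keep their own index and start counting at 0 again. If you want to try it, just leave out the argument: pd.concat([HgtWgt_df, df2])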
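A sketch of the append step with pd.concat. HgtWgt_df here is a small placeholder standing in for the dataframe read from the CSV, and the three new rows are made-up values.

```python
import pandas as pd

# Placeholder for the dataframe read from the CSV file
HgtWgt_df = pd.DataFrame({"Height": [65, 71, 69], "Weight": [112, 136, 153]})

# A new dataframe with 3 extra rows (made-up values)
df2 = pd.DataFrame({"Height": [70, 66, 72], "Weight": [150, 128, 160]})

# Without ignore_index the original indexes are kept: 0, 1, 2, 0, 1, 2
no_reset = pd.concat([HgtWgt_df, df2])

# With ignore_index=True the combined rows are renumbered 0..5
HgtWgt_df = pd.concat([HgtWgt_df, df2], ignore_index=True)
print(HgtWgt_df.index.tolist())  # [0, 1, 2, 3, 4, 5]
```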


Now, we just use the df.to_csv() command to write our appended dataframe to a new CSV file
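Writing the result out, sketched with a temporary folder and a hypothetical file name. index=False is my choice: without it, pandas writes the index as an extra first column.

```python
import os
import tempfile

import pandas as pd

# Placeholder for the appended dataframe
HgtWgt_df = pd.DataFrame({
    "Height": [65, 71, 69, 70, 66, 72],
    "Weight": [112, 136, 153, 150, 128, 160],
})

out_path = os.path.join(tempfile.mkdtemp(), "HgtWgt_appended.csv")
HgtWgt_df.to_csv(out_path, index=False)

# Read it back to confirm what was written
check = pd.read_csv(out_path)
print(check)
```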


And there is our new file


Read Excel File

I’d love to be able to wow you with how complicated reading an Excel file is, but the difference between reading an Excel file and a CSV is one word: excel. You use pd.read_excel() instead of pd.read_csv().


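The only caveat is if your Excel file has multiple sheets. By default pd.read_excel() reads the first sheet. The sheet_name argument picks a different one, either by name or by position, and positions count from 0. So to read the fourth sheet, you would write: pd.read_excel(filename, sheet_name=3)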

Write to Excel File
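The original code here was an image. A minimal sketch of writing with to_excel() and reading back with sheet_name; pandas needs an Excel engine such as openpyxl installed for .xlsx files. The file name, sheet names, and data are placeholders.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Height": [65, 71, 69], "Weight": [112, 136, 153]})
path = os.path.join(tempfile.mkdtemp(), "HgtWgt.xlsx")

# For a single sheet, df.to_excel(path, index=False) is enough.
# ExcelWriter lets us put two sheets in one file.
with pd.ExcelWriter(path) as writer:
    df.to_excel(writer, sheet_name="Original", index=False)
    df.head(2).to_excel(writer, sheet_name="FirstTwo", index=False)

first = pd.read_excel(path)                          # default: first sheet
second = pd.read_excel(path, sheet_name=1)           # positions count from 0
by_name = pd.read_excel(path, sheet_name="FirstTwo") # or pick by name
```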


Bonus note

If you haven’t already, please check out my earlier CSV lesson: Python: Working with CSV Files

This will really give you an appreciation for how powerful and time saving pandas really is.

