Python: Lists and Dictionaries

Lists and dictionaries are kind of like super variables. They allow you to store multiple pieces of data under a single name. In the example below, I first assign the variable 'a' a value of 6. Next, when I assign it a value of 7, the 7 replaces the 6.

What if I want to keep both 6 and 7?

pylist

Sure, I could use two variables (e.g., a and b), or I could use a list.

List

A list is a collection of data in Python. You create a list by assigning a variable a group of data enclosed in square brackets [].

Notice that lists are indexed, meaning you call them up using the following syntax: variable[index]

Note that indexes in Python start at 0. So in our set of 4 elements, the indexes would be 0, 1, 2, and 3. Notice that when I try to use index 4, I get an out of range error.

pyfund

Lists can also hold strings, or combinations of values in one list.

pyfund1
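Since the screenshots are not reproduced here, the following is a minimal sketch of the same ideas in Python 3. The values and the variable names (numbers, mixed) are hypothetical, chosen only for illustration.

```python
# Hypothetical values chosen for illustration.
numbers = [10, 20, 30, 40]

print(numbers[0])   # first element (index 0)
print(numbers[3])   # last element (index 3 of a 4-element list)

# Index 4 does not exist in a 4-element list, so Python raises IndexError.
try:
    numbers[4]
except IndexError as err:
    error_message = str(err)
    print("IndexError:", error_message)

# A list can hold strings, or a mix of strings and numbers.
mixed = ["spam", 7, "eggs", 3.5]
print(mixed[0], mixed[1])
```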

Dictionaries

Think of dictionaries as lists with named indexes. This makes looking up values easier (especially when your code gets more complicated).

Each data element is paired with a named index. A dictionary is denoted through the use of curly braces {} and : separating the named index and the value.

pythonfundaments9
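A small sketch of a dictionary, assuming a made-up employee record (the keys and values are hypothetical, not taken from the original screenshot):

```python
# Hypothetical record; each key acts as a named index for its value.
employee = {"name": "Ann", "age": 34, "dept": "Sales"}

print(employee["age"])        # look a value up by its named index

employee["age"] = 35          # assigning to an existing key replaces the value
employee["city"] = "Boston"   # assigning to a new key adds a new pair
print(employee)
```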


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Follow this link for more Python content: Python

Python: Central Limit Theorem

The Central Limit Theorem is one of the core principles of probability and statistics, so much so that a good portion of inferential statistical testing is built around it. What the Central Limit Theorem states is this: given a data set, let's say of 100 elements (see below), if I take a random sample of 10 data points from it, take the average (arithmetic mean) of that sample, and plot the result on a histogram, then given enough samples my histogram will approach what is known as a normal bell curve.

In plain English

  • Take a random sample from your data
  • Take the average of your sample
  • Plot that average on a histogram
  • Repeat 1000 times
  • You will have what looks like a normal distribution bell curve when you are done.

hist4
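The steps listed above can be sketched with nothing but the standard library. This is only an illustration under assumed numbers: the population (100 random integers from 1 to 10), the seed, and the crude text histogram are my own choices, not the code behind the screenshot.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: 100 integers from 1 to 10 (not bell-shaped at all).
population = [random.randint(1, 10) for _ in range(100)]

sample_means = []
for _ in range(1000):
    sample = random.sample(population, 10)          # random sample of 10 points
    sample_means.append(statistics.mean(sample))    # average of that sample

# Crude text histogram of the sample means, one row per bin of width 0.5.
bins = {}
for m in sample_means:
    key = int(m * 2) / 2          # round down to the nearest 0.5
    bins[key] = bins.get(key, 0) + 1

for key in sorted(bins):
    print(f"{key:4.1f} {'#' * (bins[key] // 10)}")
```

The rows should bulge in the middle and thin out toward both ends, which is the bell shape the theorem describes.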

For those who don't know what a normal distribution bell curve looks like, here is an example. I created it using numpy's normal method.

hist5.jpg

If you don’t believe me, or want to see a more graphical demonstration – here is a link to a simulation that helps a lot of people to grasp this concept: link

Okay, I have a bell curve, who cares?

The normal distribution (or Gaussian distribution, named after the mathematician Carl Friedrich Gauss) is an amazing statistical tool. This is the powerhouse behind inferential statistics.

The Central Limit Theorem tells me that (under certain circumstances), no matter what my population distribution looks like, if I take the means of enough samples, the distribution of those sample means will approach a normal bell curve.

Once I have a normal bell curve, I now know something very powerful.

Known as the 68-95-99.7 rule, it tells me that about 68% of my values will fall within one standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3.

hist.jpg
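One way to convince yourself of these percentages is to draw a large number of values from a normal distribution and count. The sketch below assumes a mean of 0 and a standard deviation of 1; the helper name share_within is hypothetical.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
mean, sd = 0.0, 1.0
values = [random.gauss(mean, sd) for _ in range(100_000)]

def share_within(k):
    """Fraction of values within k standard deviations of the mean."""
    return sum(abs(v - mean) <= k * sd for v in values) / len(values)

for k in (1, 2, 3):
    print(k, round(share_within(k), 3))
```

The three printed fractions should land close to 0.68, 0.95, and 0.997.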

So let's apply this to something tangible. Let's say I took a random sampling of heights for adult men in the United States. I may get something like this (warning: this data is completely made up, so do not cite this graph as anything but bad artwork).

hist6.jpg

Reading this graph, I can see that 68% of men are between 65 and 70 inches tall, while less than 0.15% of men are shorter than 55 inches or taller than 80 inches.

Now, there are plenty of resources online if you want to dig deeper into the math. However, if you just want to take my word for it and move forward, this is what you need to take away from this lesson:

p value

As we move into statistical testing like Linear Regression, you will see that we focus on a p value. Generally, we want to keep that p value under 0.05. The purple box below shows a p value of 0.05, with 0.025 on either side of the curve. A p value that low means that, if nothing but random chance were at work, results at least this extreme would turn up less than 5% of the time. By convention, such a result is called statistically significant.

hist7.jpg
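For reference, the tail area of a normal curve can be computed directly. The sketch below uses only the standard library, and the helper name two_tailed_area is hypothetical. It suggests that the region more than about 1.96 standard deviations from the mean, counting both tails, covers roughly 5% of the curve (0.05 in total, 0.025 per tail).

```python
from math import erf, sqrt

def two_tailed_area(z):
    """Area under the standard normal curve lying more than z
    standard deviations from the mean, counting both tails."""
    return 1 - erf(abs(z) / sqrt(2))

print(round(two_tailed_area(1.96), 3))   # both tails beyond 1.96 SD
print(round(two_tailed_area(1.0), 3))    # both tails beyond 1 SD
```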

Python: Histograms and Frequency Distribution

In the spirit of total transparency, this lesson is a stepping stone towards explaining the Central Limit Theorem. While I promise not to bog this website down with too much math, a basic understanding of this very important principle of probability is an absolute need.

Frequency Distribution

To understand the Central Limit Theorem, first you need to be familiar with the concept of Frequency Distribution.

Let's look at the Python code below. Here I am importing the module random from numpy. I then use the function random_integers from random (newer NumPy releases deprecate it in favor of randint). Here is the syntax:

random.random_integers(max value, size=number of elements)

So random.random_integers(10, size=10) would produce an array of 10 numbers between 1 and 10.

Below I selected 20 numbers between 1 and 5

hist1.jpg
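Because random_integers is deprecated, here is a sketch of the same draw using randint instead. Note that this is a substitute for the call in the screenshot, not the original code, and the seed value is an arbitrary assumption.

```python
import numpy as np

np.random.seed(1)  # arbitrary seed so the run is repeatable

# randint's upper bound is exclusive, so 6 yields values 1 through 5.
data = np.random.randint(1, 6, size=20)
print(data)
```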

Now, since I am talking about a Frequency Distribution, I’d bet you could infer that I am concerned with Frequency. And you would be right. Looking at the data above, this is what I have found.

I created a table of the integers 1 to 5 and then counted the number of times (frequency) each number appears in my list above.

hist2.jpg
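The same kind of frequency table can be built in code. This sketch assumes a made-up list of 20 integers (not the values from the screenshot) and uses Counter from the standard library.

```python
from collections import Counter

# Hypothetical list of 20 integers between 1 and 5.
data = [3, 1, 5, 2, 3, 4, 3, 1, 2, 5, 4, 3, 2, 1, 3, 5, 4, 2, 3, 1]

frequency = Counter(data)   # maps each value to how often it appears

for value in range(1, 6):
    print(value, frequency[value])
```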

Histogram

Using my frequency table above, I can easily make a bar graph commonly known as a histogram. However, since this is a Python lesson as well as a probability lesson, let's use matplotlib to build this.

The syntax should be pretty self explanatory if you have viewed my earlier Python graphing lessons.

hist3
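A minimal sketch of a histogram in matplotlib, assuming a made-up list of 20 integers rather than the data in the screenshot. The Agg backend line is only there so the code also runs outside a notebook; inside Jupyter you can drop it.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt

# Hypothetical list of 20 integers between 1 and 5.
data = [3, 1, 5, 2, 3, 4, 3, 1, 2, 5, 4, 3, 2, 1, 3, 5, 4, 2, 3, 1]

# One bin per integer: edges at 0.5, 1.5, ..., 5.5
counts, edges, _ = plt.hist(data, bins=[0.5, 1.5, 2.5, 3.5, 4.5, 5.5])
plt.xlabel("Value")
plt.ylabel("Frequency")

print(counts)   # the bar heights, i.e. the frequency table
```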

Now let's do it with even more data points (100 elements from 1 to 10, to be exact).

hist4.jpg



Python Fundamentals

*Note: This lesson was written using Python 2.x. If you are using Python 3.x, any changes to the code are noted under headings marked Python 3.xxx.

Open up a new Jupyter Notebook. Remember, when working with Jupyter Notebooks, you need to hit Shift+Enter to execute your code.

pythonInstall4

Arithmetic Operators

Python arithmetic operators are pretty standard. Take note of the division problem in line [9]. 10/7 is not 1, but in Python 2, dividing one integer by another returns only the integer part of the answer.

pythonFundamentals1

You can work around it by converting the first number to a float with float().

pythonFundamentals2

Python 3.xxx

Python 3 handles division differently: / always returns the full decimal answer. To keep only the integer part of a division, use //.

10//7
1
10/7
1.4285714285714286

Arithmetic Operators

  • + : Addition
  • - : Subtraction
  • * : Multiplication
  • / : Division
  • % : Modulus (returns the remainder of a division problem: 5%2=1)
  • ** : Exponent (4**2 = 16)
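The operators above can be tried out in a single Python 3 cell. The variable names here are just labels for the sketch.

```python
true_div = 10 / 7          # true division in Python 3
floor_div = 10 // 7        # floor division keeps only the integer part
float_div = float(10) / 7  # the Python 2 workaround also works in Python 3
remainder = 5 % 2          # modulus: remainder of 5 divided by 2
power = 4 ** 2             # exponent: 4 squared

print(true_div, floor_div, float_div, remainder, power)
```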

Variables

Variables in Python are pretty straightforward. Unlike many other programming languages, you do not need to declare variables first. Python assigns the data type dynamically.

Three main rules:

  • Variables must start with a letter or _
  • Variables are case sensitive
  • Avoid using command keywords (print, def, for)

pythonFundamentals3

Notice that lowercase 'a' returns an error.

Remember, a Jupyter notebook cell only displays the result of its last line. If you want to see both variables, use the print command.

pythonfundaments4

Python 3.xxx

In Python 3, print statements require ().

print (A)
print (B)

You can perform arithmetic functions on variables

pythonfundaments5

Python 3.xxx

print (C)

And of course, variables can hold strings as well as numbers

pythonfundaments7

Python 3.xxx

print (E)
print (F)

 

8 Fun Facts About Python

  1. Python was named after the comedy troupe Monty Python. That is why you will often see spam and eggs used as variables in examples (a little tip of the hat to Monty Python's Flying Circus).
  2. Python was first released in 1991 by Guido van Rossum.
  3. There are Java and C implementations of Python called Jython (formerly JPython) and CPython. CPython is the reference implementation.
  4. Python is an interpreted language, meaning you don't need to compile it. This is great for making programs on the fly, but does make the code rather slow compared to compiled languages.
  5. Python is part of the open source community, meaning plenty of independent programmers are out there building libraries and adding functionality to Python.
  6. It is one of the official languages at Google.
  7. Ever work with Java or C? If so, one of the first things you will notice is that Python has done away with braces. And it appears they even have a sense of humor about it. Look what happens when you try importing braces: pyfun
  8. Finally, try running import this: pyfun1.jpg


Python: Fun with Central Tendency

Now numpy provides easy functions for finding central tendency – you can find them in my Numpy Part II lesson: Python: Numpy Part II.

But we have learned enough about Python so far that maybe it would be more fun to build our own functions. In this lesson we are going to build our own statistics library with mean, median, mode, and quantile.

Our Data

pythonCent

Mean

pythonCent1.jpg

or even easier:

pythonCent3
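The original functions appear only as screenshots, so here is a sketch of both approaches to the mean: a loop that adds the values up, and the shorter sum()/len() version. The data and the function names are hypothetical.

```python
data = [4, 8, 15, 16, 23, 42]  # hypothetical data

def mean_loop(values):
    """Add the values one at a time, then divide by how many there are."""
    total = 0
    for v in values:
        total += v
    return total / len(values)

def mean(values):
    """Same result using the built-in sum() and len()."""
    return sum(values) / len(values)

print(mean_loop(data), mean(data))
```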

Median

Remember, with the median, if your data contains an odd number of elements (n%2==1), you just take the middle value. However, if your data contains an even number of elements (n%2==0), then you add the two middle numbers and divide by 2.

We handle that through the use of an if statement in our function.

pythonCent2
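A sketch of a median function along the lines described, using an if statement on n % 2. This is one possible version and not necessarily the code in the screenshot.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle
    values when the number of elements is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd: take the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: average the middle two

print(median([5, 1, 3]))      # odd number of elements
print(median([4, 1, 3, 2]))   # even number of elements
```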

Mode

For mode, we want to find the most common element in the list. For this task, I will import Counter from collections.

d.most_common() returns each unique element and the number of times it appears, ordered from most to least common.

d.most_common(1) returns only the most common element and its count.

pythonCent4.jpg

or in function form:

pythonCent5.jpg

In my numpy part II lesson I use a more elegant solution, but I want you to see that there is always more than one way to do something in Python.
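A sketch of the Counter approach in function form. The data is hypothetical, and the name d mirrors the variable used in the text.

```python
from collections import Counter

def mode(values):
    """Return the most common element in values."""
    d = Counter(values)
    element, count = d.most_common(1)[0]   # first (element, count) pair
    return element

data = [1, 2, 2, 3, 3, 3, 4]   # hypothetical data
print(Counter(data).most_common())
print(mode(data))
```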

Quantile

Quantiles are cut points in a set of data. They can mark off the bottom 10% of the data, the top 75%, or any percentage from 0 to 100.

In my solution, I slice the result of the sorted() function. The example below shows all elements up to (but not including) index 4.

pythonCent6

quantile function:

pythonCent7
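The screenshot of the quantile function is not available, so the following is only a guess at a simple version: sort the data, turn the percentage into an index, and return the value at that cut point. The function signature and the clamping of the index are my assumptions.

```python
def quantile(values, p):
    """Return the value at the cut point p (a fraction from 0 to 1)
    of the sorted data. Hypothetical sketch, not the original function."""
    ordered = sorted(values)
    index = min(int(p * len(ordered)), len(ordered) - 1)  # stay inside the list
    return ordered[index]

data = [7, 1, 9, 3, 5, 8, 2, 6, 4, 10]   # hypothetical data

print(sorted(data)[:4])      # slicing: elements up to (not including) index 4
print(quantile(data, 0.5))
```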



Python: Pandas, Working with DataFrames

Sure DataFrames look nice, but how can I work with them?

Let’s cover some basic tasks in pandas to get you started.

Let’s start by building a DataFrame

pyDF1

I don’t like where they placed Age on my dataframe. I want to move it.

To do so, we are going to cover a couple of new terms: axis, drop() and insert()

Axis

When using numpy and pandas, you will come across many functions that require you to enter an axis as a parameter. Axis 0 refers to your rows, while axis 1 refers to your columns. This follows the way matrices are named: a 3×2 matrix has 3 rows and 2 columns, and a 2×3 matrix has 2 rows and 3 columns.

pyDF2
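A quick way to get a feel for the axis parameter is to sum a small matrix both ways. The matrix here is made up for the sketch.

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4],
              [5, 6]])          # 3 rows, 2 columns

print(m.shape)                  # (rows, columns)

col_totals = m.sum(axis=0)      # work down the rows: one total per column
row_totals = m.sum(axis=1)      # work across the columns: one total per row
print(col_totals, row_totals)
```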

drop()

To move the age column, I am first going to create a copy of my dataframe minus the age column. To do this, I am going to use the drop() function. The drop() function accepts two arguments drop(name, axis). In our case name = ‘Age’ and axis = 1 since we are referring to a column.

pyDF3

insert()

Now we want to insert the age column. The syntax for the insert() function is insert(insert point, name, data), where the insert point is the column position, counting from 0.

pyDF4
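Here is a sketch of the move using drop() and insert(). The original DataFrame is shown only as an image, so the names and numbers are hypothetical; only the Age and Years Service column names come from the text.

```python
import pandas as pd

# Hypothetical data; Age starts out as the first column.
df = pd.DataFrame({
    "Age": [34, 45, 29],
    "Name": ["Ann", "Bob", "Cal"],
    "Years Service": [5, 20, 3],
})

age = df["Age"]                 # keep a copy of the column's data
df2 = df.drop("Age", axis=1)    # new DataFrame without Age (axis=1 means column)
df2.insert(2, "Age", age)       # put Age back at column position 2

print(df2.columns.tolist())
```

Note that drop() returns a new DataFrame and leaves df untouched, while insert() changes df2 in place.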

add a new column

Adding a new column is straightforward. Just use DataFrame[new column name] = value.

Below I created a new column called 'Age When Start' that shows the age of employees when they started. I derived this value by subtracting the Years Service column from the Age column.

pyDF5.jpg

boolean column

You can create a boolean column using a boolean operator.

pyDF6
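A sketch of both kinds of new column, again with hypothetical data. The 'Over 30' column name and the threshold are my own, chosen only to show a boolean operator at work.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cal"],
    "Age": [34, 45, 29],
    "Years Service": [5, 20, 3],
})

# New column derived from two existing columns.
df["Age When Start"] = df["Age"] - df["Years Service"]

# Boolean column: True where the comparison holds, False otherwise.
df["Over 30"] = df["Age"] > 30

print(df)
```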

sort_values()

You can sort a dataframe by any column using sort_values().

pyDF7.jpg

Sorting is ascending by default. To reverse it, set ascending = False.

** remember in Python, True and False need to start with a capital letter.

pyDF8.jpg
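A sketch of sorting in both directions, using hypothetical data:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cal"],
    "Age": [34, 45, 29],
    "Years Service": [5, 20, 3],
})

by_age = df.sort_values("Age")                         # ascending by default
by_age_desc = df.sort_values("Age", ascending=False)   # largest first

print(by_age)
print(by_age_desc)
```

sort_values() returns a sorted copy, so df itself keeps its original order.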

slicing

by rows

Slicing by rows is just like slicing a list.

pyDF9

by columns

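Slicing by columns is a bit more complex. To slice by column name you have to use the dataframe.ix command. (Note that .ix was deprecated and then removed in pandas 1.0; .loc now handles label-based slicing and .iloc handles position-based slicing.)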

pyDF10
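Since .ix is no longer available in current pandas, this sketch uses .loc and .iloc as stand-ins. It is a substitute for the code in the screenshot, and the data is hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cal"],
    "Age": [34, 45, 29],
    "Years Service": [5, 20, 3],
})

first_two_rows = df[0:2]                 # rows slice like a list

by_label = df.loc[:, "Name":"Age"]       # label slices include both ends
by_position = df.iloc[:, 0:2]            # position slices exclude the end

print(first_two_rows)
print(by_label)
```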



Python: Numpy Part II

Beyond numpy’s usefulness in creating arrays and matrices, numpy also provides a great suite of math functions that – for anyone with any programming background – are fairly intuitive.

Here are some examples:

np.pi returns pi and np.sqrt() takes the square root of whatever value you feed it.

pyNp2.jpg

Trig Functions

numpy handles most common trigonometry functions.

pyNp2_1

Stats

Numpy handles many statistics functions.

Below we have mean and median. Unfortunately, just like in R, there is no mode command, but we can fake it using set().

pyNp2_2

using set to fake mode

pyNp2_4
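One common way to fake a mode with set() is sketched here. This may differ from the screenshot, and the data is hypothetical.

```python
data = [1, 2, 2, 3, 3, 3, 4]   # hypothetical data

# set(data) gives the unique values; max() picks the one whose
# count in the original list is highest.
mode = max(set(data), key=data.count)
print(mode)
```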

Numpy can also be used to find range, variance, and standard deviation.

pyNp2_5
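A sketch of range, variance, and standard deviation on a small made-up array. Note that np.var() and np.std() compute the population versions by default.

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])   # hypothetical data

data_range = data.max() - data.min()   # range = largest minus smallest
variance = np.var(data)                # population variance by default
std_dev = np.std(data)                 # square root of the variance

print(data_range, variance, std_dev)
```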

Rounding

Numpy has rounding features for dealing with decimals. It also has floor() and ceil() functions that bring the number down to the "floor" or up to the "ceiling".

pyNp2_6
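A short sketch of the three functions on an arbitrary value:

```python
import numpy as np

x = 3.14159   # arbitrary example value

rounded = np.round(x, 2)   # keep 2 decimal places
floored = np.floor(x)      # down to the nearest whole number
ceiled = np.ceil(x)        # up to the nearest whole number

print(rounded, floored, ceiled)
```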

Use in creating graphs

use np.sin()

pyNp2_7.jpg

use np.log()

pyNp2_8

you can even put the two together

pyNp2_9

linspace()

My final one today is a function called linspace(). You give it a start point, a finish point, and how many elements you want. It then creates an evenly spaced array of numbers between start and finish.

linspace(start, finish, num=how many numbers you want)

pyNp2_10
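A sketch with arbitrary start, finish, and num values. Note that both the start and the finish are included in the result.

```python
import numpy as np

# 5 evenly spaced numbers from 0 to 10, endpoints included.
points = np.linspace(0, 10, num=5)
print(points)
```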



Python: Create, Import, and Use a Module

Today we are going to cover what I consider a flaw in the IPython Notebook environment. Code you write cannot be imported into another program the way it can when you use a standard Python interpreter. At least it cannot in its native .ipynb format.

There is a workaround though. Starting Jupyter notebooks up with the --script flag results in your notebook being saved as both an .ipynb file and a .py file. And .py is what we are looking for.

pyModu

Note: '--script' is being deprecated in favor of nbconvert. But as of this moment it still works, and it will work on older versions you have downloaded as well.

Create a Module

I am going to create a little module called add2. This module contains 3 simple functions: sqr2, sub2, and add2.

pyModu1

You can see that now, along with my add2.ipynb file, I have an add2.py file.

pyModu2

In my instance, my IPython notebooks are stored under my user directory. If you are not sure where your notebooks are, you can use the pwd command (print working directory).

pyModu3.jpg

** Note: the doubled \\ appears because \ is Python's escape character inside strings. It introduces special sequences; for example, \t inserts a tab.

pyModu4.jpg

So effectively you have to use \\ if you want a \ in Python, making my working directory C:\Users\Benjamin  
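A short sketch of the escape character at work. The path is the one from the text; the raw-string form (the r prefix) is an extra option that the lesson does not cover.

```python
path = "C:\\Users\\Benjamin"    # each \\ in the source is one \ in the string
print(path)

raw_path = r"C:\Users\Benjamin"  # raw string: backslashes are taken literally
print(raw_path)

tab_example = "a\tb"             # \t is a single tab character
print(len(tab_example))
```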

Import the Module

Now if I open up a new notebook and try using the functions I have just created, I will error out.

pyModu5.jpg

So, what I need to do is import add2, then I can start calling on the functions I created in add2

pyModu6
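The whole round trip can be sketched in one script. The bodies of add2, sub2, and sqr2 are shown only in the screenshot, so the versions here are guesses based on their names. The sketch also writes add2.py to a temporary folder and adds that folder to sys.path, which is only needed because the file is not in the working directory.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical contents for add2.py; the original function bodies are not shown.
module_source = '''
def add2(x):
    return x + 2

def sub2(x):
    return x - 2

def sqr2(x):
    return x ** 2
'''

# Write the module to a temporary folder.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "add2.py"), "w") as f:
    f.write(module_source)

sys.path.insert(0, module_dir)   # tell Python where to look for add2.py
importlib.invalidate_caches()    # make sure the new file is noticed

import add2                      # now the functions are available as add2.<name>

print(add2.add2(5), add2.sub2(5), add2.sqr2(5))
```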



Python: Error handling

No matter how well crafted your code, you will inevitably come across an error every once in a while. Unfortunately, when an error goes unhandled, Python's default behavior is to stop the program.

In the example below, I accidentally asked for a letter, when the code int(raw_input()) is looking for an integer.

So when I enter a string, I get an error.

pyerror

Try

Using try, we can handle such mishaps in a more elegant fashion.

The syntax is as follows:

try:
    Your code
except:
    What to do if your code errors out
else:
    What to do if your code is successful

pyerror1

And as you can see from above, it doesn’t just protect against code errors, but it protects against user errors as well.
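A runnable sketch of the try / except / else pattern in Python 3. To keep it testable, the user's typing is passed in as a string instead of being read with input() (which replaces raw_input() in Python 3). The function name and messages are hypothetical.

```python
def describe(text):
    """Try to convert text (as typed by a user) to an integer."""
    try:
        number = int(text)                 # the code that might fail
    except ValueError:
        return "That is not a whole number"   # runs only if int() failed
    else:
        return "You entered " + str(number)   # runs only if int() succeeded

print(describe("7"))
print(describe("a"))
```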

Using a while loop, we can give the user another chance if they don’t enter a number the first time.

Code explanation

  • while True:  – starts an infinite loop
  • continue – returns to the beginning of the while loop
  • break – exits the loop

pyerror2
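A sketch of the retry loop. So that it can run without a keyboard, the answers come from an iterator of strings that stands in for repeated input() calls; that stand-in and the function name are my assumptions.

```python
def ask_for_int(answers):
    """Keep asking until an answer converts to an integer.
    answers is an iterator of strings standing in for input() calls."""
    while True:                  # infinite loop
        text = next(answers)     # in a notebook this would be input(...)
        try:
            number = int(text)
        except ValueError:
            continue             # back to the top of the loop, ask again
        else:
            break                # got a number, leave the loop
    return number

print(ask_for_int(iter(["a", "b", "12"])))
```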

Finally

You can add the finally: command to the end of a try: statement if you have something you want to execute regardless of whether there was an error or not.

pyerror3.jpg

Exception types:

You can also determine what your program does based on the type of error

ValueError

pyerror4

ZeroDivisionError

pyerror5.jpg

No Error

pyerror6
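The pieces above (specific exception types, else, and finally) can be combined in one sketch. The function name and the log messages are hypothetical; the point is which branch runs in each case.

```python
def divide_100_by(text):
    """Divide 100 by the number in text and record what happened."""
    log = []
    try:
        result = 100 / int(text)
    except ValueError:                 # text was not a number
        log.append("not a number")
    except ZeroDivisionError:          # text was "0"
        log.append("cannot divide by zero")
    else:                              # no error at all
        log.append("result: " + str(result))
    finally:                           # runs in every case
        log.append("done")
    return log

print(divide_100_by("a"))
print(divide_100_by("0"))
print(divide_100_by("4"))
```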

A list of Exception types can be found here: Link

