Python: An Interesting Problem with Pandas

I was writing a little tongue and cheek article for LinkedIn on fraud detection using frequency distributions (you can read the article here: LinkedIn). While this was a non-technical article, I wanted to use some histograms from a real data set, so I uploaded a spread sheet into Python and went to work.

While working with the data I ran into an interesting problem that had me chasing my tail for about 10 minutes before I figured it out. It is a fun little problem involving Series and Dataframes.

As always, you can upload the data set here: FraudCheck1

Upload the data.

import pandas as pd
df = pd.read_excel
("C:\\Users\\Benjamin\\OneDrive\\Documents\\article\\python\\FraudCheck1.xlsx")
df.head()

pandasProb.jpg

The data is pretty simple here. We are concerned with our answer column and the CreatedBy (which is the employee ID).  What I am trying to do is see if the “answer”  (a reading from an electric meter) are really random or if they have been contrived by someone trying to fake the data.

First, I want to get the readings for all the employees, so I used pop() to place the answer column into a separate list.

df1 = df

y = df1.pop("answer")

pandasProb1.jpg

Then, to make my histogram more pleasant looking, I decided to only use the last digit before the decimal. That way I will have 10 bars (0-9). (Remember, this is solely for making charts for an article. So I was not concerned with any more stringent methods of normalization)

What I am doing below is int(199.7%10). Remember % is the modulus – leaves you with the remainder and int converts your float to an integers. So 199.7 is cut to 199. The 199/10 remainder = 9.

a= []
i = 0 
while i < len(y):
     a.append(int(y[i]%10))
     i += 1
a[1:10]

pandasProb2

Then I created my histogram.

%matplotlib inline
from matplotlib import pyplot as plt
plt.hist(a)

pandasProb3.jpg

Now my problem

Now I want graph only the answers from employee 619, so first I filter out all rows but the ones for employee 619.

df2 = df.query('CreatedBy == 619')
y1 =df2.pop("answer")

Then I ran my loop to turn my answers into a single digit.

And I get an error.  Why?

pandasProb4.jpg

Well the answer lies in the datatypes we are working with. Pandas read_excel function creates a Dataframe.

When you pop a column from a dataframe, you end up with a Series. And remember a series is an indexed listing of values.

Let’s look at our Series. Check out the point my line is pointing to below. Notice how my index jumps from 31 to 62. My while loop counts by 1, so after 31, I went looking for y1[32] and it doesn’t exist.

pandasProb5.jpg

Using .tolist() converts our Series to a list and now our while loop works.

pandasProb6.jpg

And now we can build another histogram.

pandasProb7.jpg

The Code

import pandas as pd
df = pd.read_excel
("C:\\Users\\Benjamin\\OneDrive\\Documents\\article\\python\\FraudCheck1.xlsx")
df.head()

df1 = df
y =df1.pop("answer")

a= []
i = 0 
while i < len(y):
   a.append(int(y[i]%10))
   i += 1
a[1:10]

%matplotlib inline
from matplotlib import pyplot as plta1 = []
i = 0
while i < len(y2):
 a1.append(int(y2[i])%10)
 i = i+1
a1[1:10]

plt.hist(a)

df2 = df.query('CreatedBy == 619')
y1 =df2.pop("answer")

y2= y1.tolist()
type(y2)

a1 = []
i = 0
while i < len(y2):
    a1.append(int(y2[i])%10)
    i = i+1
a1[1:10]

plt.hist(a1)

 

Advertisements

Qlik Sense: A first Impression

So I am trying out Qlik Sense as a possible BI option for work. I have some limited experience (through grad school) working with another one of their products Qlik View. I will say right from the get go, Qlik Sense is much more user friendly.

Opening it up, I get the following splash screen

qlikSense.jpg

Qlik then asks if I would like to Create a New App.qlikSense1.jpg

I click Create A New APP and I get a window to name it.

qlikSense2.jpg

It says New app created, and asks if I want to open my app

qlikSense3.jpg

Now I am brought to a page to add my data. I have two options, one a more GUI driven method, and the other that will allow me to create my data model using Qlik’s script (very similar to SQL).

I am going to go with the GUI method for now.

qlikSense4.jpg

I choose Excel files

qlikSense5

Next I select a file QlikTEst.xlsx that I put together for this test drive.

qlikSense6.jpg

My file appears in preview mode. Notice I am able to select both sheets to load and unselect columns I don’t want.

qlikSens7.jpg

I click Prepare Data at the bottom and I am brought to this screen. Notice the red triangle warning on the bottom of the screen. It says I have possible associations.

qlikSens8.jpg

I click the warning and I now have a recommendation for an association to make between sheets. My Equipmentkey column matches up(53%) on both sheets.

qlikSens9

Just double click the Recommendations box and your pages are now joined

qlikSens11

Now hit Load data

qlikSens12.jpg

Qlik now brings you to a work sheet.

qlikSens13.jpg

I want to create a bar chart, so I drag one in from the menu column.

qlikSens14.jpg

Now I pick my Month as my dimension

qlikSens15.jpg

And I choose WorkOrderNo as my measure

qlikSens16.jpg

Now I choose an Aggregate of Count()

qlikSens17.jpg

And I have a bar chart.

qlikSens18.jpg

Now, let’s add a filter – drag it in from the menu column

qlikSens20

Select FacilityCode as my dimension

qlikSens21.jpg

I can adjust the filter size by clicking on these points

qlikSens22.jpg

Now hit done at the top of the screen

qlikSens23.jpg

And I now have a live interactive bar chart.

qlikSens24.jpg

My Thoughts on Qlik Sense

Qlik Sense is insanely easy to get up an running. You really don’t need to know much more than how to drag and drop. However, if you are like me and enjoy getting under the hood, Qlik does allow you custom code your data models and formulas.

My only complaint, and this could be because I am new and don’t know a work around, is having to hit Done to leave the edit screen before I can test the interactivity of my charts.

 

R: Graphing with matplot()

matplot() is a R function that you can use to make easy graphs in R.

Let’s start by creating a data set.

We are going to create a 5 x 10 matrix representing 10 quiz and test grades for 5 students

x <- sample(50:100,50, T)
x
# convert to a 5x10 matrix
A <- matrix(x,5,10)
A

matplot

Let’s make name our rows and columns

#name and assignments vectors
names <- c("Sara", "Jill", "Jared", "Kim", "Don")
assignments <- c("Quiz 1", "Quiz 2", "Test 1", "Quiz 3", "Quiz 4", "Midterm", "Quiz 5", "Quiz 6", "Test 2","Final")
#assign labels to columns and rows
row.names(A) <- names
colnames(A) <- assignments
A

matplot1.jpg

Let’s graph it

Start simple

matplot(A)

matplot2.jpg

Note our X axis goes to 5 – we are graphing students against grades.

What if I want to graph assignments against grades? Simple, transpose the matrix

matplot(t(A))

matplot3

Okay, but the numbers are distracting. Let’s do something about that.

matplot(t(A), type="b")

matplot4

I don’t like the numbers

# replace numbers with shapes
matplot(t(A), type="b", pch=15:19)

pch 15:19 lets us rotate through 4 shapes (15 – 19) – try different numbers on your own

matplot5.jpg

I am adding a color element next. You will see the need for this in our next section.

 matplot(t(A), type="b", pch=15:18, col=c(1:5))

This has no current effect on our graph

Legend

Let’s add a legend

legend("bottomleft", inset=0.01, legend=names, col=c(1:5),pch=15:19,
bg= ("white"), horiz=F)

okay, here is the syntax: legend(location, padding, legend data, colors, symbols, background, horizontal vs vertical layout)

This is why I added the color element to our matplot, so the colors in the legend would match up to the colors in my graph.

matplot6.jpg

 

 

 

Excel: Scatter Plots

Download worksheet here: ExcelCharts2

Our data set contains 3 sets of readings for 4 sensors (A,B,C,D)

excelCharts1

Scatter Plot

Reading 1

Highlight Columns A and B – From Ribbon > Insert>Scatter

excelCharts2.jpg

Here is a close up of the Scatter Plot icon

excelCharts3.jpg

Here is our plotting of Reading 1

excelCharts4

Reading 2

To add the Reading 2 column to the plot, right click on the chart area and Select Data

excelCharts5

Select Add

excelCharts6

  • Select the Heading from Column C for Series Name
  • A2:A5 as Series X values
  • C2:C5 as Series Y values

excelCharts7.jpg

Reading 3

Repeat again for Reading 3

excelCharts8

Now, double click on the Y-Axis and go to the Format Axis box that will appear on your right.

Select the Bar Chart Icon, and change the Axis Bounds to minimum of 40 and max of 100.

excelCharts9.jpg

This will help make the spacing between dots more pronounced. Generally altering a Y Axis away from 0 is considered bad taste as it tends to over-pronounce differences between elements.

excelCharts10

Now click on X-Axis and make the changes below.

excelCharts11

Next, go to Add Chart Element in the upper right corner. Legend>Right

excelCharts12

Here is our scatter plot

excelCharts13

Now wait a minute. I know what you are thinking, why is my X Axis 1 – 4 and not A,B,C,D

This is a flaw in Excel. There are some third party packages you can install that will allow you to rename the X Axis in a XY Scatter plot, but not with Excel in its default state.

You can however change the chart type to a Line Chart.

Go to Ribbon > Change Chart Type

Select Line

excelCharts14

Now your X-Axis is properly labeled

excelCharts15.jpg

 

 

 

Python: Line Graph

Let’s build on what we learned in : Python: Intro to Graphs

First, import pyplot from matplotlib

Remember %matplotlib inline let’s you see your graphs in the jupyter notebooks

pythonLine

Line 1

Note I am using a list comprehension to fill the x axis.

pythonLine1

Now, let us do some formatting:

  • ‘r:’    –   red dotted line
  • label = ‘up’   –  adds label to the line
  • py.legend(loc=9)  – adds a legend to the chart

pythonLine2

Line 2

For line 2, I want to create a reverse curve:

  • d = u  – list u is copied to d
  • d.reverse() – reverses values in d

pythonLine3

Plot the line.

  • ‘g-.’ – green dashed line

pythonLine4

Combine the two lines

pythonLine5

Line 3

Here I use another list comprehension  and zip to make our last like. This adds each element of list u and d in order.

  • ‘b-‘ – blue solid line

pythonLine6


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Follow this link for more Python content: Python

Excel: Intro to Charts

Excel makes building basic charts easy.

Download sample data set here: ExcelCharts1

The Data

Our sample data set here is simple enough. We have 5 sensors (A,B,C,D,E) and one reading from each sensor.

excelChart

Column Chart

Let’s start with a column chart. To create a column chart, first select (highlight) your data set and then go to Insert>Column Chart in the Ribbon Bar.

excelChart1

From the column chart options, I am going to pick the first 2D chart option

excelChart2

excelChart3

Once a chart is created, you will notice a new element – Chart Tools – in the Ribbon Bar.

The first thing I want you to explore is Chart Styles. Click on the different options and see what happens.

excelChart4

Next, let’s try changing the colors:

excelChart5.jpg

Now go to Add Chart Element

Go to Axis Titles>Primary Horizontal

excelChart6

Now click on the Axis Title and rename it.

excelChart7

Label this Axis “Sensors”

excelChart8

Now select Primary Vertical and name it “Readings”

excelChart9

Now to change the chart title. Click on the title:

excelChart10

Rename it “Sensor Readings”

excelChart11


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Follow this link for more Excel content: Excel

Python: Intro to Graphs

Visualizations are big part of analytics. You will need to produce visually engaging graphics for presentations, reports, and dashboards. You will also make graphs for your own use in data discovery and analysis. As bonus, unlike data cleaning, data viz can be pretty fun.

Matplotlab and Pyplot

Matplotlab is a module you can import into Python that will help you to build some basic graphs and charts. Pyplot is part of Matplotlab and the part we will be using in the following example.

**If you are using the Anaconda Python distribution, Matplotlab is already installed. If not, you may need to download it from another source.

Line Graph

Syntax

  • %matplotlib inline – this code allows you to view your graphs inside jupyter notebooks
  • from matplotlib import pyplot as plt – here we import pyplot from matplotlib into our program (note, we only want pyplot not all the functions in matplotlib).Adding “as plt” gives us a shorter alias to work with
  • age and height lines – fill our lists with age and height information for an individual
  • plt.plot(age, height, color = ‘blue’) – here we tell Python to plot age against height and to color our line blue
  • plt.show() – prints out our graph

pythonGraphs

Bar Chart

For this example, we will make a bar charting showing ages of 4 people.

Syntax

  • You should understand the first few lines from the first example
  • count_names = [i for i,_ in enumerate(name)]  – since the name list is a list of strings, we cannot really graph that onto a chart. We need a way to convert these strings into numbers.

Wait? What does for i,_ mean? Let’s jump to the next code sample

pythonGraphs1

While you don’t see it when making the list, a Python list is technically a list of tuples (index number, element). So if instead of i,_ we asked for both elements in the tuple, (i,j) we would get the following.

pythonGraphs2

So by iterating by for i,_ we only return the first element in the tuple (the index)

** notice we are using a list comprehension. If you are unfamiliar with list comprehensions, check out my earlier post: Python: List Comprehension

Let’s clean up our bar chart a little now.

  • plt.ylabel(‘Age’) – label the y-axis Age
  • plt.title(‘Age of People’) – give the graph a title
  • plt.xticks([i+0.5 for i,_ in enumerate(name)], name) – this label function is using a list comprehension to first chose the position on the X-axis, and name provides the person’s name for the label.

pythonGraphs3.jpg


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT

Follow this link for more Python content: Python

Intro to Tableau: Line Chart: 3 or More Measures


 Note:

If you do not currently have Tableau, you can download a free version at: https://public.tableau.com/s/

Downloads:

Download Practice Excel File Here: School Lunch



 

Line Chart: 3 or More Measures

This lesson is a continuation of an earlier lesson. If you are already familiar with Tableau, feel free to continue on. Otherwise, check out my first Tableau lesson: Line and Bar Charts

If you want to add 3 or more measures to a line chart, you need to take a different approach than in regular charts.

Import the Data

Select Excel from the Connect menu and select the school lunch excel file you have downloaded.

If you are continuing on from the Line and Bar Charts lesson, you can skip this step, your data is already loaded.

tableauIntro1

Create a New Worksheet

Click the New Worksheet icon found on the bottom of your screen.

tableauIntro3

Drag Year to Columns and Measure Values to Rows

tableau3ormorel1

  1. Get rid of Sum(Number of Records) by dragging it back into Measures
  2. While holding down Ctrl drag Measure Names from the Dimensions slot to Color

tableau3ormorel2

And there you have it. 3 Measures in one chart.

tableau3ormorel3

 

 

Intro to Tableau: Dual Axis Charts


 Note:

If you do not currently have Tableau, you can download a free version at: https://public.tableau.com/s/

Downloads:

Download Practice Excel File Here: School Lunch


 

Dual Axis Charts

This lesson is a continuation of an earlier lesson. If you are already familiar with Tableau, feel free to continue on. Otherwise, check out my first Tableau lesson: Line and Bar Charts

Import the Data

Select Excel from the Connect menu and select the school lunch excel file you have downloaded.

If you are continuing on from the Line and Bar Charts lesson, you can skip this step, your data is already loaded.

tableauIntro1

Create a New Worksheet

Click the New Worksheet icon found on the bottom of your screen.

tableauDual1

Drag Year from Dimensions and Free from Measures into Columns and Rows respectively. You should now have a line chart. (if not, refer to Lesson 1 for troubleshooting tips)

tableauintro4

Now, drag Full Price into Rows. You should now notice you have two graphs. Free up top and Full Price on the bottom.

tableaudual2

Now you could just stop there. You do have both Measures graphed. But this really isn’t the best way analyze this data. It is hard to do a good comparison this way.

Dual Axis

For better analysis, we are going to create a Dual Axis Chart.

Right click on the Y Axis of the bottom chart and select Dual axis

tableaudual3

Now you have both measures on one graph.

tableaudual4

If you look closely at the Left and Right Y-Axis’s, you will notice they are not the same. This could skew how someone would interpret this data.

To fix this, right click on the Right Y-Axis and select Synchronize axis

tableaudual5

Finally, since both of your Y-Axis match up, you don’t need them both. Right click on the Right Axis again and uncheck Show header.

Previous > Lesson 1

Next > 3 or More Measures