Excel Solver: Intro Optimization Models: Linear

On May 23, 2016May 23, 2016 By Ben Larson Ph.D.In excel, Optimization Modeling3 Comments

What is an optimization model? An optimization model is a mathematical model designed to provide the optimal solution based on a information provided. You can use it answer questions like the optimal number of employees needed to working a shift or the optimal amount a product to produce that will bring in the most profit.

In this lesson we will be using Excel Solver ( a free ad-on) to develop an optimization model. If you do not have Solver loaded already, follow the instructions below:

Click File Button

Click Options

adOpt3

Click Add-Ins in the left column, find Solver Add-in. Click Go at the bottom next to Manage: Excel Add-ins

Check the box next to Solver Add-in – Click Ok and go back to main Excel page

adOpt5

To check that it installed, go to Data on the Ribbon and check for Solver in the Analysis box

Get the Excel File

You can make the file yourself, or download the template here: AdOptModel

Let’s look at our file:

This is a very simple example to start with. I am looking to run advertisements on a local radio station. The station provides two ad packages A and B. Package A is promised to reach 69,000 potential customers per airing and costs $156 dollars. Package B will reach 79,000 and cost $173.

I want to know how many of each package to run to reach the most customers. My only constraint is that I have a solid budget of $10,000 that I cannot exceed.

Now you do not have to use the colors, I just learn this way and it makes sense to me. To understand the coloring:

Blue = input cells – you need to provide this information to your model
Pink = changing cells. These are the boxes Solver is going to use to try to work the problem out – do not put any formula in these boxes
Gray = Result cell – you can only have one result cell.

Build the model

Now we are going to build the model. We need to add some calculations to our existing sheet for the model to work.

Step 1: This step is optional. I always place some number value in my changing cells just to make sure my formulas are inputted correctly.

Step 2: Constraint. Our budget is 10000, so we need a cell that calculates how much we are spending based on values in our change cells (Number of each ad we plan on running). To calculate this we are going to use a formula call Sumproduct()

What we want to do is multiply (# of ads for A * cost per Ad A) then add that to (# of ads for B * cost per Ad B)

That is what Sumproduct gives us (B5*B8)+(C5*C8)

We are going to do the same thing for our Result cell, except this time it will be Number of Ads * Customer Reach

Use Solver

Now it is time to use Solver, click on solver in the Data Tab of the Ribbon bar (under data analysis).

Let’s walk through this slow. In the top box, Set Objective, set this to the Result Cell. You can type it in or just click on the cell.

In the To: line, select Max since we are looking for the maximum customer reach here

Now set your Changing Variable Cells

To the right of big white box in the middle of the window, select Add

Place our constraint cells in this box.

Finally, make sure you check Make Unconstrained Variables Non-Negative and set our Solving Method to Simplex LP

Click Solve

Solver found a solution — Yay

Check – Keep Solver Solution

Hit okay

Here is our answer. Running 57.80347 Package B ads will get us 4566.474 Customers Reached.

But how do you run 0.8 of an ad?

Well, this is where we add in an integer constraint.

Reopen solver. Hit the Add button next to constraints

In Cell Reference, put our changing cells.

From the drop down pick “int”

Click Ok and run Solver again.

Now look at our new results: 2 of Package A and 56 of Package B

R: Filter Dataframes

On May 23, 2016May 23, 2016 By Ben Larson Ph.D.In RLeave a comment

When working with large data sets, often you only want to see a portion of the data. If you are running a query on the graduating class of a high school, you don’t want your results to be clogged up with hundreds of freshmen, sophomores, juniors. To accomplish this, we use filtering. Filtering a dataframe is a very useful skill to have in R. Luckily R makes this task relatively simple.

Download the data set here: boxplot2

Import the data

df <- read.csv(file.choose())
head(df)

Let’s take a look at it. We have a data set with 4 columns, workorderNo – Work Order Number, dif – change code since last work order reading, quart – quarter (referring to calendar year), Employee – employee name.

dataframe1

The first thing we need to know when dealing with dataframes is how to use the $

In R, the “$” placed between a dataframe name and a column name signify which column you want to work with.

head(df$Employee)
head(df$quart)

Notice the line that says Levels: after the Employee list. This lets you know this column is a factor. You can see all the unique elements (factors) in your list by using the levels() command.

level(df$Employee)

dataframeFilter2

Taking another look at our data, the quart column consists of 3 numbers (1,2,3) representing quarters of the year. If you think back to your statistics classes, these numbers are ordinal (meaning they only provide a ranking – you will never use them in a calculation – quart 2 + quart 3 has no real meaning.) So we should make this column a factor as well.

R makes that easy to do.

df$quart <- factor(df$quart)
head(df$quart)

dataframeFilter2

Okay, now onto filtering. Let’s see how easy it can be.

First we assign our filtering logic to a variable, then we place our variable inside the dataframe’s square brackets []

filter <- df$quart==2
df[filter,]

Or you can just put your logic directly in the dataframe’s square brackets[]

df[df$Employee=="Sally",]

Now we just see Sally

We can use an and “&” operator to filter by Sally and quart 1

df[df$Employee=="Sally" & df$quart == 1,]

R: Working with Dataframes

On May 23, 2016May 23, 2016 By Ben Larson Ph.D.In RLeave a comment

At first blush, a dataframe looks a lot like a matrix. The big difference between the two is that all columns in a matrix must contain the same data type. This is not so with dataframes.

Download the csv file here: boxplot2

Load the data in R

df <- read.csv(file.choose())
head(df)

read.csv() automatically loads the file data into a dataframe. So without having to do anything else, we have a dataframe.

You call on data in a dataframe just like you do with a matrix.

df[1,]
df[,1]

Using the structure function, str(), we get a look at how R configured our dataframe

str(df)

str(df) tells us we have 1861 objects of 4 variables. – this means 1861 rows with 4 columns each.

It then goes on to identify the columns and their data. Note the Employee column. Instead of an integer like the other columns, Employee is classified as a Factor with 5 levels – meaning there are 5 unique names in this data set.

summary(df)

summary() gives you basic stats (min, 1st Quantile, Mean, Median,…) on your numeric columns. But with the Employee column it gives you the unique names in column and a count of how often they are used.

Excel: Range Names

On May 23, 2016June 10, 2022 By Ben Larson Ph.D.In excel1 Comment

In Excel, you can name your cells and ranges of cells with more user friendly names. This comes in handy as your spreadsheets become more complex. It also comes in handy when you are sharing spreadsheets between co-workers.

File download available here: checkbook

This file is simple enough. I have an example checkbook with monthly salary and monthly expenses listed.

Select cell containing the monthly salary(B3). Now go up to the cell name box (circled in red)

rangeName2

Type a new name in the box – I used “Income”

You can do the same thing with a range. Select cells D3- D5, then rename it “Expenses”

rangeName4

If you check the drop down from the name box, you will see your new named cells in the drop down.

rangeName5

Let’s uses our new named cells (and ranges) to make a quick formula.

rangeName6

To see a video of this lesson, click the link here: Excel: Name Ranges

Python: K Means Cluster

On May 21, 2016May 21, 2016 By Ben Larson Ph.D.In machine learning, Python6 Comments

K Means Cluster will be our introduction to Unsupervised Machine Learning. What is Unsupervised Machine Learning exactly? Well, the simplest explanation I can offer is that unlike supervised where our data set contains a result, unsupervised does not.

Think of a simple regression where I have the square footage and selling prices (result) of 100 houses. Taking that data, I can easily create a prediction model that will predict the selling price of a house based off of square footage. – This is supervised machine learning

Now, take a data set containing 100 houses with the following data: square footage, house style, garage/no garage, but no selling price. We can’t create a prediction model since we have no knowledge of prices, but we can group the houses together based on commonalities. These groupings (clusters) can be used to gain knowledge of your data set.

I think seeing it in action will help.

If you want to play along, download the data set here: KMeans1

The data set contains a 1 year repair history of 197 Ultrasound medical devices.

Data dictionary (ID Tag – asset number assigned device, Model – model name of device, WO Count – count of repair work orders, AVG Labor – average labor minutes per repair, Labor Cost – average labor cost per repair, No Problem- count of repairs where no problem was found, Avg Cost -average cost of parts, Travel – average travel hours per repair, Travel Cost – average travel cost per repair, Department – department that owns the ultrasound device)

kmeans

We want to see what kind of information we can extract from this data.

To do so, we are going to use K Means Clustering.

How does K Means Clustering work? Each row in the table is converted to a vector. Imagine the vectors now graphed in N-dimension space. Next pick the number of clusters you want to create. For each cluster, you will place a point(a centroid) in space and the vectors are grouped based on their proximity to their nearest centroid.

The calculation to tell proximity is made using geometric means (not arithmetic)- hence the name K-Means Cluster

(each dot below is a row in your table, the colors represent a cluster)

kmeans2

Let’s do it in Python

Import the data.

import pandas as pd

df = pd.read_excel("C:\Users\Benjamin\Documents\KMeans1.xlsx")
df.head()

kmeans1

Now, we are going to drop a few columns: ID Tag – is a random number, has no value in clustering. Then Model and Department,as they are text and while there are ways to work with the text, it is more complicated so for now, we are just going to drop the columns

df1 = df.drop(["ID Tag", "Model", "Department"], axis = 1)
df1.head()

kmeans2

Now lets import KMeans from sklearn.cluster

We then initialize KMeans (n_clusters= 4 -no of clusters you want, init=’k-means++’ -sets how the centroids are places. k-means++ is one of the faster methods of centroid placement, n_init=10 – number times the algorithm with run placing new centroids each iteration)

from sklearn.cluster import KMeans
km = KMeans(n_clusters=4, init='k-means++', n_init=10)

Choosing number of clusters is a bit of an art. Play with it a bit and see how different values play out for you.

Now fit the model

km.fit(df1)

Now, export the cluster identifiers to a list. Notice my values are 0 -3. One value for each cluster.

x = km.fit_predict(df1)
x

Create a new column on the original dataframe called Cluster and place your results (x) in that column

df["Cluster"]= x
df.head()

Sort your dataframe by cluster

df1 = df.sort(['Cluster'])
df1

Now as you start to examine the data in each cluster, you show start to see patterns emerge.

Below is an example of the patterns I found in the clusters.

Now remember, this is just an INTRODUCTION to unsupervised learning. We will learn more tricks to help you discover the patterns as we move forward.

Python: K Nearest Neighbor

On May 20, 2016May 20, 2016 By Ben Larson Ph.D.In machine learning, PythonLeave a comment

K Nearest Neighbor (Knn) is a classification algorithm. It falls under the category of supervised machine learning. It is supervised machine learning because the data set we are using to “train” with contains results (outcomes). It is easier to show you what I mean.

Here is our training set: logi

Let’s import our set into Python

This data set contains 42 student test score (Score) and whether or not they were accepted (Accepted) in a college program. It is the presence of the Accepted column that makes supervised machine learning possible. Knowing the outcomes of past events, we can create a prediction model for future events. So you could use the finished model to predict whether someone will be accepted based on their test score.

So how does Knn work?

Look at the chart below. Imagine this represents our data set. Each blue dot is accepted (1) while each red dot is not(0).

knn1

What if I want to know about my new data point (green star)? Is it a 1 or a 0?

knn2

I start by choosing a neighbor count – in this example I will choose 3, and I find the 3 nearest neighbors to my new point.

Let’s look at the results, I have 2 red(0) and 1 blue(1). Using basic probability, I am 67% (2/3) certain that you will not get in.

Now, let’s code it!

First we need to separate our data into 2 dataframes: Our training set X (Score) and our target set y (Accepted)

df.pop() removes the Accepted column from your dataframe and places it in a newly created one.

Import sklearn

sklearn is a massive library of machine learning algorithms available for Python. Today we are going to use KNeighborsClassfier

So below imported KNeighborsClassifier from sklearn.neighbors

Next I set my neighbor count to 5. You can experiment with other numbers and see how works out for you. Setting the neighbor count is something you kind of have to develop a feel for.

Now let’s fit the model with our training set(X) and target set(y)

knn7

Now we can use our model to make predictions.

ne.predict() will return 1 or 0 – (Accepted or Not)

while ne.predict_proba() will return a probability range. Results below read as (40% change of not Accepted(0), 60% chance of Accepted(1))

So there you go, you have now built a prediction model using K Nearest Neighbor.

R: Graphing with matplot()

On May 20, 2016May 20, 2016 By Ben Larson Ph.D.In R2 Comments

matplot() is a R function that you can use to make easy graphs in R.

Let’s start by creating a data set.

We are going to create a 5 x 10 matrix representing 10 quiz and test grades for 5 students

x <- sample(50:100,50, T)
x
# convert to a 5x10 matrix
A <- matrix(x,5,10)
A

matplot

Let’s make name our rows and columns

#name and assignments vectors
names <- c("Sara", "Jill", "Jared", "Kim", "Don")
assignments <- c("Quiz 1", "Quiz 2", "Test 1", "Quiz 3", "Quiz 4", "Midterm", "Quiz 5", "Quiz 6", "Test 2","Final")
#assign labels to columns and rows
row.names(A) <- names
colnames(A) <- assignments
A

Let’s graph it

Start simple

matplot(A)

Note our X axis goes to 5 – we are graphing students against grades.

What if I want to graph assignments against grades? Simple, transpose the matrix

matplot(t(A))

matplot3

Okay, but the numbers are distracting. Let’s do something about that.

matplot(t(A), type="b")

matplot4

I don’t like the numbers

# replace numbers with shapes
matplot(t(A), type="b", pch=15:19)

pch 15:19 lets us rotate through 4 shapes (15 – 19) – try different numbers on your own

I am adding a color element next. You will see the need for this in our next section.

 matplot(t(A), type="b", pch=15:18, col=c(1:5))

This has no current effect on our graph

Legend

Let’s add a legend

legend("bottomleft", inset=0.01, legend=names, col=c(1:5),pch=15:19,
bg= ("white"), horiz=F)

okay, here is the syntax: legend(location, padding, legend data, colors, symbols, background, horizontal vs vertical layout)

This is why I added the color element to our matplot, so the colors in the legend would match up to the colors in my graph.

R: Matrix Operations

On May 19, 2016May 19, 2016 By Ben Larson Ph.D.In R3 Comments

Imagine the following problem: You have a business assembling cabinets. You have 10 employees and you want to see who is the most efficient. You track the number of cabinets assembled by each employee over 10 days. You also track the hours they worked each day in a separate table.

Let’s create our tables –

Number of Cabinets

Create a vector of 100 random data points from 50 to 100.

# created random set of 100 numbers between 50 and 100
x <- sample(50:100,100, T)
x

sample(range,number of data, allow repeat of data (T,F))

Convert to a matrix

# convert to a 10x10 matrix
A <- matrix(x,10,10)
A

Name Columns and Rows

#name rows and columns
colnames(A) <- c("D1","D2","D3","D4","D5","D6","D7","D8","D9","D10")
row.names(A) <- c("Bob","Steve","Gary","Sara","Tony","Stacey","Kerri","Debbie","George","Manny")
A

Hours Worked

Let’s do the same thing, but now build our hours table.

# created random set of 100 numbers between 1 and 9
y <- sample(1:9,100, T)
y

Convert to 10×10 matrix form

# convert to a 10x10 matrix
B <- matrix(y,10,10)
B

Name our columns and rows

#name rows and columns
colnames(B) <- c("D1","D2","D3","D4","D5","D6","D7","D8","D9","D10")
row.names(B) <- c("Bob","Steve","Gary","Sara","Tony","Stacey","Kerri","Debbie","George","Manny")
B

Matrix Operations

Now we will see the magic of R. This programming language was built for Statisticians. So it has some great matrix operations that make linear algebra operations a breeze. In fact, if you ever took linear algebra, you will wish you had known about R back then.

Now if I want to know how many cabinets each person made per hour worked on any given day, I would need to divide each elements of matrix A by its corresponding element in matrix B.

Taking what you know about For loops, think for a minute what the loop would look like to complete this task. Messy huh?

Well in R, we don’t have to worry it. Because in a R, A/B does just what we are asking for.

#matrix division
C = A/B
C

The results are bit much, let’s round it to 1 decimal place

#round Matrix
round(C, digits =1)

Okay, now let’s find the most and least productive values

#most and least
max(C)
min(C)

So we have one person who made 97 cabinets in an hour and another who made 6.875

Who are they? And what day was it?

#who are they?
which(C==max(C), arr.ind=T)
which(C==min(C), arr.ind=T)

Manny is our hero on day 10, and Sara must have been hung over on day 10

matrixOps9

What about a mean?

I know, I know, if you are here, then you like me are a stat geek. So let’s run the mean and see who is the overall Rockstar.

#get the means
colMeans(C)
rowMeans(C)

I have underlined highest and lowest, both day and employee. Looks like Sara is a real slacker.

Now before you start sending hate mail and calling me sexist, remember these numbers were randomly generated. If I ran my code again, I would get completely different answers. Just as if you follow along, your answers will not match up with mine.

The Code

# created random set of 100 numbers between 50 and 100
x <- sample(50:100,100, T)
x
# convert to a 10x10 matrix
A <- matrix(x,10,10)
A
#name rows and columns
colnames(A) <- c("D1","D2","D3","D4","D5","D6","D7","D8","D9","D10")
row.names(A) <- c("Bob","Steve","Gary","Sara","Tony","Stacey","Kerri","Debbie","George","Manny")
A

# created random set of 100 numbers between 1 and 9
y <- sample(1:9,100, T)
y
# convert to a 10x10 matrix
B <- matrix(y,10,10)
B
#name rows and columns
colnames(B) <- c("D1","D2","D3","D4","D5","D6","D7","D8","D9","D10")
row.names(B) <- c("Bob","Steve","Gary","Sara","Tony","Stacey","Kerri","Debbie","George","Manny")
B

#matrix division
C = A/B
C

#round Matrix
round(C, digits =1)

#most and least productive values
max(C)
min(C)

#who are they?
which(C==max(C), arr.ind=T)
which(C==min(C), arr.ind=T)

#get the means
colMeans(C)
rowMeans(C)

Python: Object Oriented Programming

On May 18, 2016May 18, 2016 By Ben Larson Ph.D.In Python2 Comments

In the argument between R and Python, the fact the Python is a full blown Object Oriented Programming(OOP) language gives it a solid advantage to me. Why? OOP gives you the ability to create and use objects in your programs.

Now I remember my first OOP class was in Java. It was a college course and started with reading 4 dense chapters on what made a language OOP. It talked about Objects and methods and data encapsulation. In the end, I just had a headache, and I really didn’t understand anything until I built my first Object.

So, let’s just jump right in and build our first object.

Class

The first step in building an object is to build a class. Now I am going to move quick here, and then work my way back to better explain what we are doing.

So, I am building a 2 function calculator here.

class myCalc: – First I create my class and name it
def __init__(self): — this is called a constructor – you need to have one, but you don’t need to use it. Notice I have the word pass on the next line. That means just move on without doing anything. Don’t worry constructors for now. We will cover constructors in a later lesson.
def myAdd(self, x,y): this is basically a function inside my class, but inside a class we call functions methods. The first argument (self) is a reference to our constructor. Again, don’t worry about it yet, just know it has to be there or you will get an error
def mySub(self,x,y): Same as above

Create our Object

By assigning our class to a variable name, what we doing is creating an Instance

OOP1

Now that I created and Instance can call my Methods

Notice when I call myAdd and mySub, I only provide 2 arguments. Like I said earlier, we aren’t using the self argument in this case, so we don’t need to pass any argument to it.

OOP2

Okay, so what is the big deal you are asking? I mean, all I did was show you a more complicated way to make a function.

Why Bother

So I am going to attempt to show you a practical use for a class without resorting to totally impractical examples or overly complicated ones.

myExp works by asking you for a number when you create an Instance. Then when you call the exp1() function, it raises what ever value you give exp1() by the number you started with.

examples:

ms = myExp(2) – self.x is set to 2 in the __init__ method
ms.exp1(3) – 3 ** 2
ms.exp1(4) – 4 **2
ms1 = myExp(3) – self.x is set to 3
ms1.exp1(3) – 3**3
ms1.exp1(4) – 4**3
ms.exp1(3) – 3** 2 — this is the cool part. –self.x for my ms object is still 2 and self.x for ms1 is 3

Here it is in running code:

OOP4

I can create at many Instances as I want, giving each instance whatever value of self.x I want. And I can reuse them in any fashion I can imagine.

Here I pass one Instance as an argument to another.

It is the re usability and the ability to assign different values to objects that makes them so useful.

This is just an intro

Don’t worry if you are little confused. This is just an intro and I will create more OOP lessons to hopefully help clear everything up.

Python: Create a Box whisker plot

On May 17, 2016May 17, 2016 By Ben Larson Ph.D.In Python1 Comment

Box whisker plots are used in stats to graphically view the spread of a data set, as well as to compare data sets.

If you would like to follow along with this example, he is the data set: sensors

Using pandas, let’s load the data set

%matplotlib inline
import pandas as pd
import matplotlib as mp
import matplotlib.pyplot as plt

sensorDF = pd.read_excel("C:\Users\Benjamin\Documents\sensors.xlsx")
sensorDF.head()

Our data set represents monthly readings taken from 4 sensors over the span of a year

boxplot

We need to convert the dataframe to a list values for our box plot function.

To do this, first we need to flatten() our dataframe. The flatten() method places all the values from the dataframe into 1 list

Now let us chop the list into the for sensors represented by the rows in our dataframe

Finally, we need to make a list of these lists

I know that seemed like a lot, but you will spend more time cleaning and prepping data than any other task. It is just the nature of the job.

Let’s Plot

The code for creating a boxplot is now easy.

Let’s label our chart a little better now.

	Anonymous on Python: Accessing a SQL databa…
	Anonymous on Top 7 skills a Data Analyst ha…
	lovingfox4e1d0e653e on Data Jobs: What does a Data An…
	Anonymous on Top 7 skills a Data Analyst ha…
	Anonymous on Python Web Scraping / Automati…

	Anonymous on Python: Accessing a SQL databa…
	Anonymous on Top 7 skills a Data Analyst ha…
	lovingfox4e1d0e653e on Data Jobs: What does a Data An…
	Anonymous on Top 7 skills a Data Analyst ha…
	Anonymous on Python Web Scraping / Automati…

	Anonymous on Python: Accessing a SQL databa…
	Anonymous on Top 7 skills a Data Analyst ha…
	lovingfox4e1d0e653e on Data Jobs: What does a Data An…
	Anonymous on Top 7 skills a Data Analyst ha…
	Anonymous on Python Web Scraping / Automati…

	Anonymous on Python: Accessing a SQL databa…
	Anonymous on Top 7 skills a Data Analyst ha…
	lovingfox4e1d0e653e on Data Jobs: What does a Data An…
	Anonymous on Top 7 skills a Data Analyst ha…
	Anonymous on Python Web Scraping / Automati…

	Anonymous on Python: Accessing a SQL databa…
	Anonymous on Top 7 skills a Data Analyst ha…
	lovingfox4e1d0e653e on Data Jobs: What does a Data An…
	Anonymous on Top 7 skills a Data Analyst ha…
	Anonymous on Python Web Scraping / Automati…

Analytics4All

Month: May 2016