Excel: Copy, Cut, Paste, and Format Painter

This is a very basic introduction into Excel. I am going to start with the upper left hand corner of the Ribbon bar: the Clipboard region.

2016-12-18_15-35-08.jpg

While I am aware most of you already know how to use these features, my website is for everyone, including the most base beginner. I remember the frustration of learning the fundamentals of analytics from websites that assumed I already had a PhD and 10 years of work experience in the area.

So feel free to skip this, or take a few minutes to read through, you might be surprised. There may be some tricks in this little corner of Excel you were not aware of.

Cut, Copy, and Past

These 3 features are ambiguous with computer use dating back to days DOS. As a matter of fact, the old keyboard shortcuts used back then still work.

Cut (Ctrl-X) – deletes the highlighted text and stores it in local memory (a clipboard)

Copy (Ctrl-C) – leaves highlighted text as is, but saves a copy of it into local memory

Paste(Ctrl-V) – pastes the contents of the clipboard into the spot you have chosen.

Below are two examples

CUT

Start by highlight the rows A:1 -A:4, click Cut (or Ctrl-X). Now Select Cell C1. Click Paste (or Ctrl-V). Notice column A is now empty.

COPY

Start by highlight the rows A:1 -A:4, click Copy (or Ctrl-C). Now Select Cell C1. Click Paste (or Ctrl-V). Notice Column C is now a Copy of Column A

Clipboard

If you click on the bottom right of the Clipboard box, the clipboard window opens up, showing your the current contents saved to the clipboard

2016-12-18_17-37-39

So now if I add some letters to column B and copy it, that will end up in the Clipboard as well. Notice the original data is still in the clipboard

2016-12-18_17-42-08.jpg

Now when you want to paste, you can choose which item in the clipboard to paste. Without using the clipboard, Excel will paste the most recent item added to the clipboard by default.

2016-12-18_17-40-38.jpg

Copy as Picture

You might notice a drop down arrow next to Copy in the Ribbon Bar. If you click on it, you will see Copy as Picture as an option. This is great when working with Charts. This saves the data as a picture or bitmap so when you paste it elsewhere it will not be affected by changes to the source data.

What I mean by that is, looking at the chart below, column b has a value of 2. If I change that 2 to a 4 in the data table this chart was created from the bar representing b would change to 4 (so would any copied charts). But a chart copied “as picture” would not change. Imagine it to be like a screen shot.

2016-12-18_17-47-52.jpg

Paste Special

Notice the options below Paste when you hit the drop down arrow. Some of the more popularly used are Transpose (turns a vertical list to horizontal list and vise versa). Another popular option is to paste “values”. This is useful when trying to copy a calculated value where all you want is the number (not the formula who made it).

2016-12-18_20-34-34.jpg

Format Painter

Format Painter can help you repeat text and color formatting with just a few clicks

Start by adding color and font bolding to a set of cells. Highlight those cells and click Format Painter

2016-12-18_20-40-38

You will now see your cursor is paint brush. Find a target you want to duplicate your formatting to and click on it.

2016-12-18_20-41-03

And now it looks the same.

2016-12-18_20-41-19

 

Excel: Data Transformations

Data Transformations

In statistics, data transformations are used, for lack of better words, to even out the numbers. While there are lots of reasons to perform transformations in the world of Statistics, in Data Science and Analytics, our main concern is to bring distributions to the normal form (the classic bell curve).

normal-curve

As with many concepts, sometimes it is easy to see by doing. So let’s jump into Excel, you can download the file to play along here: transform1

First things first – Install Analysis Toolkit

In order to created histograms in Excel (which we will need for this exercise), you need to install the Analysis Toolkit.

Click on the green “File” in the upper left corner of Excel, then click options.

transform

Click Add-Ins > select Analysis TookPak

transform1.jpg

Hit OK and now lets look at our data.

The Data

We have two columns –

UNITSF =10,000 records of house square footage

Bins = list of Bin sizes we will use for the histogram

transform3

First we will create a histogram to look at the data.

To do so, go to the Data tab in the Ribbon bar. Then on the far right select Data Analysis

tranform4.png

Select Histogram

transform5.png

For Input Range – select number values from the UNITSF column.

Bin Range – select number values from Bin column

transform6

Now you get a Frequency Distribution. Let’s visualize it to see better.

transform7.png

In the Ribbon Bar, select Insert>Recommended Charts

transform8

Select the Frequency bar chart

transform9.png

Look at this Frequency distribution. Notice how it bunched to the left. In Stats – they call this Right-Skewed since the long tail trails out to the right.

transform10.png

The problem is, many popular data mining (machine-learning) algorithms can be easily thrown off by data like this. The algorithms (like regressions, decision trees, boosted forests, naive bayes, etc.) are designed to work better with normal distributions.

So to normalize this distribution, we have to transform the data. What that means in practice is to apply a mathematical operation to all the data in the column. Operations like: logs, square roots, cube roots, squaring, cubing, etc.

In our example, the natural log does a very good job of transforming our data.

To do so, lets start by applying the natural log to the first element in our data.

=ln(A2)

transform11

Now hover over the little green box in the bottom corner until your cursor turns into a black plus sign. Double click and the natural log will be applied to all elements.

transform12

Build a new bin column ranging from 3 – 11 by .5

transform13.png

Now lets build our frequency table. Data> Data Analysis > Histogram

Make sure you select the B column for Input and E column for Bin Range

transform14.png

Now go to your new distribution and graph it like we did before.

Notice how the new distribution looks a lot more like the class bell curve.

tranform15.png

 

 

 

Excel: Adjusting Named Ranges

Okay, so you created a spreadsheet. To make it easier for your fellow coworkers to use, you implemented range names. And it all works great, that is until you need to change the actual range.

Look at this example. I had created range call Class which spanned across 16 cells representing 16 classes being offered. It worked great until my boss informs me that my school will now be offering 20 classes.

rangeAdj4.jpg

How do you change the range to include classes 17-20?

In the Formulas tab of the Ribbon, you will find an icon called Name Manager 

rangeAdj.jpg

Highlight the named range you are working with  – (in our case – Class)

Double click

rangeAdj1.jpg

Click on the little box to the right of the Refers to: box

rangeAdj2.jpg

Now highlight the new range

rangeAdj3

Click OK and OK. Now your range should contain your new cells.

 

 

 

 

Linear Optimization Model: Binary Constraints

Today we are going to build a Linear Optimization Model with binary constraints. What that means is that some of our decision variables are going to be restricted to 0 or 1 as possibilities.

This example is more complicated than earlier lessons, so I have included a fully filled out Excel file for you to follow along with. You can download the Excel file here: ModelBuild

Let’s start with the problem:

You have been hired by a company that builds airplane model kits. They currently produce 5 types of models – F-16’s, F-18’s, A-10’s, B-2’s, and B-52’s. They want you to help them maximize their profits.

They have provided you with the following production information.

The Minimum production may need a little explanation. Setting up the equipment to build a particular model has an expense built in. To cover that expense, you need to commit to building at least the value found in the minimum production row. If you can’t meet this level of production, we are not going to build any of this model.

binary1.jpg

We have also been given our available resources:

binary2.jpg

So now on to our decision variables. For each model, we have to first decide whether or not to produce. Remember, if we can’t produce our minimum amount for a particular model, we can’t produce any of that model.

So our Produce? decision variables are going to be either 1 (produce) or 0 (don’t produce)

binary3.jpg

Now our Minimum production row is a simple formula= Minimum production from row 7 above * your 1 or 0 from Produce?  I have placed a 1 in the first column of our Produce? row to demonstrate below.

binary4

Skip Units produced for now. Look at Maximum produced. This formula is simply the 1 or 0 from our Produce? row * 9999. I choose 9999 at random as I know we will never exceed this limit in our example. In other models, you may need a larger number.

Now Units produced is our second decision variable. This variable needs to sit in between Minimum production and Maximum production.

Note that if our Produce? variable is 0 (don’t build) Maximum production will be 0 (Produce? * 9999= 0) . Since Units produced needs to be less than or equal to Maximum production, we cannot produce any units.

Second, since if we decide to build (1), our Minimum production will come from our given values in blue. So our Units produced will need to be greater than or equal to the Minimum production.

 

Now lets set our Resource used using SUMPRODUCT()

Press Time

binary9.jpg

Grams Plastic

binary10

Finally, we need to set our Result cell – We do this using Sumproduct()

binary5.jpg

Now we can set up our solver.

  • Objective – Profit cell
  • Changing cells – our pink rows – note you separate the two rows with a comma
  • Set our Produce? row to binary (see second picture below)
  • Set Units produced to >= Minimum production
  • Set Units produced to <= Maximum production
  • Set Press Time Resource used <= Resource available
  • Set Grams Plastic Resource used <= Resource available

binary6

binary7.jpg

 

Finally make sure you are set to Simplex LP and hit Solve

binary8.jpg

 

Excel: Charts, Customizing Which Columns You Want to Chart

Here is the data set if you want to play along: ExcelCharts2

While Excel does a pretty good job of auto-formatting charts based on your data, sometimes a little customization work is in order. Take this data below for example:

What if I want to create a bar chart of the average readings?

chart1.jpg

Well first we need to make an “Average” column

chart2.jpg

chart3.jpg

I really don’t need all the decimal places. Hit the decimal reduce button in the picture below until you have a column full of integers.

chart4.jpg

Chart Method 1

Just like you would normally, select all the data and go to Insert > Column Chart

chart5.jpg

Right Click on your new chart and Select Data…

chart6.jpg

I don’t want my readings on the X axis, I want my Sensor Names, so click Switch Row/Column in the new window.

chart7.jpg

The X Axis is now A,B,C,D

chart8.jpg

Now, we only want to see the Average  column, so uncheck Reading 1,2,3 

chart9.jpg

Here is our chart

chart10

Chart Method 2

Don’t select any data yet. Just go to the Insert and Select Column Chart

chart11

Click Select Data now from the Ribbon (if you don’t see this, click anywhere on your new blank chart and this should appear in the Ribbon)

chart12

Click Add in the low left box.

chart13.jpg

In Series Name, either type a name or select the column header

chart14.jpg

In Series values: put put your data rows from the Average  column

chart15.jpg

Now we have data loaded into our chart, let’s change the data labels for our X Axis. In the Horizontal(Category) Axis Labels –  click Edit

chart16.jpg

Highlight A,B,C,D from the Sensor column

chart17

Now you see the numbers have been replaced by A,B,C,D

chart18

Here is our chart

chart19.jpg

 

 

Excel: Randomly Select Names from a List

One of the main elements of statistical analysis is making sure your data is a good representation of the population. The best way to do that is to make sure your sampled data is truly randomly selected.

I am going to show a quick method in Excel where we can select random names from a list (our population) of names. We will be using the following Excel functions to do so (if you are not familiar with them, don’t worry, I will walk you through it):

  • index()
  • randbetween()

We will also be creating a Named Range.

The List

Let’s start with the list. I have 14 names, and I want to randomly select 5 names from this list.

randName.jpg

Index()

Index is a method for using the looking up elements in Excel using the indexes (row numbers, column numbers). Note the index is always a number, so when dealing with columns, don’t be tricked by the column letters assigned.

Here I am looking for the 5th index in my list of names

=index(list,index)

randName2.jpg

My result is Steve

randName3.jpg

You can also work horizontally. Here I am looking for index 3 from my horizontal list.

randName4.jpg

Result is d

randName5.jpg

Now watch this. I am asking for the 4th index in this list. What do you think it will be? The 4th row is Tyrone.

randName6.jpg

Nope, when running INDEX() the index starts at 1 – starting at the first element in the selected list. If you start counting with Gary, Karen is our 4th person down.

randName7.jpg

RANDBETWEEN()

RANDBETWEEN accepts 2 arguments, high and low numbers in a range. It then returns a random integer from with the range.

RANDBETWEEN(1,20) will return an integer from 1 to 20 (including 1 and 20)

Named Range

Now before me move forward, when formulas start to get larger (as they are about to) I look for ways to simplify the syntax.  One trick would be to name our range.

Highlight all the names in your list, then go up to the name box and enter Names

randName1.jpg

Now combine RANDBETWEEN() and INDEX()

now try this: =INDEX(Names, RANDBETWEEN(1,14))

randName8.jpg

Hit enter and drag the formula down 5 cells

randName9.jpg

Now you have 5 random names selected from your list.

 

Excel Solver: Intro Optimization Models: Linear

What is an optimization model? An optimization model is a mathematical model designed to provide the optimal solution based on a information provided. You can use it answer questions like the optimal number of employees needed to working a shift or the optimal amount a product to produce that will bring in the most profit.

In this lesson we will be using Excel Solver ( a free ad-on) to develop an optimization model. If you do not have Solver loaded already, follow the instructions below:

Click File Button

adOpt1.jpg

Click Options

adOpt3

Click Add-Ins in the left column, find Solver Add-in. Click Go at the bottom next to Manage: Excel Add-ins

adOpt4.jpg

Check the box next to Solver Add-in – Click Ok and go back to main Excel page

adOpt5

To check that it installed, go to Data on the Ribbon and check for Solver in the Analysis box

adOpt6.jpg

Get the Excel File

You can make the file yourself, or download the template here: AdOptModel

Let’s look at our file:

This is a very simple example to start with. I am looking to run advertisements on a local radio station. The station provides two ad packages A and B. Package A is promised to reach 69,000 potential customers per airing and costs $156 dollars. Package B will reach 79,000 and cost $173.

I want to know how many of each package to run to reach the most customers. My only constraint is that I have a solid budget of  $10,000 that I cannot exceed.

adOpt2.jpg

Now you do not have to use the colors, I just learn this way and it makes sense to me. To understand the coloring:

  • Blue = input cells – you need to provide this information to your model
  • Pink = changing cells. These are the boxes Solver is going to use to try to work the problem out – do not put any formula in these boxes
  • Gray = Result cell – you can only have one result cell.

Build the model

Now we are going to build the model. We need to add some calculations to our existing sheet for the model to work.

Step 1: This step is optional. I always place some number value in my changing cells just to make sure my formulas are inputted  correctly.

adOpt7.jpg

Step 2: Constraint. Our budget is 10000, so we need a cell that calculates how much we are spending based on values in our change cells (Number of each ad we plan on running). To calculate this we are going to use a formula call Sumproduct()

What we want to do is multiply (# of ads for A * cost per Ad A) then add that to (# of ads for B * cost per Ad B)

That is what Sumproduct gives us (B5*B8)+(C5*C8)

adOpt8.jpg

We are going to do the same thing for our Result cell, except this time it will be Number of Ads * Customer Reach

adOpt9.jpg

Use Solver

Now it is time to use Solver, click on solver in the Data Tab of the Ribbon bar (under data analysis).

Let’s walk through this slow. In the top box, Set Objective, set this to the Result Cell. You can type it in or just click on the cell.

In the To: line, select Max since we are looking for the maximum customer reach here

adOpt10.jpg

adOpt11.jpg

Now set your Changing Variable Cells

adOpt12.jpg

To the right of big white box in the middle of the window, select Add

Place our constraint cells in this box.

adOpt13.jpg

Finally, make sure you check Make Unconstrained Variables Non-Negative and set our Solving Method to Simplex LP

Click Solve

adOpt14.jpg

Solver found a solution — Yay

Check – Keep Solver Solution

Hit okay

adOpt15.jpg

Here is our answer. Running 57.80347 Package B ads will get us 4566.474 Customers Reached.

adOpt16.jpg

But how do you run 0.8 of an ad?

Well, this is where we add in an integer constraint.

Reopen solver. Hit the Add button next to constraints

In Cell Reference, put our changing cells.

From the drop down pick “int”

adOpt17.jpg

adOpt18.jpg

Click Ok and run Solver again.

Now look at our new results: 2 of Package A and 56 of Package B

adOpt19.jpg

 

 

 

 

 

 

 

Excel: Range Names

In Excel, you can name your cells and ranges of cells with more user friendly names. This comes in handy as your spreadsheets become more complex. It also comes in handy when you are sharing spreadsheets between co-workers.

File download available here: checkbook

This file is simple enough. I have an example checkbook with monthly salary and monthly expenses listed.

rangeName1.jpg

Select cell containing the monthly salary(B3). Now go up to the cell name box (circled in red)

rangeName2

Type a new name in the box – I used “Income”

rangeName3.jpg

You can do the same thing with a range. Select cells D3- D5, then rename it “Expenses”

rangeName4

If you check the drop down from the name box, you will see your new named cells in the drop down.

rangeName5

Let’s uses our new named cells (and ranges) to make a quick formula.

rangeName6

rangeName7.jpg

rangeName8.jpg

How I Found Love Using Pivot Tables

Okay, a little background is in order here. I work for a Clinical Engineering Department in a large hospital system. Our main purpose is to inspect and repair medical equipment, from MRIs and CT Scanners down to IV pumps.

Now I know the title says love, and I promise there is a love connection, just be patient.

While doing some database work on the system we use to track repairs, I decided to do a little data exploration (I don’t have a lot of hobbies). I asked myself, “What equipment is breaking down the most?” I figured this could be a valuable piece of information. So, I exported six months worth of repair history into a CSV file.

Using Excel, I started playing with Pivot Tables. I started by checking to see what types of equipment seemed to break down the most. Turns out it was infusion pumps, not a real surprise to anyone who has ever worked in the field.

excelpivot2

But looking a little more closely. One hospital out of my system used Brand A and they wanted to get rid of them out of the belief they were the unreliable. However, a quick look at the data provided otherwise.

excelpivot3

Okay, so after I unsullied the reputation of the Brand A pump, I decided “Why not look at the repair rates of individual pieces of equipment?” (I know, I live a WILD life)

Below is the list from one of my hospitals. Pay special attention to the area highlighted in red. The amount of repair work orders opened for dental chairs was way off anything my 20 plus years of experience in the field would have led me to expect.

excelpivot1.jpg

So I decided to dig a little further. Well, it turns out all 78 work orders we opened by one of our younger (24 year old) single technicians. A quick walk up to the dental department quickly explained why the dental chairs needed so much attention. The problem was about 5’2″, long blond hair, a cute smile, and (even in scrubs) quite the little body.

So there you go. Young love revealed itself through the power of the pivot table.

Excel: Sorting and Filtering

Download Exercise File Here: ExcelTable

Sorting and Filtering are two basic functions you will perform quite often in Excel. When dealing with large data sets, it can be difficult to see what you are looking for, especially if the records appear in random order. Using sort and filter, you can adjust the spreadsheet to show what you want to see.

Sort

Sorting is simple in Excel. Just click on the header cell for the column you want to sort (in my example I chose “Name”). From the Ribbon Bar select Data. Now click on the AZ and ZA arrows. The column you selected will sort.

excelsort

Now, try it out on the other columns as well.

Finally, select the box directly above the word sort in the picture above. This will give you an advanced sort menu.

From this menu,  can select the column you wish to sort, the values you want to sort on, and the sort order.

excelsort1.jpg

Filter

To filter, select the Filter icon from the Data tab in the Ribbon Bar

excelsort2

Notice arrows now appear in the column header cells

excelsort3

Click on the arrow for Quarter. Now, uncheck the Select All box and check Quarter 2 > Okay

excelsort4.jpg

Now your sheet only has rows containing Quarter 2. Take note of the row numbers though. Notice the row numbers skip around. This is because the other rows are still there, they are just hidden by the filter.

excelsort5.jpg