R: Importing Excel Files

To work with Excel files in R, you can use the readxl library


Use the read_excel() function to read and Excel workbook

> library(readxl)
> ## read excel file
> df1 <- read_excel("r_excel.xlsx")
> head(df1)
# A tibble: 6 x 4
  Name   Job              Hours Complete
  <chr>  <chr>            <dbl> <chr>  
1 Sally  Predictive Model    1. n      
2 Philip Maintanence        10. n      
3 Tom    Ad-hoc Report      12. y      
4 Bob    SSRS                3. y      
5 Philip Tableau             7. n      
6 Tom    Tableau             9. n

By default, read_excel() reads only the first sheet in the Excel file. To read other sheets using the sheet key word.

> ## read  sheet 2
> df2 <- read_excel("r_excel.xlsx", sheet =2)
> head(df2)

# A tibble: 6 x 2
  Animal Num_Legs
  <chr>     <dbl>
1 Dog          4.
2 Duck         2.
3 Snake        0.
4 Horse        4.
5 Spider       8.
6 Human        2.

The next example, we are reading range B2 – C6 on sheet 1 (same as Excel’s range function)

> ## read sheet 1, range B2 - C6
> df3 <- read_excel("r_excel.xlsx", sheet =1, range = "B2:C6")
> df3
# A tibble: 4 x 2
  `Predictive Model`   `1`
  <chr>              <dbl>
1 Maintanence          10.
2 Ad-hoc Report        12.
3 SSRS                  3.
4 Tableau               7.

In this last example, we are importing only the first 4 rows on sheet 2

> ## read sheet 2, first 4 rows only
> df4 <- read_excel("r_excel.xlsx", sheet = 2, n_max = 4)
> df4
# A tibble: 4 x 2
  Animal Num_Legs
  <chr>     <dbl>
1 Dog          4.
2 Duck         2.
3 Snake        0.
4 Horse        4.

Excel: Copy, Cut, Paste, and Format Painter

This is a very basic introduction into Excel. I am going to start with the upper left hand corner of the Ribbon bar: the Clipboard region.


While I am aware most of you already know how to use these features, my website is for everyone, including the most base beginner. I remember the frustration of learning the fundamentals of analytics from websites that assumed I already had a PhD and 10 years of work experience in the area.

So feel free to skip this, or take a few minutes to read through, you might be surprised. There may be some tricks in this little corner of Excel you were not aware of.

Cut, Copy, and Past

These 3 features are ambiguous with computer use dating back to days DOS. As a matter of fact, the old keyboard shortcuts used back then still work.

Cut (Ctrl-X) – deletes the highlighted text and stores it in local memory (a clipboard)

Copy (Ctrl-C) – leaves highlighted text as is, but saves a copy of it into local memory

Paste(Ctrl-V) – pastes the contents of the clipboard into the spot you have chosen.

Below are two examples


Start by highlight the rows A:1 -A:4, click Cut (or Ctrl-X). Now Select Cell C1. Click Paste (or Ctrl-V). Notice column A is now empty.


Start by highlight the rows A:1 -A:4, click Copy (or Ctrl-C). Now Select Cell C1. Click Paste (or Ctrl-V). Notice Column C is now a Copy of Column A


If you click on the bottom right of the Clipboard box, the clipboard window opens up, showing your the current contents saved to the clipboard


So now if I add some letters to column B and copy it, that will end up in the Clipboard as well. Notice the original data is still in the clipboard


Now when you want to paste, you can choose which item in the clipboard to paste. Without using the clipboard, Excel will paste the most recent item added to the clipboard by default.


Copy as Picture

You might notice a drop down arrow next to Copy in the Ribbon Bar. If you click on it, you will see Copy as Picture as an option. This is great when working with Charts. This saves the data as a picture or bitmap so when you paste it elsewhere it will not be affected by changes to the source data.

What I mean by that is, looking at the chart below, column b has a value of 2. If I change that 2 to a 4 in the data table this chart was created from the bar representing b would change to 4 (so would any copied charts). But a chart copied “as picture” would not change. Imagine it to be like a screen shot.


Paste Special

Notice the options below Paste when you hit the drop down arrow. Some of the more popularly used are Transpose (turns a vertical list to horizontal list and vise versa). Another popular option is to paste “values”. This is useful when trying to copy a calculated value where all you want is the number (not the formula who made it).


Format Painter

Format Painter can help you repeat text and color formatting with just a few clicks

Start by adding color and font bolding to a set of cells. Highlight those cells and click Format Painter


You will now see your cursor is paint brush. Find a target you want to duplicate your formatting to and click on it.


And now it looks the same.



Excel: Data Transformations

Data Transformations

In statistics, data transformations are used, for lack of better words, to even out the numbers. While there are lots of reasons to perform transformations in the world of Statistics, in Data Science and Analytics, our main concern is to bring distributions to the normal form (the classic bell curve).


As with many concepts, sometimes it is easy to see by doing. So let’s jump into Excel, you can download the file to play along here: transform1

First things first – Install Analysis Toolkit

In order to created histograms in Excel (which we will need for this exercise), you need to install the Analysis Toolkit.

Click on the green “File” in the upper left corner of Excel, then click options.


Click Add-Ins > select Analysis TookPak


Hit OK and now lets look at our data.

The Data

We have two columns –

UNITSF =10,000 records of house square footage

Bins = list of Bin sizes we will use for the histogram


First we will create a histogram to look at the data.

To do so, go to the Data tab in the Ribbon bar. Then on the far right select Data Analysis


Select Histogram


For Input Range – select number values from the UNITSF column.

Bin Range – select number values from Bin column


Now you get a Frequency Distribution. Let’s visualize it to see better.


In the Ribbon Bar, select Insert>Recommended Charts


Select the Frequency bar chart


Look at this Frequency distribution. Notice how it bunched to the left. In Stats – they call this Right-Skewed since the long tail trails out to the right.


The problem is, many popular data mining (machine-learning) algorithms can be easily thrown off by data like this. The algorithms (like regressions, decision trees, boosted forests, naive bayes, etc.) are designed to work better with normal distributions.

So to normalize this distribution, we have to transform the data. What that means in practice is to apply a mathematical operation to all the data in the column. Operations like: logs, square roots, cube roots, squaring, cubing, etc.

In our example, the natural log does a very good job of transforming our data.

To do so, lets start by applying the natural log to the first element in our data.



Now hover over the little green box in the bottom corner until your cursor turns into a black plus sign. Double click and the natural log will be applied to all elements.


Build a new bin column ranging from 3 – 11 by .5


Now lets build our frequency table. Data> Data Analysis > Histogram

Make sure you select the B column for Input and E column for Bin Range


Now go to your new distribution and graph it like we did before.

Notice how the new distribution looks a lot more like the class bell curve.





Excel: Adjusting Named Ranges

Okay, so you created a spreadsheet. To make it easier for your fellow coworkers to use, you implemented range names. And it all works great, that is until you need to change the actual range.

Look at this example. I had created range call Class which spanned across 16 cells representing 16 classes being offered. It worked great until my boss informs me that my school will now be offering 20 classes.


How do you change the range to include classes 17-20?

In the Formulas tab of the Ribbon, you will find an icon called Name Manager 


Highlight the named range you are working with  – (in our case – Class)

Double click


Click on the little box to the right of the Refers to: box


Now highlight the new range


Click OK and OK. Now your range should contain your new cells.





Linear Optimization Model: Binary Constraints

Today we are going to build a Linear Optimization Model with binary constraints. What that means is that some of our decision variables are going to be restricted to 0 or 1 as possibilities.

This example is more complicated than earlier lessons, so I have included a fully filled out Excel file for you to follow along with. You can download the Excel file here: ModelBuild

Let’s start with the problem:

You have been hired by a company that builds airplane model kits. They currently produce 5 types of models – F-16’s, F-18’s, A-10’s, B-2’s, and B-52’s. They want you to help them maximize their profits.

They have provided you with the following production information.

The Minimum production may need a little explanation. Setting up the equipment to build a particular model has an expense built in. To cover that expense, you need to commit to building at least the value found in the minimum production row. If you can’t meet this level of production, we are not going to build any of this model.


We have also been given our available resources:


So now on to our decision variables. For each model, we have to first decide whether or not to produce. Remember, if we can’t produce our minimum amount for a particular model, we can’t produce any of that model.

So our Produce? decision variables are going to be either 1 (produce) or 0 (don’t produce)


Now our Minimum production row is a simple formula= Minimum production from row 7 above * your 1 or 0 from Produce?  I have placed a 1 in the first column of our Produce? row to demonstrate below.


Skip Units produced for now. Look at Maximum produced. This formula is simply the 1 or 0 from our Produce? row * 9999. I choose 9999 at random as I know we will never exceed this limit in our example. In other models, you may need a larger number.

Now Units produced is our second decision variable. This variable needs to sit in between Minimum production and Maximum production.

Note that if our Produce? variable is 0 (don’t build) Maximum production will be 0 (Produce? * 9999= 0) . Since Units produced needs to be less than or equal to Maximum production, we cannot produce any units.

Second, since if we decide to build (1), our Minimum production will come from our given values in blue. So our Units produced will need to be greater than or equal to the Minimum production.


Now lets set our Resource used using SUMPRODUCT()

Press Time


Grams Plastic


Finally, we need to set our Result cell – We do this using Sumproduct()


Now we can set up our solver.

  • Objective – Profit cell
  • Changing cells – our pink rows – note you separate the two rows with a comma
  • Set our Produce? row to binary (see second picture below)
  • Set Units produced to >= Minimum production
  • Set Units produced to <= Maximum production
  • Set Press Time Resource used <= Resource available
  • Set Grams Plastic Resource used <= Resource available




Finally make sure you are set to Simplex LP and hit Solve



Excel: Charts, Customizing Which Columns You Want to Chart

Here is the data set if you want to play along: ExcelCharts2

While Excel does a pretty good job of auto-formatting charts based on your data, sometimes a little customization work is in order. Take this data below for example:

What if I want to create a bar chart of the average readings?


Well first we need to make an “Average” column



I really don’t need all the decimal places. Hit the decimal reduce button in the picture below until you have a column full of integers.


Chart Method 1

Just like you would normally, select all the data and go to Insert > Column Chart


Right Click on your new chart and Select Data…


I don’t want my readings on the X axis, I want my Sensor Names, so click Switch Row/Column in the new window.


The X Axis is now A,B,C,D


Now, we only want to see the Average  column, so uncheck Reading 1,2,3 


Here is our chart


Chart Method 2

Don’t select any data yet. Just go to the Insert and Select Column Chart


Click Select Data now from the Ribbon (if you don’t see this, click anywhere on your new blank chart and this should appear in the Ribbon)


Click Add in the low left box.


In Series Name, either type a name or select the column header


In Series values: put put your data rows from the Average  column


Now we have data loaded into our chart, let’s change the data labels for our X Axis. In the Horizontal(Category) Axis Labels –  click Edit


Highlight A,B,C,D from the Sensor column


Now you see the numbers have been replaced by A,B,C,D


Here is our chart




Excel: Randomly Select Names from a List

One of the main elements of statistical analysis is making sure your data is a good representation of the population. The best way to do that is to make sure your sampled data is truly randomly selected.

I am going to show a quick method in Excel where we can select random names from a list (our population) of names. We will be using the following Excel functions to do so (if you are not familiar with them, don’t worry, I will walk you through it):

  • index()
  • randbetween()

We will also be creating a Named Range.

The List

Let’s start with the list. I have 14 names, and I want to randomly select 5 names from this list.



Index is a method for using the looking up elements in Excel using the indexes (row numbers, column numbers). Note the index is always a number, so when dealing with columns, don’t be tricked by the column letters assigned.

Here I am looking for the 5th index in my list of names



My result is Steve


You can also work horizontally. Here I am looking for index 3 from my horizontal list.


Result is d


Now watch this. I am asking for the 4th index in this list. What do you think it will be? The 4th row is Tyrone.


Nope, when running INDEX() the index starts at 1 – starting at the first element in the selected list. If you start counting with Gary, Karen is our 4th person down.



RANDBETWEEN accepts 2 arguments, high and low numbers in a range. It then returns a random integer from with the range.

RANDBETWEEN(1,20) will return an integer from 1 to 20 (including 1 and 20)

Named Range

Now before me move forward, when formulas start to get larger (as they are about to) I look for ways to simplify the syntax.  One trick would be to name our range.

Highlight all the names in your list, then go up to the name box and enter Names


Now combine RANDBETWEEN() and INDEX()

now try this: =INDEX(Names, RANDBETWEEN(1,14))


Hit enter and drag the formula down 5 cells


Now you have 5 random names selected from your list.


Excel Solver: Intro Optimization Models: Linear

What is an optimization model? An optimization model is a mathematical model designed to provide the optimal solution based on a information provided. You can use it answer questions like the optimal number of employees needed to working a shift or the optimal amount a product to produce that will bring in the most profit.

In this lesson we will be using Excel Solver ( a free ad-on) to develop an optimization model. If you do not have Solver loaded already, follow the instructions below:

Click File Button


Click Options


Click Add-Ins in the left column, find Solver Add-in. Click Go at the bottom next to Manage: Excel Add-ins


Check the box next to Solver Add-in – Click Ok and go back to main Excel page


To check that it installed, go to Data on the Ribbon and check for Solver in the Analysis box


Get the Excel File

You can make the file yourself, or download the template here: AdOptModel

Let’s look at our file:

This is a very simple example to start with. I am looking to run advertisements on a local radio station. The station provides two ad packages A and B. Package A is promised to reach 69,000 potential customers per airing and costs $156 dollars. Package B will reach 79,000 and cost $173.

I want to know how many of each package to run to reach the most customers. My only constraint is that I have a solid budget of  $10,000 that I cannot exceed.


Now you do not have to use the colors, I just learn this way and it makes sense to me. To understand the coloring:

  • Blue = input cells – you need to provide this information to your model
  • Pink = changing cells. These are the boxes Solver is going to use to try to work the problem out – do not put any formula in these boxes
  • Gray = Result cell – you can only have one result cell.

Build the model

Now we are going to build the model. We need to add some calculations to our existing sheet for the model to work.

Step 1: This step is optional. I always place some number value in my changing cells just to make sure my formulas are inputted  correctly.


Step 2: Constraint. Our budget is 10000, so we need a cell that calculates how much we are spending based on values in our change cells (Number of each ad we plan on running). To calculate this we are going to use a formula call Sumproduct()

What we want to do is multiply (# of ads for A * cost per Ad A) then add that to (# of ads for B * cost per Ad B)

That is what Sumproduct gives us (B5*B8)+(C5*C8)


We are going to do the same thing for our Result cell, except this time it will be Number of Ads * Customer Reach


Use Solver

Now it is time to use Solver, click on solver in the Data Tab of the Ribbon bar (under data analysis).

Let’s walk through this slow. In the top box, Set Objective, set this to the Result Cell. You can type it in or just click on the cell.

In the To: line, select Max since we are looking for the maximum customer reach here



Now set your Changing Variable Cells


To the right of big white box in the middle of the window, select Add

Place our constraint cells in this box.


Finally, make sure you check Make Unconstrained Variables Non-Negative and set our Solving Method to Simplex LP

Click Solve


Solver found a solution — Yay

Check – Keep Solver Solution

Hit okay


Here is our answer. Running 57.80347 Package B ads will get us 4566.474 Customers Reached.


But how do you run 0.8 of an ad?

Well, this is where we add in an integer constraint.

Reopen solver. Hit the Add button next to constraints

In Cell Reference, put our changing cells.

From the drop down pick “int”



Click Ok and run Solver again.

Now look at our new results: 2 of Package A and 56 of Package B









Excel: Range Names

In Excel, you can name your cells and ranges of cells with more user friendly names. This comes in handy as your spreadsheets become more complex. It also comes in handy when you are sharing spreadsheets between co-workers.

File download available here: checkbook

This file is simple enough. I have an example checkbook with monthly salary and monthly expenses listed.


Select cell containing the monthly salary(B3). Now go up to the cell name box (circled in red)


Type a new name in the box – I used “Income”


You can do the same thing with a range. Select cells D3- D5, then rename it “Expenses”


If you check the drop down from the name box, you will see your new named cells in the drop down.


Let’s uses our new named cells (and ranges) to make a quick formula.




To see a video of this lesson, click the link here: Excel: Name Ranges

How I Found Love Using Pivot Tables

Okay, a little background is in order here. I work for a Clinical Engineering Department in a large hospital system. Our main purpose is to inspect and repair medical equipment, from MRIs and CT Scanners down to IV pumps.

Now I know the title says love, and I promise there is a love connection, just be patient.

While doing some database work on the system we use to track repairs, I decided to do a little data exploration (I don’t have a lot of hobbies). I asked myself, “What equipment is breaking down the most?” I figured this could be a valuable piece of information. So, I exported six months worth of repair history into a CSV file.

Using Excel, I started playing with Pivot Tables. I started by checking to see what types of equipment seemed to break down the most. Turns out it was infusion pumps, not a real surprise to anyone who has ever worked in the field.


But looking a little more closely. One hospital out of my system used Brand A and they wanted to get rid of them out of the belief they were the unreliable. However, a quick look at the data provided otherwise.


Okay, so after I unsullied the reputation of the Brand A pump, I decided “Why not look at the repair rates of individual pieces of equipment?” (I know, I live a WILD life)

Below is the list from one of my hospitals. Pay special attention to the area highlighted in red. The amount of repair work orders opened for dental chairs was way off anything my 20 plus years of experience in the field would have led me to expect.


So I decided to dig a little further. Well, it turns out all 78 work orders we opened by one of our younger (24 year old) single technicians. A quick walk up to the dental department quickly explained why the dental chairs needed so much attention. The problem was about 5’2″, long blond hair, a cute smile, and (even in scrubs) quite the little body.

So there you go. Young love revealed itself through the power of the pivot table.