R: Data Types Tutorial

In my introduction to R, I covered Lists and Vectors briefly, but I will offer a quick review of them.


Lists in R are collections of data. These data can be of any type. You can have numbers, letters, and booleans all in a single list if you so choose.

The most important things to remember about lists is that they store any data type and you create them using the syntax: list(data,data,…,data)



Vectors are a collection of like data types

Here I created 2 vectors – one numeric and one character

syntax: c(data,data,…data1)


One thing to keep in mind with R is that unlike most programming languages – the indexes in R start at 1.

(1,3) — returns 1rst and 3rd element in vector

(1:3) — returns 1rst through 3rd element



A matrix is a multiple dimension vector. All columns must be same length and all elements must be same type.

You shape you matrix by providing row and column size(nrow = and ncol =). You can also name the columns and rows using dimnames=

syntax: matrix (vector, nrow=,ncol=, dimnames=)



Arrays allow you to build multi-dimensional arrays. In the example below, I build a 2 dimensional 3×4 array



A factor takes a vector and aggregates like items.

As you can see when I print the factor out, it shows all elements in the vector and provides a list of unique elements

nlevels() returns the number of unique elements.


summary() returns a list of the unique elements and a count of items in those elements.



Dataframes allow you to make a tabular table. Also, unlike matrices, the columns in a dataframe can contain different data types




Python: Lists and Dictionaries

Lists and Dictionaries are kind of like super variables. They allow you to store multiple data under a singular name. In the example below, I first assign the variable ‘a‘ a value of 6. Next when I go to assign it a value of 7, it replaces 6.

What if I want to keep both 6 and 7?


Sure I could use 2 variables, (ex. a and b), or I can use a list.


A list is a collection of data in Python. You create a list by assigning a variable a group of data enclosed in square brackets [].

Notice that lists are indexed, meaning you call them up using the following syntax: variable[index]

Note that indexes in Python start at 0. So in our set of 4 elements, the indexes would be 0,1,2,3. Notice when I try to use index 4, I get an out of range error.


Lists can also hold strings, or combinations of values in one list.



This of dictionaries as lists with named indexes. This makes looking up values easier (especially when your code get more complicated).

Each data element is paired with a named index. A dictionary is denoted through the use of curly braces {} and : separating the named index and the value.


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Follow this link for more Python content: Python

How I Found Love Using Pivot Tables

Okay, a little background is in order here. I work for a Clinical Engineering Department in a large hospital system. Our main purpose is to inspect and repair medical equipment, from MRIs and CT Scanners down to IV pumps.

Now I know the title says love, and I promise there is a love connection, just be patient.

While doing some database work on the system we use to track repairs, I decided to do a little data exploration (I don’t have a lot of hobbies). I asked myself, “What equipment is breaking down the most?” I figured this could be a valuable piece of information. So, I exported six months worth of repair history into a CSV file.

Using Excel, I started playing with Pivot Tables. I started by checking to see what types of equipment seemed to break down the most. Turns out it was infusion pumps, not a real surprise to anyone who has ever worked in the field.


But looking a little more closely. One hospital out of my system used Brand A and they wanted to get rid of them out of the belief they were the unreliable. However, a quick look at the data provided otherwise.


Okay, so after I unsullied the reputation of the Brand A pump, I decided “Why not look at the repair rates of individual pieces of equipment?” (I know, I live a WILD life)

Below is the list from one of my hospitals. Pay special attention to the area highlighted in red. The amount of repair work orders opened for dental chairs was way off anything my 20 plus years of experience in the field would have led me to expect.


So I decided to dig a little further. Well, it turns out all 78 work orders we opened by one of our younger (24 year old) single technicians. A quick walk up to the dental department quickly explained why the dental chairs needed so much attention. The problem was about 5’2″, long blond hair, a cute smile, and (even in scrubs) quite the little body.

So there you go. Young love revealed itself through the power of the pivot table.

Python: Fun with Central Tendency

Now numpy provides easy functions for finding central tendency – you can find them in my Numpy Part II lesson: Python: Numpy Part II.

But we have learned enough about Python so far, that maybe it would be more fun to build our own functions. In this lesson we are going to build our own statistics library with mean, median, mode, and quantile

Our Data




or even easier:



Remember with median – if your data contains an odd number of elements (n%2==1), you just take the middle value. However, if your data contains and even number of elements (n%2==0) then you add the two middle numbers and divide by 2.

We handle that through the use of an if statement in our function.



For mode, we want to find the most common element in the list. For this task, I will import Counter from collections.

d.most_common() returns each unique element and the number of times it appears

d.most_common(1) returns the


or in function form:


In my numpy part II lesson I use a more elegant solution, but I want you to see that there is always more than one way to do something in Python.


Quantiles are cut points in set of data. They can represent the bottom ten percent of the data or the top 75% or any % from 0 to 100.

In my solution, I am adding a slicing argument to the sorted() function. Below shows all elements up to (but not including) index 4


quantile function:


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Follow this link for more Python content: Python

SQL: Intro to Joins

More often than not, the information you need does not all reside in one table. To query information from more than one table, use the Join command.

In our example, we are going to be using AdventureWorks2012. If you want to follow along, but do not have SQL Server or AdventureWorks2012 installed, click in the following links:

We are going to be using the following Tables for our Join

Notice they both have a matching field – BusinessEntityID – This field will be important when creating a join.


We can call up both tables individually through separate SQL Queries


However, to combine the two tables into a single result, you will need a Join.

Inner Join

SQL uses Inner Joins by default. In this lesson, I am only going to focus on inner joins. Inner joins work by focusing on columns with matching information. In the example below, the columns with matching information are the Name columns.

The combined result only contains rows who have a match in both source tables. Notice that Sally and Sarah do not appear in the result table.


The syntax is as follows:

Select *
from Table A join Table B
on TableA.Match = TableB.Match


In our example, the matching column is BusinessEntityID in both tables.

To clean the code up a little bit, we can assign aliases to our tables using the “as” keyword. We are using E and S as our aliases. We can now use those aliases in our “on” clause.


And, just as in a regular select statement, we can choose which columns we want. You do however, have to identify which table the columns you are requesting are from. Notice how I did this using the E and S aliases.



R: Intro to Statistics – Central Tendency

Central Tendency

One of the primary purposes of statistics is to find a way to summarize data. To do so, we often look for numbers known as collectively as measurements of central tendency (mean, median,  mode).

Look at the list below. This is a list of weekly gas expenditures for two vehicles. Can you tell me if one is better than the other, or are they both about the same?


How about if I show you this?


Using the average, you can clearly see car2 is more cost efficient. At approx $21 a week, that is a savings of $630 over the course of the 30 weeks in the chart. That is a big difference, and one that is easy to see. We can see this using one of the 3 main measures of central tendency – the arithmetic mean – popularly called the average.


The mean – more accurately the arithmetic mean – is calculated by adding up the elements  in a list and dividing by the number of elements in the list.


In R, finding a mean is simple. Just put the values you want to average into a vector (**note to make a vector in R: var  <-c(x,y,z)) We then put the vector through the function mean()



The median means, simply enough, the middle of an ordered list. This is also easy to find using R. As you can see, you do not even need to sort the list numerically first. Just feed the vector to the function median()



This is the last of 3 main measure. Mode returns the most common value found in a list. In the list 2,3,2,4,2 – the mode is 2. Unfortunately R does not have a built in mode function, so for this, we will have build our own function.

For those familiar with functions in programming, this shouldn’t be too foreign of a concept. However, if you don’t understand functions yet, don’t fret. We will get to them soon enough. For now, just read over the code and see if you can figure any of it out for yourself.