R: Building Matrices

Matrices are very useful when working with data. A matrix gives you a two-dimensional table similar to what you are used to in programs like Excel. In R, there are a couple of different ways to build a matrix.

matrix()

The syntax for this command is matrix(data, nrow, ncol, byrow, dimnames), that is: the data, the number of rows, the number of columns, whether to fill by row, and the dimension names.

Let's start slow.

Let's build our data:

x <- 1:20

Now, let's build a 4×5 matrix:

matrix.jpg

Notice how the numbers fill in by column: column 1 holds 1-4, then column 2 picks up with 5-8.

If you want your numbers to fill in by row instead, add a fourth argument, byrow, and set it to TRUE.

matrix1.jpg

The final argument is the dimension names. If you want something other than the numbers assigned by default, provide the dimension names as a list: a vector of row names followed by a vector of column names.

matrix2.jpg

Get Data From Matrix

First, let us assign the matrix to a variable. I am using a capital “A”. Capital letters are commonly used to represent matrices; you will see this in most linear algebra books.

We pull values out of the matrix by their position in the table:

A[row, column]

  • A[1,3] – returns the number 3, which is located in the 1st row, 3rd column
  • A[1,] – returns the entire 1st row
  • A[,1] – returns the entire 1st column
  • A["Row3","C"] – shows you can use the dimension names if you wish

matrix3.jpg

rbind()

rbind() stands for row bind. It binds multiple vectors into a matrix by row.

First we create 3 vectors (x, y, z), then we turn them into a matrix using rbind().

matrix4.jpg

cbind()

cbind() is just like rbind(), but it lines your vectors up by columns.

matrix5

Change Dimension Names

You can also change the dimension names after the matrix has already been created, using colnames() and rownames().

matrix6

Code

x <- 1:20

# build a 4 by 5 matrix (fills by column by default)
matrix(x,4,5)

# fill by row instead of by column
matrix(x,4,5,TRUE)

#add dimension names
matrix(x,4,5,TRUE,list(c("Row1","Row2","Row3","Row4"),c("A","B","C","D","E")))

#assign the matrix to a variable
A <- matrix(x,4,5,TRUE,list(c("Row1","Row2","Row3","Row4"),c("A","B","C","D","E")))

#call a single element (row 1, column 3)
A[1,3]

#call row
A[1,]

#call column
A[,1]

#call data point by row and column name
A["Row3","C"]

#rbind()
x <- c(1,2,3,4)
y <- c("Hi","How","are","you")
z <- c(6,7,8,9)

rbind(x,y,z)

#cbind()
x <- c(1,2,3,4)
y <- c("Hi","How","are","you")
z <- c(6,7,8,9)

cbind(x,y,z)

#change dimension names
A <- cbind(x,y,z)
colnames(A) <- c("C1","C2","C3")
rownames(A) <- c("R1","R2","R3","R4")
A

R: Installing Packages

Packages extend the capabilities of R. They contain libraries of code, functions, and data sets. While it is possible to write all your code from scratch, why would you when someone has already done the work for you?

Installing packages in R is easy. For this lesson, we will install the HH package:

install.packages("HH")

Or, you can use the GUI (graphical user interface)

From the top menu, Packages>Install package(s)…

rPackages

Select a CRAN mirror. I like to pick one close to me.

rPackages1

This lists all packages available on that mirror. I prefer this method because you can see a list of packages and their proper spelling.

I chose HH.

rPackages2

Now just sit back and watch it install.

rPackages3

R: Vector Operations

This is one area where R really shines. Consider two vectors a = (1,2,3,4) and b = (5,6,7,8). Suppose you wanted to add the elements of the two vectors together like this:

1+5,2+6,3+7,4+8

In most programming languages, you would need to use a loop. You could do it with a loop in R, too:

a <- c(1,2,3,4)
b <- c(5,6,7,8)
for (i in 1:4){
      print(a[i]+b[i])
}

Or, thanks to vector operations, you can just use the + sign

vectorFunc.jpg
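As a minimal sketch of the vectorized addition described above, using the same a and b (the name total is just mine, for illustration):

```r
# the two vectors from the example above
a <- c(1, 2, 3, 4)
b <- c(5, 6, 7, 8)

# element-wise addition, no loop required
total <- a + b
print(total)  # 6 8 10 12
```

R pairs up the elements by position, so the result has the same length as the inputs.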

You can also use -, *, /

vectorFunc1.jpg
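The other operators work the same way, element by element. A small sketch with the same two vectors (diffs, prods, and quots are names I made up for this example):

```r
a <- c(1, 2, 3, 4)
b <- c(5, 6, 7, 8)

diffs <- a - b   # -4 -4 -4 -4
prods <- a * b   # 5 12 21 32
quots <- a / b   # each element of a divided by the matching element of b

print(diffs)
print(prods)
print(quots)
```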

You can pass a vector to a function and it will automatically iterate through it for you.

vectorFunc2.jpg
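For example, here is a sketch using the built-in sqrt() function. I picked sqrt() and these values myself; the screenshot may have used a different function.

```r
# a vector of perfect squares (illustrative values)
squares <- c(1, 4, 9, 16)

# sqrt() is applied to every element; no loop needed
roots <- sqrt(squares)
print(roots)  # 1 2 3 4
```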

 

 

R: seq() and rep()

If you want to count out a sequence of numbers in R, you can simply use the colon operator (:)

1:15 goes from 1 to 15, while 6:22 goes from 6 to 22

seqRep.jpg

You can even assign these sequences to a variable, creating a vector

seqRep1.jpg
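Here is a quick sketch of the colon operator, including assigning a sequence to a variable (counts is a name I chose for illustration):

```r
print(1:15)      # counts from 1 to 15
print(6:22)      # counts from 6 to 22

# store the sequence in a variable, which creates a vector
counts <- 6:22
print(length(counts))  # 17 elements
```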

However, you can only count by 1 using this method.

seq()

With seq(1,15) I can count from 1 to 15, just like using 1:15

seqRep2.jpg

If I add a third argument, though, I am now counting from 1 to 15 by 4

seqRep3.jpg
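A minimal sketch of both forms of seq() described above (the variable names are mine):

```r
# two arguments: from and to, counting by 1
ones <- seq(1, 15)
print(ones)

# third argument is the step size
by_four <- seq(1, 15, 4)
print(by_four)  # 1 5 9 13
```

Note that the sequence stops at 13, the last value that does not go past 15.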

rep()

rep() stands for replicate. Using rep(), I can make a vector of repeating elements

seqRep4
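A few ways to call rep(), as a sketch. The values here are my own and not necessarily the ones in the screenshot.

```r
# repeat a single value 3 times
fives <- rep(5, 3)
print(fives)      # 5 5 5

# repeat a whole vector 3 times
cycled <- rep(c(1, 2), times = 3)
print(cycled)     # 1 2 1 2 1 2

# repeat each element twice before moving on
doubled <- rep(c(1, 2), each = 2)
print(doubled)    # 1 1 2 2
```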

You can of course create vectors with both seq() and rep() by assigning them to a variable

seqRep5.jpg

Remember that R indexes start at 1, not 0, so using the vector above, a[3] returns 9 and a[1:3] returns 1, 5, 9.
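To try the indexing yourself, here is a sketch that assumes a was created with seq(1, 15, 4), which matches the values quoted above:

```r
# assumed definition of a: 1 5 9 13
a <- seq(1, 15, 4)

third <- a[3]          # the third element
first_three <- a[1:3]  # elements 1 through 3

print(third)        # 9
print(first_three)  # 1 5 9
```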

 

 

 

R: Loops – For, While, Repeat

Loops are how we get computers to repeat tasks. In R, there are three standard loops: For, While, and Repeat

For

The For loop is a simple loop that iterates through a set of elements. With each iteration (running of the loop) the action found inside the loop is repeated.

In the loop I created below, 1:20 means 1 through 20. i in 1:20 means count 1 through 20, assigning the current value of the count to i for each iteration of the loop.

rloop

Here are the results

rloop1.jpg

You can use a vector to iterate through. You can even use strings.

rloop2

Nested Loop

You can even nest your loops (running a loop inside another loop)

My main loop counts to 2. Each time it runs it prints its count and then it runs a second loop that steps through a vector containing 3 strings. It prints out each string in order, then returns to the top of the main loop and does it one more time.

rloop3.jpg

While loop

While loops work off of conditional logic. The general concept is while some condition is True, the loop will iterate. Once the condition is False, the loop terminates.

The loop below states: while c is less than 10, iterate through the loop. Inside the loop I created a counter that adds 2 to the value of c each time. The end result is a listing of even numbers from 0 to 8.

rloop4

Repeat Loop

Repeat loops repeat themselves (shocking!! I know!) until they are terminated using the break command.

Note the break command can be used in For and While loops if needed.

rloop5

The Code

# loop prints numbers 1 - 20
for (i in 1:20){
    print(i)
}

# loop prints elements in vector
for (i in c("Dog","Cat","Frog")){
     print(i)
}

#nested loop
for (n in 1:2){
   print(n)
   for (i in c("Dog","Cat","Frog")){
      print(i)
   } 
}

#while loop
c <- 0
while (c<10){
   print(c)
   c <- c+2
}

#repeat loop
a <- 0
repeat {
   print(a)
   if (a==10) {
      break
   } else {
      a <- a+2
   }
}

 

R: Create Functions in R

Learning to use functions in R will improve your programming skills greatly. I think of functions as mini programs you create inside your program.

Below I create a function called Times2. The syntax for creating a function is:

FunctionName <- function(parameters) {
              Action (body of the function)    
              }

rFunc.jpg

You call the function simply by calling its name and giving it an argument.

rFunc1.jpg

You can even pass the function a vector

rFunc2.jpg

Nest a Function

You can nest functions – call a function from inside another function

rFunc3

Assign Function Output to a Variable

Below, I took the print statements out of the functions and assigned Times2Sqr(2) to a variable y

rFunc4.jpg

You can work with this variable just like any other variable

rFunc5.jpg

Recursive Function

Recursive Functions are functions that call themselves.

I am using if and else in this example. If they are foreign to you, don't worry, I will cover them in a future lesson.

rFunc6.jpg

The Code

# create a function in R

Times2 <- function(x) {
   y <- x*2
   print(y)
   }
 
Times2(2)
Times2(8)

x <- c(1,3,6,3)
Times2(x)

#nested functions
Times2Sqr <- function (x) {
   y <- Times2(x)^2
   print(y)
   }
 
Times2Sqr(2)

# assign function value to variable
Times2 <- function(x) {
   y <- x*2
   }
Times2Sqr <- function (x) {
   y <- Times2(x)^2
   }
 
y <- Times2Sqr(2)

#recursive function (computes the factorial of x)
Recur <- function(x) {
   if (x==0) {
      return(1)
   } else {
      return(x * Recur(x-1))
   }
}

Recur(6)

R: Intro to Statistics – Central Tendency

Central Tendency

One of the primary purposes of statistics is to find a way to summarize data. To do so, we often look for numbers known collectively as measures of central tendency (mean, median, mode).

Look at the list below. This is a list of weekly gas expenditures for two vehicles. Can you tell me if one is better than the other, or are they both about the same?

rCentral

How about if I show you this?

rCentral1

Using the average, you can clearly see car2 is more cost efficient. At a savings of approximately $21 a week, that adds up to $630 over the course of the 30 weeks in the chart. That is a big difference, and one that is easy to see once we use one of the 3 main measures of central tendency: the arithmetic mean, popularly called the average.

Mean

The mean – more accurately the arithmetic mean – is calculated by adding up the elements in a list and dividing by the number of elements in the list.

rCentral2.jpg

In R, finding a mean is simple. Just put the values you want to average into a vector (note: to make a vector in R, use var <- c(x,y,z)), then pass the vector to the function mean().

rCentral3
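Here is a minimal sketch of mean() next to the by-hand calculation. The numbers are hypothetical weekly gas costs I made up, not the values from the chart above.

```r
# hypothetical weekly gas costs (not the data from the chart)
car1 <- c(40, 55, 38, 47)

# built-in mean
avg <- mean(car1)

# the same thing by hand: sum of the elements divided by their count
manual <- sum(car1) / length(car1)

print(avg)     # 45
print(manual)  # 45
```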

Median

The median is, simply enough, the middle value of an ordered list. This is also easy to find using R. As you can see, you do not even need to sort the list numerically first. Just feed the vector to the function median()

rCentral4.jpg
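A short sketch of median() on an unsorted vector (the values are my own). When the vector has an even number of elements, median() returns the average of the two middle values.

```r
# unsorted values; median() handles the ordering itself
vals <- c(7, 1, 5, 3, 9)
mid <- median(vals)
print(mid)       # 5

# even number of elements: average of the two middle values (2 and 3)
even_mid <- median(c(4, 1, 3, 2))
print(even_mid)  # 2.5
```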

Mode

This is the last of the 3 main measures. Mode returns the most common value found in a list. In the list 2,3,2,4,2 – the mode is 2. Unfortunately R does not have a built-in statistical mode function, so for this, we will have to build our own function.

For those familiar with functions in programming, this shouldn't be too foreign a concept. However, if you don't understand functions yet, don't fret. We will get to them soon enough. For now, just read over the code and see if you can figure any of it out for yourself.

rCentral5
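In case the screenshot is hard to read, here is one possible way to write such a function. This is my own sketch and not necessarily the same code as in the image; the name get_mode is made up, and if there is a tie it simply returns the first value that reaches the highest count.

```r
# hypothetical helper: returns the most common value in a vector
get_mode <- function(v) {
  uniq <- unique(v)                     # the distinct values, in order of appearance
  counts <- tabulate(match(v, uniq))    # how many times each distinct value occurs
  uniq[which.max(counts)]               # the value with the highest count
}

result <- get_mode(c(2, 3, 2, 4, 2))
print(result)  # 2
```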

 

 

 

R: An Introduction

R is a programming language focused on statistics, data visualization, and data analysis. It is open source, which means there is a rich trove of libraries and add-ons constantly being developed by the open source community.

R is free to download and use. Follow the links below to download R if you would like to try it.

  • Windows
  • Linux Binaries
  • RStudio – Not required, but anyone looking for a more fully featured development environment may want to check it out. I will be using just the base install of R for the following lessons, though.

Starting R

When R first starts up, this is what you will see. I am not going to focus too much on a grand tour, as most of the menus in R are pretty self explanatory. Instead, let’s jump right into the coding. Move your cursor to where my big red arrow is:

rIntro

Basic Syntax

There is an unwritten rule that states the first line of code you need to learn in any language is Hello World. Well, I am not going to do that. R is a stats program, so why don't we start with some numbers instead?

rIntro1

As you can see, R uses your standard arithmetic operations (+,-,*,/,^)
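If you want to type along, here is a sketch of each operator in action. The numbers are arbitrary, and I collect the answers in a named vector called results only so they print together.

```r
# one example per arithmetic operator
results <- c(
  add      = 2 + 3,    # 5
  subtract = 7 - 4,    # 3
  multiply = 6 * 3,    # 18
  divide   = 10 / 4,   # 2.5
  power    = 2 ^ 3     # 8
)
print(results)
```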

Variables

Assigning variables in R is easy. “<-” is the designated syntax used to assign a variable. One great thing about R is that you do not need to declare variables in advance. R assigns the data type based on the input you give the variable.

rIntro2
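A minimal sketch of assignment with made-up variable names and values. class() reports the data type R picked for you.

```r
# no declaration needed; R infers the type from the value
x <- 5
y <- 2.5

# variables can be used in calculations right away
z <- x * y
print(z)         # 12.5
print(class(x))  # "numeric"
```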

Assigning Strings

rIntro3.jpg
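Strings are assigned the same way, wrapped in quotes. A small sketch (the variable names and text are mine; paste() is a built-in function that joins strings with a space by default):

```r
# double or single quotes both work
greeting <- "Hello"
name <- 'R'

# join the two strings
msg <- paste(greeting, name)
print(msg)         # "Hello R"
print(class(msg))  # "character"
```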

Data Types

The main data types in R are:

  • Numeric: 1, 2.33, etc
  • Integer: 2L
  • Logical: TRUE, FALSE
  • Character: string
  • Complex: 2+3i (remember those from Trig class)
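You can check any of these yourself with class(). A quick sketch using the example values from the list above (types is just a name I picked to hold the answers):

```r
# ask R what type each value is
types <- c(
  class(2.33),      # "numeric"
  class(2L),        # "integer"
  class(TRUE),      # "logical"
  class("string"),  # "character"
  class(2+3i)       # "complex"
)
print(types)
```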

Vectors

Vectors allow you to group multiple elements under one name. Use c() with the assignment operator, as in v <- c(...), when creating a vector

rIntro4
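A short sketch of building vectors with c(), using values I made up. One thing worth knowing: every element of a vector has the same type, so mixing numbers and strings turns everything into strings.

```r
# a numeric vector and a character vector
scores <- c(90, 85, 77)
animals <- c("Dog", "Cat", "Frog")

print(length(scores))  # 3
print(scores[2])       # 85

# mixing types: the number is coerced to a string
mixed <- c(1, "a")
print(class(mixed))    # "character"
```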

Lists

Lists allow you to group unlike items – even vectors and strings:

rIntro5.jpg
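Here is a sketch of a list holding unlike items (the contents are my own example). Use double square brackets to pull a single item back out.

```r
# a number, a string, a vector, and a logical in one list
stuff <- list(42, "text", c(1, 2, 3), TRUE)

print(length(stuff))   # 4
print(stuff[[2]])      # "text"

# the third item is a vector, so it can be indexed again
print(stuff[[3]][2])   # 2
```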