R Shiny: Introduction

Shiny is an R Studio application that allows you to how host your R code on a website. This is a great method for creating a dashboard or deploying a predictive model. The best part is, it is free. There are, however, paid services associated with R Studio and Shiny if you want to get professional support and such.

In this lesson, we are going to learn to build a Shiny server application step by step.

Let’s start at the beginning. First off, to use Shiny, you need to have R Studio installed on your computer. Once you have that, you will need to install the Shiny package

install.packages(‘shiny’)

Next, we are going to learn about the three parts of a Shiny server application.

Below is the basic framework of a Shiny app:

library(shiny)

ui <- fluidPage("Test")
server <- function(input, output) {}
shinyApp(ui = ui, server = server)

Let’s break this down:

ui: ui is the user interface. This is the code that makes up the web page the user will launch and interact with. Anyone familiar with HTML, this is the basic markup. This is where you set the wording, font, colors, what kind of object will be on the page, etc

server: server is the brains of the operation. This is where you load and manipulate any data being used in your page. Again, if you are a web developer, think Java Script here.

shinyApp: this is the code that launches your App.

If you run the code above, you will see something like this

2018-03-19_11-06-29

Note the IP address in the upper left. If you copy this into a browser of your choice, it will run there too.

2018-03-19_11-07-28.png

You will see the same address in the R Console. As long as the R Shiny server is running, you will see this Listening command. For those not familiar, the address 127.0.0.1 is called a pingback address. This is the internal address of the NIC card on your machine. This means the server is running on the machine you are on. The :4654 is the port my computer grabbed. This can be different for you. The easiest way for me to explain the port without going too deep into network theory is: think of it like a TV channel. From your computer, you could technically be running hundreds of these R Shiny servers. The port (TV channel) is what tells the computer which one of these web pages you want to visit.

If you happen to be on a network, you can go to another computer and type in your host computers network IP address (you can get it by going to a cmd console and typing ipconfig). It will be the address I have the arrows pointing to in my example

2018-03-19_13-33-33.png

Then, using my example, type http://10.171.46.91:4654 into the second computer’s browser and you should see your Shiny website.

Step 2

Let’s see if we can make our website more interesting, let’s start easy, we will format the text.

First close out your shiny window to end the session.

Change your ui line to match the one below.

ui <- fluidPage(h1(“Test”))

Now run your code again

library(shiny)
ui <- fluidPage(h1(“Test”))
server <- function(input, output) {}
shinyApp(ui = ui, server = server)

2018-03-19_13-38-26

Step 3

Okay, let’s make it a little more interesting by adding some detail to our page. We will do so by using panel methods in our fluidPage()

library(shiny)

ui <- fluidPage(
  titlePanel("Title Panel"),
     sidebarPanel("Sidebar Panel"),
      mainPanel("Main Panel")
    )
server <- function(input, output) {}
shinyApp(ui = ui, server = server)

Notice your page is now segmented into a title bar, a sidebar and a main panel

2018-03-19_13-52-55.png

Okay, now let’s actually add some R data magic to this.

library(shiny)
library(datasets)
mpgData <- mtcars

ui <- fluidPage(
  titlePanel("Mpg vs Cylinder Plot"),
     sidebarPanel("Sidebar Panel"),
      mainPanel(  plotOutput("mpgPlot"))
    )
server <- function(input, output) {
  output$mpgPlot <- renderPlot({
    boxplot(mpg~cyl, data= mpgData)})
}
shinyApp(ui = ui, server = server)

Let’s break down the new code. First thing first, we are loading the built in datasets library and grabbing the mtcars data set from it.

library(datasets)

mpgData <- mtcars

Next we are going to skip down to the server section next as this is where we work with our data:

First thing you will notice is function(input, output) – this is a standard call that allows you to pass information back and forth through the server code. In our example, we are only using the output right now.

We create a variable called output$mpgPlot to which we pass the shiny method renderPlot.  Inside that method, we are simply creating a simple boxplot comparing mpg to cyl using the mtcars data set that we labeled mpgData

server <- function(input, output) {
  output$mpgPlot <- renderPlot({
    boxplot(mpg~cyl, data= mpgData)}) 
}

The boxplot we rendered as output$mpgPlot is now passed to our ui

Next, in the ui I renamed the title panel and in the mainPanel, I added the command plotOutput(“mpgData”). That will allow me to post the boxplot I created in the server to the ui web page.

ui <- fluidPage(
  titlePanel("Mpg vs Cylinder Plot"),
     sidebarPanel("Sidebar Panel"),
      mainPanel(  plotOutput("mpgPlot"))
    )

If you run the code, it should look like this:

2018-03-19_14-59-52

Step 4

Now let’s add a little interactivity to our web page. We are going to give the user the option to view the boxplots with or without the outliers taken into effect.

library(shiny)
library(datasets)
mpgData <- mtcars

ui <- fluidPage(
  titlePanel("Mpg vs Cylinder Plot"),
     sidebarPanel(checkboxInput("outliers", "Show outliers", FALSE)),
      mainPanel( 
                      plotOutput("mpgPlot"))
    )
server <- function(input, output) {
  output$mpgPlot <- renderPlot({
    boxplot(mpg~cyl, data= mpgData,
            outline = input$outliers)})
}
shinyApp(ui = ui, server = server)

This involves only two changes to our code

First we will add a checkbox to our sidebar, in the ui we will alter the sidebarPanel line to look like this

sidebarPanel(checkboxInput(“outliers”, “Show outliers”, FALSE)),

The breakdown of the syntax here is

checkboxinput(variable name, display text, default )– our default is set to FALSE so our box will be unchecked by default

 

Next, we will alter our boxplot to take into account the checkbox input

boxplot(mpg~cyl, data= mpgData,
            outline = input$outliers)

— notice the variable name is input$outliers  It is a composite of the variable from our checkboxinput method and “input” from out function(input, output) function that opens up our server.

Now run your code

Unchecked

2018-03-19_15-05-46.png

Checked

2018-03-19_15-05-29.png

R: Creating a Word Cloud

Word Clouds are great visualization techniques for dealing with text analytics. The idea behind them is they display the most common words in a corpus of text. The more often a word is used, the larger and darker it is.

2016-12-16_21-27-13.jpg

Making a word cloud in R is relatively easy. The tm and wordcloud libraries from R’s CRAN repository is used to create one.

library(tm)
library(wordcloud)

If you do not have either of these loaded on your machine, you will have to use the following commands

install.packages("tm")
install.packages("wordcloud")

Now in order to make a word cloud, you first need a collection of words. In our example I am going to use a text file I created from the Wikipedia page on R.

You can download the text file here: rwiki

Now let’s load the data file.

text <- readLines("rWiki.txt")
> head(text)
[1] "R is a programming language and software environment 
[2] "The R language is widely used among statisticians and 
[3] "Polls, surveys of data miners, and studies of scholarly 
[4] "R is a GNU package.[9] The source code for the R 
[5] "General Public License, and pre-compiled binary versions
[6] "R is an implementation of the S programming language "
>

Notice each line in the text file is an individual element in the vector –  text

Now we need to move the text into a tm element called a Corpus. First we need to convert the vector text into a VectorSource.

wc <- VectorSource(text)
wc <- Corpus(wc)

Now we need to pre-process the data. Let’s start by removing punctuation from the corpus.

wc <- tm_map(wc, removePunctuation)

Next we need to set all the letters to lower case. This is because R differentiates upper and lower case letters. So “Program” and “program” would treated as 2 different words. To change that, we set everything to lowercase.

wc <- tm_map(wc, content_transformer(tolower))

Next we will remove stopwords. Stopwords are commonly used words that provide no value to the evaluation of the text. Examples of stopwords are: the, a, an, and, if, or, not, with ….

wc <- tm_map(wc, removeWords, stopwords("english"))

Finally, let’s strip away the whitespace

wc <- tm_map(wc, stripWhitespace)

Now let us make our first word cloud

The syntax is as follows – wordcloud( words = corpus, scale = physical size, max.word = number of words in cloud)

wordcloud(words = wc, scale=c(4,0.5), max.words=50)

2016-12-16_22-37-12.jpg

Now we have a word cloud, let’s add some more elements to it.

random.order = False brings the most popular words to the center

wordcloud(words = wc, scale=c(4,0.5), max.words=50,random.order=FALSE)

2016-12-16_22-42-35.jpg

To add a little more rotation to your word cloud use rot.per

wordcloud(words = wc, scale=c(4,0.5), max.words=50,random.order=FALSE,
 rot.per=0.25)

Finally, lets add some color. We are going to use brewer.pal.  The syntax is brewer.pal(number of colors, color mix)

cp <- brewer.pal(7,"YlOrRd")
wordcloud(words = wc, scale=c(4,0.5), max.words=50,random.order=FALSE,
 rot.per=0.25, colors=cp)

2016-12-16_22-48-06

 

 

 

R: K-Means Clustering- Deciding how many clusters

In a previous lesson I showed you how to do a K-means cluster in R. You can visit that lesson here: R: K-Means Clustering.

Now in that lesson I choose 3 clusters. I did that because I was the one who made up the data, so I knew 3 clusters would work well. In the real world it doesn’t work that way. Choosing the right number of clusters is one of the trickier parts of performing a k-means cluster.

If you go over to Michael Grogan’s site, you will see he has a great method for figuring out how many clusters to choose. http://www.michaeljgrogan.com/k-means-clustering-example-stock-returns-dividends/

wss <- (nrow(sample_stocks)-1)*sum(apply(sample_stocks,2,var))
for (i in 2:20) wss[i] <- sum(kmeans(sample_stocks,
centers=i)$withinss)
plot(1:20, wss, type="b", xlab="Number of Clusters",
ylab="Within groups sum of squares")

If you understand the code above, then great. That is a great solution for choosing the number of clusters. If, however, you are not 100% sure what is going on above, keep reading. I’ll walk you through it.

K-Means Clustering

We need to start by getting a better understanding of what k-means clustering means. Consider this simplified explanation of clustering.

The way is works is each of the rows our data are placed into a vector.

2016-12-06_22-09-28

These vectors are then plotted out in space. Centroids (the yellow stars in the picture below) are chosen at random. The plotted vectors are then placed into clusters based on which centroid they are closest to.

kmeans

So how do you measure how good your clusters fit. (Do you need more clusters? Less clusters)? One popular metrics is the Within cluster sum of squares. R provides this as kmeans$withinss. What this means is the distance the vectors in each cluster are from their respected centroid.

The goal is to get the is to get this number as small as possible. One approach to handling this is to run your kmeans clustering multiple times, raising the number of the clusters each time. Then you compare the withinss each time, stopping when the rate of improvement drops off. The goal is to find a low withinss while still keeping the number of clusters low.

2016-12-10_22-02-49

This is, in effect, what Michael Grogan has done above.

Break down the code

Okay, now lets break down Mr. Grogan’s code and see what he is doing.

wss <- (nrow(sample_stocks)-1)*sum(apply(sample_stocks,2,var))
for (i in 2:20) wss[i] <- sum(kmeans(sample_stocks,
centers=i)$withinss)
plot(1:20, wss, type="b", xlab="Number of Clusters",
ylab="Within groups sum of squares")

 

The first line of code is a little tricky. Let’s break it down.

wss <- (nrow(sample_stocks)-1)*sum(apply(sample_stocks,2,var))

sample_stocks – the data set

wss <-  – This simply assigns a value to a variable called wss

(nrow(sample_stocks)-1)  – the number of rows (nrow) in sample_stocks – 1. So if there are 100 rows in the data set, then this will return 99

sum(apply(sample_stocks,2,var)) – let’s break this down deeper and focus on the apply() function. apply() is kind of like a list comprehension in Python. Here is how the syntax works.

apply(data, (1=rows, 2=columns), function you are passing the data through)

So, let’s create a small array and play with this function. It makes more sense when you see it in action.

 tt <- array(1:20, dim=c(10,2)) # create array with data 1 -20, 
                                #10 rows, 2 columns
> tt
 [,1] [,2]
 [1,] 1 11
 [2,] 2 12
 [3,] 3 13
 [4,] 4 14
 [5,] 5 15
 [6,] 6 16
 [7,] 7 17
 [8,] 8 18
 [9,] 9 19
[10,] 10 20

Now lets try running this through apply.

> apply(tt, 2, mean)
[1] 5.5 15.5

Apply took the mean of each column. Had I used 1 as the second argument, it would have taken the mean of each row.

> apply(tt, 1, mean)
 [1] 6 7 8 9 10 11 12 13 14 15

Also, keep in mine, you can create your own functions to be used in apply

apply(tt,2, function(x) x+5)
 [,1] [,2]
 [1,] 6 16
 [2,] 7 17
 [3,] 8 18
 [4,] 9 19
 [5,] 10 20
 [6,] 11 21
 [7,] 12 22
 [8,] 13 23
 [9,] 14 24
[10,] 15 25

So, what is Mr. Grogan’s doing with his apply function? apply(sample_stocks,2,var) – He is taking the variance of each column his data set.

 apply(tt,2,var)
[1] 9.166667 9.166667

And by summing it: sum(apply(sample_stocks,2,var)) – he is simply adding the two values together.

 sum(apply(tt,2,var))
[1] 18.33333

So, the entire first line of code using our data is:

wss <- (nrow(tt)-1)*sum(apply(tt,2,var))

wss <- (10-1) * (18.333)

wss <- (nrow(tt)-1)*sum(apply(tt,2,var))
> wss
[1] 165

What this number is effectively is the within sum of squares for a data set that has only one cluster

Next section of code

Next we will tackle the next two lines of code.

for (i in 2:20) wss[i] <- sum(kmeans(sample_stocks,
centers=i)$withinss)

The first part is a for loop and should be simple enough. Note he doesn’t use {} to denote the inside of his loop. You can do this when your for loop is a single line, but I am going to use the {}’s anyway, as I think it makes the code a bit neater.

for (i in 2:20)  — a for loop iterating from 2 -20

for (i in 2:20) {

wss[i] <- }  – we are going to assign more values to the vector wss starting at 2 and working our way down to 20.

Remember, a single value variable in R is actually a single value vector.

c <- 5
> c
[1] 5
> c[2] <- 7
> c
[1] 5 7

Okay, so now to the trickier code. sum(kmeans(sample_stocks, centers = i)$withinss)

What he is doing is running a kmeans cluster for the data one time each for each value of centers (number of centroids we want) from 2 to 20 and reading the $withinss from each run. Finally it sums all the withinss up (you will have 1 withinss for every cluster you create – number of centers)

Plot the results

The last part of the code is plotting the results

plot(1:20, wss, type="b", xlab="Number of Clusters",
ylab="Within groups sum of squares")

plot (x, y, type= type of graph, xlab = label for x axis, ylab= label for y axis

Let’s try it with our data

If you already did my Kmeans lesson, you should already have the file, if not you can download it hear. cluster

 myData <- read.csv('cluster.csv')
> head(myData)
 StudentId TestA TestB
1 2355645.1 134 24
2 8718152.6 155 32
3 8303333.6 130 25
4 6352972.5 185 86
5 3381543.2 153 95
6 817332.4 153 81
> myData <- myData [,2:3] # get rid of StudentId column
> head(myData)
 TestA TestB
1 134 24
2 155 32
3 130 25
4 185 86
5 153 95
6 153 81

Now lets feed this through Mr. Grogan’s code

wss <- (nrow(myData)-1)*sum(apply(myData,2,var))
for (i in 2:20) {
          wss[i] <- sum(kmeans(myData,
          centers=i)$withinss)
         }
plot(1:20, wss, type="b", xlab="Number of Clusters",
ylab="Within groups sum of squares")

Here is our output ( a scree plot for you math junkies out there)

2016-12-11_14-05-39.jpg

Now Mr. Grogan’s plot a nice dramatic drop off, which is unfortunately not how most real world data I have seen works. I am going to chose 5 as my cut off point, because while the withinss does continue to decrease, it doesn’t seem to do so at a rate great enough to accept the added complexity of more clusters.

If you want to see how 5 clusters looks next to the three I had originally created, you can run the following code.

myCluster <- kmeans(myData,5, nstart = 20)
myData$cluster <- as.factor(myCluster$cluster)
ggplot(myData, aes(TestA, TestB, color = cluster))
+ geom_point()

5 Clusters

2016-12-11_14-13-02

3 Clusters

graph

I see some improvement in the 5 cluster model. So Michael Grogan’s trick for finding the number of clusters works.

 

R: Converting Factors to Numbers

R, like all programming languages, has its quirks. One of the more frustrating ones is the way it acts when trying to convert a factor into a numeric variable.

Let’s start with a vector of numbers that have been mistakenly loaded as characters.

chars <- c("12","13","14","12","11","13","12")
chars
typeof(chars)

Here is the output

2016-12-03_15-49-23.jpg

Now, let’s convert this vector to a numeric vector using the function as.numeric()

nums <- as.numeric(chars)
nums
typeof(nums)

And here is the output

2016-12-03_15-55-40.jpg

As you can see it works fine.

But now let’s try it with a factor

fac <- factor(c("12","13","14","12","11","13","12"))
fac
typeof(fac)

Here is the output.

2016-12-03_15-58-06.jpg

Now look what happens when I try the as.numeric() function

nums <- as.numeric(fac)
nums
typeof(nums)

Check out the results

2016-12-03_16-00-11

While is says the type is a double, clearly the numbers are not correct.

How to fix it?

Well, the secret is that first you need to convert the factor into a character, then into a numeric.

nums <- as.numeric(as.character(fac))
nums
typeof(nums)

Now check out the results

2016-12-03_16-02-30.jpg

Now we have the correct numbers. Just keep this little trick in mind. It has caused me some undue frustration in past.

 

R: Simple Linear Regression

Linear Regression is a very popular prediction method and most likely the first predictive algorithm most be people learn. To put it simply, in linear regression you try to place a line of best fit through a data set and then use that line to predict new data points.

linear1

If you are new to linear regression or are in need of a refresher, check out my lesson on Linear Regression in Excel where I go much deeper into the mechanics: Linear Regression using Excel

Get the Data

You can download our data set for this lesson here: linear1

Let’s upload our file into R

df <- read.csv(file.choose())
head(df)

linReg1

Now our data file contains a listing of Years a person has worked for company A and their Salary.

Check for linear relationship

With a 2 variable data set, often it is quickest just to graph the data to check for a possible linear relationship.

#plot data
attach(df)
plot(Years, Salary)

Looking at the plot, there definitely appears to be a linear relationship. I can easily see where I could draw a line through the data.

linReg2.jpg

An even better way to do it is to check for correlation. Remember the closer to 1, the better the correlation found in the data.

#check for correlation
cor(Years, Salary)

linReg3.jpg

Since our correlation is so high, I think it is a good idea to perform an linear regression.

Linear Regression in R

A linear regression in R is pretty simple. The syntax is lm(y, x, data)

#perform linear regression
fit <- lm(Salary~Years, data= df)
summary(fit)

linReg4.jpg

Now let’s take a second to break down the output.

The red box shows my P values. I want to make sure they are under my threshold (usually 0.05). This becomes more important in multiple regression.

The orange box shows my R-squared values. Since this is a simple regression, both of these numbers are pretty much the same, and it really doesn’t matter which one you look at. What these numbers tell me is how accurate my prediction line is. A good way to look at them for a beginner is to consider them to be like percentages. So in our example, our prediction model is 75-76% percent accurate.

Finally, the blue box are your coefficients. You can use these numbers to create your predictive model. Remember the linear equation: Y = mX + b? Well using your coefficients here our equation now reads Y = 1720.7X + 43309.7

Predictions

You can use fitted() to show you how your model would predict your existing data

fitted(fit)

linReg5.jpg

You can also use the predict command to try a new value

predict(fit, newdata =data.frame(Years= 40))

linReg7

Let’s graph our regression now.

plot(Years, Salary)

abline(fit, col = 'red')

linReg8

The Residuals Plot

I am not going to go too deep into the weeds here, but I want to show you something cool.

layout(matrix(c(1,2,3,4),2,2))  # c(1,2,3,4) gives us 4 graphs on the page, 

                                #2,2 - graphs are 2x2
plot(fit)

I promise to go more into this in a later lesson, but for now, I just want you to note the numbers you see popping up inside the graphs. (38,18,9) – These represent outliers. One of the biggest problems with any linear system is they are easily thrown off by outliers. So you need to know where you outliers are.

linReg9.jpg

If you look at the points listed in your graphs in your data, you will see why these are outliers. Now while this doesn’t tell you what to do about your outliers, that decision has to come from you, it is a great way of finding them quickly.

linReg10.jpg

The Code

# upload file
df <- read.csv(file.choose())
head(df)

#plot data
attach(df)
plot(Years, Salary)

#check for correlation
cor(Years, Salary)

#perform linear regression
fit <- lm(Salary~Years, data= df)
summary(fit)

#see predictions
fitted(fit)

predict(fit, newdata =data.frame(Years= 40))

#plot regression line 
plot(Years, Salary)

abline(fit, col = 'red')

layout(matrix(c(1,2,3,4),2,2)) 
plot(fit)

df

 

R: ggplot – Histograms

Let’s make a Histogram using ggplot

First step, import the ggplot2 library

library(ggplot2)

Now let’s look at our data. In this example I am using chickwts from R Data

head(chickwts)

str(chickwts)

histogram1.jpg

As you can see above, this data contains 2 columns weight ( numeric) and feed(Factor with 6 levels)

Histogram

In a histogram, we don’t need to worry about assigning a Y axis, the Y axis is the frequency count of our X variable.

Let’s set our data and x axis in ggplot. Let’s also assign it to variable.

pl <- ggplot(data=chickwts, aes(x=weight))

Now we can use the assigned variable in conjunction with our geom()

pl + geom_histogram(binwidth=10)

In our geom, I set a binwidth. This tells us how wide to make our bars. Setting binwidth to 10 makes each bar 10 units wide.

histogram2

Let’s give it some color

pl + geom_histogram(binwidth=10, fill="blue")

histogram3.jpg

Now, let’s set our colors based on our 2nd column “feed”. Note how a legend is automatically generated.

pl + geom_histogram(binwidth=10, aes(fill=feed))

histogram4.jpg

For the finishing touch, let’s add some black bordering around our boxes.

pl + geom_histogram(binwidth=10, aes(fill=feed), color="black")

histogram5.jpg

The Code

library(ggplot2)

head(chickwts)

#-- look at structure
str(chickwts)

#set data and x axis value
pl <- ggplot(data=chickwts, aes(x=weight))

#create histogram, binwidth 10
pl + geom_histogram(binwidth=10)

#change bars to blue
pl + geom_histogram(binwidth=10, fill="blue")

#set color of bars by feed column values 
pl + geom_histogram(binwidth=10, aes(fill=feed))

#add black border around boxes
pl + geom_histogram(binwidth=10, aes(fill=feed), color="black")

R: Intro to ggplot()

ggplot() is a powerful graphing tool in R. While it is more complex to use than qplot(), its added complexity comes with advantages.

Don’t worry about the complexity, we are going to step into it slowly.

Let’s start by getting a data set. I am going to choose airquality from the R data sets package. You can find a list of data sets from that package here: Link

ggplotIntro

We are going to manipulate some of this data, so first let us set the dataframe to a new variable.

Next, take a look at the structure – str() – we have 6 variables and 153 rows. The first thing I notice is that Month is an int – I don’t want that. I would rather have Month be a factor      (a categorical variable)

ggplotIntro1.jpg

Use factor() to set Month to a factor. Now look at str() readout again. There are 5 levels (months) represented in our data set

ggplotIntro3

ggplot()

here is the basic syntax – ggplot(data, aes(x value, y value))+geom_[type of plot]

ggplot(data = airQ,  aes(x = Wind, y = Temp)) + geom_point()

ggplotIntro4

Now let’s add some color to the chart. Inside aes() – which stands for aesthetics – we are going to add color = Month

 ggplot(data = airQ, aes(x = Wind, y = Temp, color = Month)) + geom_point()

ggplotIntro5

Now add some size. Inside aes() add size = Ozone.

ggplot(data = airQ, aes(x = Wind, y = Temp, color = Month, size = Ozone)) + geom_point()

ggplotIntro6

The Code

library(ggplot2)

head(airquality)

#-- assign data to a new variable
airQ <- airquality

#-- look at structure
str(airquality)

#-- set month as a factor
airQ$Month <- factor(airQ$Month)
str(airQ)

#-- plot wind vs temp - scatter plot
 ggplot(data = airQ, aes(x = Wind, y = Temp)) + geom_point()
 
#-- plot add factor (Month) as a color
 ggplot(data = airQ, aes(x = Wind, y = Temp, color = Month)) + geom_point()
 
#-- set Ozone as a size element
ggplot(data = airQ, aes(x = Wind, y = Temp, color = Month, size = Ozone)) + geom_point()

R: Graphing with matplot()

matplot() is a R function that you can use to make easy graphs in R.

Let’s start by creating a data set.

We are going to create a 5 x 10 matrix representing 10 quiz and test grades for 5 students

x <- sample(50:100,50, T)
x
# convert to a 5x10 matrix
A <- matrix(x,5,10)
A

matplot

Let’s make name our rows and columns

#name and assignments vectors
names <- c("Sara", "Jill", "Jared", "Kim", "Don")
assignments <- c("Quiz 1", "Quiz 2", "Test 1", "Quiz 3", "Quiz 4", "Midterm", "Quiz 5", "Quiz 6", "Test 2","Final")
#assign labels to columns and rows
row.names(A) <- names
colnames(A) <- assignments
A

matplot1.jpg

Let’s graph it

Start simple

matplot(A)

matplot2.jpg

Note our X axis goes to 5 – we are graphing students against grades.

What if I want to graph assignments against grades? Simple, transpose the matrix

matplot(t(A))

matplot3

Okay, but the numbers are distracting. Let’s do something about that.

matplot(t(A), type="b")

matplot4

I don’t like the numbers

# replace numbers with shapes
matplot(t(A), type="b", pch=15:19)

pch 15:19 lets us rotate through 4 shapes (15 – 19) – try different numbers on your own

matplot5.jpg

I am adding a color element next. You will see the need for this in our next section.

 matplot(t(A), type="b", pch=15:18, col=c(1:5))

This has no current effect on our graph

Legend

Let’s add a legend

legend("bottomleft", inset=0.01, legend=names, col=c(1:5),pch=15:19,
bg= ("white"), horiz=F)

okay, here is the syntax: legend(location, padding, legend data, colors, symbols, background, horizontal vs vertical layout)

This is why I added the color element to our matplot, so the colors in the legend would match up to the colors in my graph.

matplot6.jpg

 

 

 

R: seq() and rep()

If you want to count out a sequence of number in R, you can simply use a :

1:15 goes from 1 to 15, while 6:22 goes from 6 to 22

seqRep.jpg

You can even assign these sequences to a variable, creating a vector

seqRep1.jpg

However, you can only count by 1 using this method.

seq()

With seq(1,15) I can count from 1 to 15, just like using 1:15

seqRep2.jpg

If I add a third argument though, now I am counting from 1 to 15 by 4

seqRep3.jpg

rep()

rep() stands for replicate. Using rep(), I can make a list of repeating elements

seqRep4

You can of course create vectors with both seq() and rep() by assigning them to a variable

seqRep5.jpg

Remember R indexes start at 1 not 0

so using the vector above, a[3] = 9 – a[1:3] = 1,5,9

 

 

 

R: Loops – For, While, Repeat

Loops are how we get computers to repeat tasks. In R, there are three standard loops: For, While, and Repeat

For

The For loop is a simple loop that iterates through a set of elements. With each iteration (running of the loop) the action found inside the loop is repeated.

In the loop I created below, 1:20 means 1 through 20. i in 1:20 mean count 1 through 20, assigning the current value of the count to i for the iteration of the loop.

rloop

Here are the results

rloop1.jpg

You can use a vector to iterate through. You can even use strings.

rloop2

Nested Loop

You can even nest your loops (running a loop inside another loop)

My main loop counts to 2. Each time it runs it prints its count and then it runs a second loop that steps through a vector containing 3 strings. It prints out each string in order, then returns to the top of the main loop and does it one more time.

rloop3.jpg

While loop

While loops work off of conditional logic. The general concept is while some condition is True, the loop will iterate. Once the condition is False, the loop terminates.

Below is state, while c is less than 10, iterate through the loop. Inside the loop I created a counter that adds 2 to the value of c each time. The end result is, I have created a listing of even numbers from 0 to 8

rloop4

Repeat Loop

Repeat loops repeat themselves (shocking!! I know!) until they are terminated using the break command.

Note the break command can be used in For and While loops if needed.

rloop5

The Code

# loop prints numbers 1 - 20
for (i in 1:20){
    print (i)
}

# loop prints elements in vector
for (i in c("Dog","Cat","Frog")){
     print(i)
}

#nested loop
for (n in 1:2){
   print(n)
   for (i in c("Dog","Cat","Frog")){
      print(i)
   } 
}

#while loop
c <- 0
while (c<10){
   print(c)
   c <- c+2
}

#repeat loop
a <-0
repeat{
   print(a)
   if (a==10){
     break}
   else {
    a<-a+2}
}