R: Twitter Sentiment Analysis

Having a solid understanding of current public sentiment can be a great tool. When deciding if a new marketing campaign is being met warmly, or if a news release about the CEO is causing customers to get angry, people in charge of handling a company’s public image need these answers fast. And in the world of social media, we can get those answers fast. One simple, yet effective, tool for testing the public waters is to run a sentiment analysis.

A sentiment analysis works like this. We take a bunch of tweets about whatever we are looking for (in this example we will be looking at President Obama). We then parse those tweets out into individual words, count the positive words, count the negative words, and subtract the negative count from the positive count to get a score for each tweet.

Now the simplicity of this model misses out on some things. Sarcasm can easily be missed. Ex. “Oh GREAT job Obama. Thanks for tanking the country once again”. Our model will count 2 positive words (Great and Thanks) and 1 negative word (tanking), giving us an overall score of positive 1.
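To make the arithmetic concrete, here is a minimal sketch that scores that one sarcastic tweet by hand. The two tiny word vectors (pos.demo and neg.demo) are made up for illustration and stand in for the real lexicon we download later.

```r
# Tiny illustrative word lists (hypothetical stand-ins for the real lexicon)
pos.demo <- c("great", "thanks")
neg.demo <- c("tanking")

tweet <- "Oh GREAT job Obama. Thanks for tanking the country once again"

# Lowercase the tweet, strip punctuation, and split it on whitespace
words <- strsplit(gsub("[[:punct:]]", "", tolower(tweet)), "\\s+")[[1]]

pos.count <- sum(words %in% pos.demo)  # counts "great" and "thanks"
neg.count <- sum(words %in% neg.demo)  # counts "tanking"
score <- pos.count - neg.count

print(score)  # 1
```

The tweet is clearly negative, yet it scores +1, which is exactly the sarcasm blind spot described above.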

There are more complex methods for dealing with the issue above, but you’ll be surprised at how well the system works all by itself. Yes, we are going to misread a few tweets, but we can read thousands of them, so the sheer volume of data dilutes the effect of the sarcastic ones.

First thing we need to do is go get a list of good and bad words. You could make your own up, but there are plenty of pre-populated lists on the Internet for free. The one I will be using is from the University of Illinois at Chicago. You can find the list here:

http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html

Once you go to the page, click on Opinion Lexicon and then download the rar file.

You can download from the link below, but I want you to know the source in case this link breaks.

A list of English positive and negative opinion words or sentiment words (around 6800 words). This list was compiled over many years starting from our first paper (Hu and Liu, KDD-2004).

Now open the rar file and move the two text files to a folder you can work from.

Next let’s make sure we have the right packages installed. For this we will need twitteR, plyr, stringr, and xlsx. If you do not have these packages installed, you can do so using the following code (just swap twitteR for whatever package you need to install). Package names are case-sensitive, so type them exactly as shown.

install.packages("twitteR")
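If you would rather not install the packages one at a time, the sketch below checks which of the four are missing from your library. The helper name missing_packages is my own invention for this example, not something from any package. The actual install call is left commented out so nothing is downloaded until you choose to run it.

```r
# Packages this tutorial loads; names are case-sensitive
needed <- c("twitteR", "plyr", "stringr", "xlsx")

# Hypothetical helper: returns the packages not yet in the local library
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to.install <- missing_packages(needed)
print(to.install)

# Uncomment to install whatever is missing:
# if (length(to.install) > 0) install.packages(to.install)
```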

Now load the libraries

library(stringr)
library(twitteR)
library(xlsx)
library(plyr)

and connect to the Twitter API. If you do not already have a connection set up, check out my lesson on connecting to Twitter: R: Connect to Twitter with R

api_key <- "insert consumer key here"
api_secret <- "insert consumer secret here"
access_token <- "insert access token here"
access_token_secret <- "insert access token secret here"
setup_twitter_oauth(api_key, api_secret, access_token, access_token_secret)

Okay, so now remember where you stored the text files we just downloaded and set that location as your working directory (wd). Note that we use forward slashes here, even if you are on a Windows box.

setwd("C:/Users/Benjamin/Documents")
neg = scan("negative-words.txt", what="character", comment.char=";")
pos = scan("positive-words.txt", what="character", comment.char=";")

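scan reads each text file as whitespace-separated character strings, one per word, and ignores everything from a ; to the end of the line, which skips the comment lines that start with ;

You should now have two character vectors, pos and neg, holding the positive and negative words.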
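If you want to see what scan does before pointing it at the real files, here is a small sketch that writes a stand-in file to a temporary location and reads it back. The file contents are made up for the demo. I am only assuming the layout described above: comment lines that begin with ; followed by one word per line.

```r
# Write a small stand-in file: ';' comment lines, a blank line, then words
demo.file <- tempfile(fileext = ".txt")
writeLines(c("; demo header line",
             ";",
             "",
             "abound",
             "accolade",
             "acclaim"), demo.file)

# Same call pattern as above: read strings, skip anything after ';'
demo.words <- scan(demo.file, what = "character", comment.char = ";")

print(demo.words)  # "abound" "accolade" "acclaim"
```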

You can add words to either list using a vector operation. Below I added wtf – a popular Internet abbreviation for What the F@#$@ to the negative words

neg = c(neg, 'wtf')
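The same vector operations let you add several words at once, avoid duplicates, or drop a word you do not want counted. This is just a sketch on a made-up vector (neg.demo) so it runs without the lexicon files. The extra words are examples, not recommendations.

```r
neg.demo <- c("bad", "awful")  # stand-in for the real neg vector

# Append several words; unique() drops the repeated "awful"
neg.demo <- unique(c(neg.demo, "wtf", "smh", "awful"))

# Remove a word you do not want to count as negative
neg.demo <- setdiff(neg.demo, "bad")

print(neg.demo)  # "awful" "wtf" "smh"
```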

Okay, now here is the engine that runs our analysis. I have tried to add comments explaining the commands you may not recognize. I have lessons on most features listed here, and will make more lessons on anything missing. If I were to try to explain this step by step, this page would be 10000 lines long and no one would read it.

score.sentiment = function(tweets, pos.words, neg.words)
{
  require(plyr)
  require(stringr)

  scores = laply(tweets, function(tweet, pos.words, neg.words) {

    tweet = gsub('https://', '', tweet) # removes the https:// prefix from links
    tweet = gsub('http://', '', tweet) # removes the http:// prefix from links
    tweet = gsub('[^[:graph:]]', ' ', tweet) # replaces anything that is not a visible
                                             # character with a space (meant to catch
                                             # things like emoticons)
    tweet = gsub('[[:punct:]]', '', tweet) # removes punctuation
    tweet = gsub('[[:cntrl:]]', '', tweet) # removes control characters
    tweet = gsub('\\d+', '', tweet) # removes numbers
    tweet = str_replace_all(tweet, "[^[:graph:]]", " ") # same replacement again, using stringr

    tweet = tolower(tweet) # makes all letters lowercase

    word.list = str_split(tweet, '\\s+') # splits the tweet into words, returned as a list
    words = unlist(word.list) # turns the list into a vector

    pos.matches = match(words, pos.words) # position of each word in the list, NA if absent
    neg.matches = match(words, neg.words)

    pos.matches = !is.na(pos.matches) # converts the positions to TRUE or FALSE
    neg.matches = !is.na(neg.matches)

    score = sum(pos.matches) - sum(neg.matches) # TRUE and FALSE are treated as 1 and 0,
                                                # so they can be added

    return(score)

  }, pos.words, neg.words)

  scores.df = data.frame(score=scores, text=tweets)

  return(scores.df)
}
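The part of the function that tends to confuse people is the match / is.na pair, so here is that step on its own. The words and the two-word positive list are made up for the example.

```r
words <- c("great", "job", "thanks", "tanking")
pos.demo <- c("thanks", "great")  # illustrative positive list

# match() gives the position of each word in pos.demo, or NA if it is absent
m <- match(words, pos.demo)
print(m)     # 2 NA 1 NA

# !is.na() turns those positions into TRUE (found) or FALSE (not found)
hits <- !is.na(m)
print(hits)  # TRUE FALSE TRUE FALSE

# Summing a logical vector counts the TRUE values
print(sum(hits))  # 2
```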

Now let’s get some tweets and analyze them. Note, if your computer is slow or old, you can lower the number of tweets to process. Just change n= to a lower number like 100 or 50.

tweets = searchTwitter('Obama',n=2500)
Tweets.text = laply(tweets,function(t)t$getText()) # gets text from Tweets

analysis = score.sentiment(Tweets.text, pos, neg) # calls sentiment function
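Tweets often contain emoji or badly encoded bytes, and those can make R stop with an error along the lines of "invalid input ... in 'utf8towcs'". One common workaround, sketched here on two made-up strings, is to convert the text to plain ASCII with iconv before scoring it. This throws away every non-ASCII character, accented letters included, so treat it as a blunt tool rather than the one right answer.

```r
# Made-up tweets; the first contains an emoji (U+1F440)
raw.text <- c("Look \U0001F440 at this", "plain tweet")

# Convert to ASCII, dropping any bytes that cannot be represented
clean.text <- iconv(raw.text, from = "UTF-8", to = "ASCII", sub = "")

print(tolower(clean.text))
```

In the tutorial code you could apply the same iconv call to Tweets.text before passing it to score.sentiment.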

Now let’s look at the results. The quickest method available to us is to simply run a histogram.

hist(analysis$score)

My results look like this:

[Figure: histogram of analysis$score]

If 0 is completely neutral, most people are generally neutral about the president, and more people have positive tweets than negative ones. This is not uncommon for an outgoing president. They generally seem to get a popularity boost after the election is over.
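If you want numbers to go with the picture, a quick tally of negative, neutral, and positive scores does the job. The scores below are invented so the sketch runs on its own. With real data you would use analysis$score in place of demo$score.

```r
demo <- data.frame(score = c(0, 1, -1, 0, 2, 0, 1))  # made-up scores

# Bucket the scores: below 0 is negative, exactly 0 is neutral, above 0 is positive
sentiment <- cut(demo$score,
                 breaks = c(-Inf, -1, 0, Inf),
                 labels = c("negative", "neutral", "positive"))

counts <- table(sentiment)
print(counts)
```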

Finally, if you want to save your results, you can export them to Excel.

write.xlsx(analysis, "myResults.xlsx")

And you will end up with a file like this
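The xlsx package relies on Java, so if it will not load on your machine, a CSV file is a reasonable fallback that Excel can still open. Here is a sketch using base R only. The small data frame and the temporary file path are placeholders for your own analysis object and folder.

```r
# Made-up results standing in for the analysis data frame
demo <- data.frame(score = c(1, -1),
                   text = c("great day", "awful day"))

# Write to a temporary folder; swap in your own path as needed
out.file <- file.path(tempdir(), "myResults.csv")
write.csv(demo, out.file, row.names = FALSE)

# Read it back to confirm the export worked
check <- read.csv(out.file)
print(check)
```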

11 thoughts on “R: Twitter Sentiment Analysis”

Florian

Dear Mr. Larson,
I really enjoy your tutorial on R. But right now I have a little problem: If I try to use the above code with the given instructions an error occurs that says, that the object ‘analysis.score’ couldn’t be found (used in the hist() method). Everything else works fine.

Hope you can help me and that you keep going with tutorials on analytics with R.

Warm regards
Florian Dahlitz.

December 20, 2016 at 2:04 pm Reply
1. Ben Larson
  
  Florian,
  
  Great Catch!! I fixed the code on the page. The issue was I was using hist(analysis.score) when I should have been using hist(analysis$score). Analysis is a dataframe and you call elements of a dataframe using the $ not a period “.”
  
  Sorry if this caused any frustration.
  
  December 20, 2016 at 2:25 pm Reply
  1. Florian
    
    Thank you very much!
    Looking forward for more great analytics tutorials!!!
    
    December 20, 2016 at 3:07 pm
aliceinwanderland123

Hi, ,Can you please explain what this part of the code does?

scores.df = data.frame(score=scores, text=tweets)

return(scores.df)

Thanks!

August 5, 2017 at 6:01 am Reply
Mr.Srini

Wonderful. you simplified it

September 12, 2017 at 10:26 am Reply
sglk

How to get the clean data in the Myresults file?

December 2, 2017 at 9:37 pm Reply
Siddhartha pal

Dear Mr. Larson,
I really enjoy your tutorial on R. But right now I have a little problem: how to extract particular tweets from a particular user timeline,like if i want to extract the tweets of Infosys based on the topic manufacturing,and how can i extract the comments of a tweet to perform sentiment analysis on it.

December 15, 2017 at 10:06 am Reply
1. Anonymous
  
  thann aaruva
  
  March 8, 2018 at 10:06 am Reply
Saurabh Rai

Hi please help me out with this error

Error in sort.list(y) :
invalid input ‘RT @FriendlyJMC: Pardons Granted by President Barack Obama (2009-2017) | PARDON | Department of Justice

Lí ½í±€k how many drug dealers were onâ€¦’ in ‘utf8towcs’

March 20, 2018 at 8:32 pm Reply
Nitesh

Hi,
It will create a dataframe scores.df with two columns “score” and “text”. Where scores and tweets variables are used to feed data into score and text which they have. Finally return(scores.df) is the value returned by the function and it will be returning scored.df as final result.

June 6, 2018 at 9:39 am Reply
DISHA MALHOTRA

Error in curl::curl_fetch_memory(url, handle = handle) :
Couldn’t connect to server

i am getting this error at below execution

setup_twitter_oauth(customer_key,customer_secret,access_token = NULL,access_secret = NULL)

can you plz suggest????

February 24, 2019 at 3:21 pm Reply

Analytics4All
