
Step 2 – Sentiment Analysis using Sentiment Library

01 May

It has been a long time since I wrote a post on sentiment analysis without using the sentiment package. In this post, I will use the sentiment package developed by Timothy Jurka. You can download this package from here. Before installing the sentiment package, you need to install tm and Rstem from CRAN. The sentiment package has two functions that serve our purpose.
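Before installing, it can help to check which of the prerequisites are already present. This is only a small sketch using base R; the variable names (required, installed, missing_pkgs) are mine and not part of any package.

```r
# Packages the post lists as prerequisites for the sentiment package
required <- c("tm", "Rstem")

# requireNamespace() returns TRUE/FALSE without attaching the package
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All prerequisites are installed.")
}
```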

classify_emotion
This function helps us analyze tweets / text and classify them into six types of emotion: anger, disgust, fear, joy, sadness, and surprise. The classification can be performed using two algorithms: one is a naive Bayes classifier trained on Carlo Strapparava and Alessandro Valitutti’s emotions lexicon; the other is a simple voter procedure.
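To give a feel for what a lexicon-based naive Bayes classifier does, here is a toy sketch in base R. It is not the package's implementation: the seven-word lexicon, the uniform class prior, the Laplace smoothing and the function name classify_emotion_toy are all assumptions made for illustration.

```r
# Hypothetical toy lexicon (NOT the Strapparava / Valitutti lexicon)
lexicon <- data.frame(
  word    = c("happy", "great", "love", "angry", "hate", "afraid", "scared"),
  emotion = c("joy", "joy", "joy", "anger", "anger", "fear", "fear"),
  stringsAsFactors = FALSE
)

classify_emotion_toy <- function(text, lexicon) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words <- words[words %in% lexicon$word]      # ignore words outside the lexicon
  if (length(words) == 0) return(NA_character_)

  classes    <- sort(unique(lexicon$emotion))
  vocab_size <- length(unique(lexicon$word))

  log_post <- vapply(classes, function(cl) {
    class_words <- lexicon$word[lexicon$emotion == cl]
    # Laplace-smoothed likelihood of each observed word under this class
    lik <- ((words %in% class_words) + 1) / (length(class_words) + vocab_size)
    # uniform prior over classes plus the summed log-likelihoods
    log(1 / length(classes)) + sum(log(lik))
  }, numeric(1))

  names(which.max(log_post))
}

classify_emotion_toy("I love this, so happy", lexicon)   # "joy"
classify_emotion_toy("hello world", lexicon)             # NA
```

Text with no lexicon words gets NA here, which is the kind of case the post later relabels as "unknown".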

classify_polarity
In contrast to the classification of emotions, the classify_polarity function allows us to classify some text as positive or negative. In this case, the classification can be done by using a naive Bayes algorithm trained on Janyce Wiebe’s subjectivity lexicon; or by a simple voter algorithm.
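The "simple voter" idea can be sketched in a few lines of base R: count the positive and negative words and let the majority win. The word lists below are a made-up mini lexicon (not Janyce Wiebe's), vote_polarity is a hypothetical name, and returning "neutral" on a tie is my own choice rather than something the package documentation confirms.

```r
# Hypothetical mini lexicon, for illustration only
positive_words <- c("good", "great", "love", "excellent", "happy")
negative_words <- c("bad", "terrible", "hate", "awful", "sad")

vote_polarity <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  pos <- sum(words %in% positive_words)   # votes for "positive"
  neg <- sum(words %in% negative_words)   # votes for "negative"
  if (pos > neg) "positive" else if (neg > pos) "negative" else "neutral"
}

tweets <- c("Great visit, love it",
            "terrible traffic and awful delays",
            "He landed today")
vapply(tweets, vote_polarity, character(1), USE.NAMES = FALSE)
# "positive" "negative" "neutral"
```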

Let's import the necessary packages for sentiment analysis.

library(twitteR)
library(sentiment)
library(stringr)
library(ggplot2)
library(wordcloud)
library(RColorBrewer)

Let's do the analysis on ObamaInIndia, as I did in my previous sentiment analysis post. I am reusing that code to pull the tweets and clean the data, so the cleaned tweet text is already in tweet_txt. Let's move on from there to the sentiment analysis.


# classify emotion
class_emotion = classify_emotion(tweet_txt, algorithm="bayes", prior=1.0)
# get emotion best fit
emotion = class_emotion[,7]
# substitute NA's by "unknown"
emotion[is.na(emotion)] = "unknown"

# classify polarity
class_polarity= classify_polarity(tweet_txt, algorithm="bayes")
# get polarity best fit
polarity = class_polarity[,4]
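Indexing by position (column 7, column 4) breaks silently if the layout differs, and a "subscript out of bounds" error usually means the result does not have the expected shape. The sketch below uses a mock matrix in place of the real classifier output; the column name BEST_FIT is an assumption on my part, so check colnames(class_emotion) on your own result first.

```r
# Mock stand-in for a classifier result; values and column names are
# illustrative assumptions, not real output from the sentiment package
mock_result <- matrix(
  c("1.5", "7.3", "joy",
    "9.1", "1.0", "anger",
    "1.5", "1.0", NA),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("ANGER", "JOY", "BEST_FIT"))
)

# Guard against an empty or unexpectedly shaped result before indexing
stopifnot(nrow(mock_result) > 0, "BEST_FIT" %in% colnames(mock_result))

# Select the best-fit column by name instead of by position
best_fit <- mock_result[, "BEST_FIT"]
best_fit[is.na(best_fit)] <- "unknown"
best_fit
# "joy" "anger" "unknown"
```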

We now have emotions and polarity for our tweets. Let's create a data frame from the tweets, emotions and polarity.

# data frame with results
tweet_df = data.frame(text=tweet_txt, emotion=emotion,
                      polarity=polarity, stringsAsFactors=FALSE)

# order the emotion levels by frequency (most common first),
# so that the bars in the plot appear in that order
tweet_df = within(tweet_df,
  emotion <- factor(emotion, levels=names(sort(table(emotion), decreasing=TRUE))))
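The factor trick is easier to see on a tiny vector. This sketch uses made-up emotion labels; ggplot2 draws discrete bars in level order, which is why reordering the levels by count changes the order of the bars.

```r
# Made-up labels for illustration
emotion <- c("joy", "unknown", "joy", "anger", "unknown", "unknown")

# Count each label and sort from most to least frequent
counts <- sort(table(emotion), decreasing = TRUE)

# Use the sorted names as factor levels
emotion_f <- factor(emotion, levels = names(counts))
levels(emotion_f)
# "unknown" "joy" "anger"
```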

Let's generate some plots based on the above data set. First, plot the tweet distribution by emotion.


ggplot(tweet_df, aes(x=emotion)) +
  geom_bar(aes(y=..count.., fill=emotion)) +
  xlab("Emotions Categories") + ylab("Tweet Count") +
  ggtitle("Sentiment Analysis of Tweets on Emotions")

[Figure: bar chart of tweet counts by emotion]

Next, plot the tweet distribution by polarity.


ggplot(tweet_df, aes(x=polarity)) +
  geom_bar(aes(y=..count.., fill=polarity)) +
  xlab("Polarities") + ylab("Tweet Count") +
  ggtitle("Sentiment Analysis of Tweets on Polarity")
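The numbers behind a bar chart like this can be cross-checked without plotting. A minimal sketch with made-up polarity labels (in the post these would come from tweet_df$polarity):

```r
# Made-up labels for illustration
polarity <- c("positive", "negative", "positive", "neutral", "positive")

counts <- table(polarity)                       # bar heights
shares <- round(100 * prop.table(counts), 1)    # same data as percentages

counts
shares
```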

[Figure: bar chart of tweet counts by polarity]

Separate the text by emotions and visualize the words with a comparison cloud.


# build one "document" per emotion: all tweets with that emotion pasted together
emos = levels(factor(tweet_df$emotion))
nemo = length(emos)
emo.docs = rep("", nemo)
for (i in seq_len(nemo))
{
  tmp = tweet_txt[emotion == emos[i]]
  emo.docs[i] = paste(tmp, collapse=" ")
}
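The loop above can also be written without explicit indexing. This is an alternative sketch on toy data (the four tweets and their labels are made up): split() groups the tweets by emotion and vapply() pastes each group into a single string.

```r
# Toy stand-ins for the cleaned tweets and their emotion labels
tweet_txt <- c("so happy today", "this makes me angry", "happy days", "no idea")
emotion   <- c("joy", "anger", "joy", "unknown")

# One pasted "document" per emotion, named by emotion
emo_docs <- vapply(split(tweet_txt, emotion), paste, character(1), collapse = " ")
emo_docs
```

The names of emo_docs come out in the same sorted order as levels(factor(emotion)), so they line up with emos in the loop version.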

# remove stopwords
emo.docs = removeWords(emo.docs, stopwords("english"))
# create corpus
corpus = Corpus(VectorSource(emo.docs))
tdm = TermDocumentMatrix(corpus)
tdm = as.matrix(tdm)
colnames(tdm) = emos

# comparison word cloud
comparison.cloud(tdm, colors = brewer.pal(nemo, "Dark2"),
                 scale = c(3,.5), random.order = FALSE, title.size = 1.5)
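comparison.cloud works on a matrix with one row per term and one column per document. To make that shape concrete, here is a base R sketch that builds such a matrix by hand from three made-up documents; it only illustrates the layout and is not a replacement for tm's TermDocumentMatrix (no stopword removal, no tokenizer options).

```r
# Made-up per-emotion documents
docs <- c(joy = "happy happy great", anger = "angry hate angry", unknown = "visit")

tokens <- strsplit(docs, " ", fixed = TRUE)               # words per document
terms  <- sort(unique(unlist(tokens, use.names = FALSE))) # overall vocabulary

# Count every vocabulary term in every document: terms in rows, documents in columns
tdm_toy <- vapply(
  tokens,
  function(tok) as.numeric(table(factor(tok, levels = terms))),
  numeric(length(terms))
)
rownames(tdm_toy) <- terms
tdm_toy
```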

[Figure: comparison word cloud of tweet words by emotion]

 


3 responses to “Step 2 – Sentiment Analysis using Sentiment Library”

  1. Patrick Kim

    June 30, 2015 at 7:49 am

    Hello, thank you for the great work.
    I am doing the exact same sentiment analysis on Twitter for my own keywords.
    Everything worked just fine except for the emotion classification and polarity part.
    When I run the code
    class_emotion = classify_emotion(tweet_txt, algorithm="bayes", prior=1.0)
    it says "subscript out of bounds",
    and I cannot proceed any further.

    I loaded all the required packages.

    Can you please walk me through this issue?

     
    • rhandbook

      June 30, 2015 at 7:44 pm

      Hi Patrick

      It seems the tweet_txt variable has some issue. Can you please send me the code that generates the tweet_txt variable?

       
  2. Trevor Miles

    September 17, 2015 at 2:23 pm

    It seems that the sentiment package is not available under R 3.2.0. Do you know if there is any plan to upgrade this package? It is no longer available on CRAN.

     
