

Step 2 – Sentiment Analysis using Sentiment Library


It's been a long time since I wrote a post on sentiment analysis without using the sentiment package. In this post, I will use the sentiment package developed by Timothy Jurka. You can download this package from here. Before installing the sentiment package, you need to install tm and Rstem from CRAN. The sentiment package has two functions that serve our purpose.

classify_emotion
This function analyzes tweets (or any text) and classifies each one by emotion: anger, disgust, fear, joy, sadness, or surprise. The classification can use one of two algorithms: a naive Bayes classifier trained on Carlo Strapparava and Alessandro Valitutti's emotions lexicon, or a simple voter procedure.
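To make the "simple voter procedure" concrete, here is a minimal base-R sketch of the idea: every lexicon word found in a text casts one vote for its emotion, and the emotion with the most votes wins. This is not the sentiment package's code. The tiny lexicon and the `vote_emotion()` helper are hypothetical, and the real Strapparava and Valitutti lexicon is far larger.

```r
# Hypothetical, hand-made lexicon: word -> emotion
toy_lexicon <- c(
  happy = "joy", great = "joy", love = "joy",
  angry = "anger", hate = "anger",
  scared = "fear", afraid = "fear",
  sad = "sadness", cry = "sadness"
)

# Toy voter classifier: each lexicon hit is one vote; no votes or a tie gives NA
vote_emotion <- function(text, lexicon) {
  # lower-case, then split on anything that is not a letter
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  hits <- lexicon[words[words %in% names(lexicon)]]
  if (length(hits) == 0) return(NA_character_)
  votes <- table(hits)
  winners <- names(votes)[votes == max(votes)]
  if (length(winners) > 1) return(NA_character_)  # tie: no clear winner
  winners
}

tweets <- c("I love this, great visit!",
            "So angry, I hate the traffic",
            "Just landed in Delhi")
vapply(tweets, vote_emotion, character(1), lexicon = toy_lexicon,
       USE.NAMES = FALSE)
# → "joy" "anger" NA
```

Returning NA when nothing matches mirrors why the post later replaces missing emotions with "unknown".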

classify_polarity
In contrast to classify_emotion, the classify_polarity function labels text as positive or negative (or neutral when neither side wins). Again, the classification can use a naive Bayes algorithm trained on Janyce Wiebe's subjectivity lexicon, or a simple voter algorithm.
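As a rough illustration of how a naive Bayes polarity classifier decides, the sketch below scores a text under each class with Laplace (add-one) smoothed word probabilities and equal priors, then compares the two log-likelihoods. It is a toy under stated assumptions, not the package's implementation: the word counts are made up, the `nb_polarity()` helper is hypothetical, and words outside the training vocabulary are simply ignored.

```r
# Hypothetical training counts: how often each word appeared in
# positive and in negative example texts
pos_counts <- c(good = 8, great = 6, love = 5, bad = 1)
neg_counts <- c(bad = 7, hate = 6, awful = 4, good = 1)

nb_polarity <- function(text, pos_counts, neg_counts, alpha = 1) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  vocab <- union(names(pos_counts), names(neg_counts))
  words <- words[words %in% vocab]  # ignore words never seen in training
  if (length(words) == 0) return("neutral")
  # sum of Laplace-smoothed log P(word | class)
  log_prob <- function(counts) {
    n <- counts[words]
    n[is.na(n)] <- 0  # word in vocabulary but unseen in this class
    sum(log((n + alpha) / (sum(counts) + alpha * length(vocab))))
  }
  # equal class priors, so they cancel out of the comparison
  score <- log_prob(pos_counts) - log_prob(neg_counts)
  if (score > 0) "positive" else if (score < 0) "negative" else "neutral"
}

nb_polarity("Good visit, love it", pos_counts, neg_counts)    # → "positive"
nb_polarity("bad and awful traffic", pos_counts, neg_counts)  # → "negative"
nb_polarity("Just landed", pos_counts, neg_counts)            # → "neutral"
```

Working in log space avoids numeric underflow when a text contains many words, which is the usual reason naive Bayes implementations sum logs instead of multiplying probabilities.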

Let's load the packages needed for the sentiment analysis.

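library(twitteR)       # pulling tweets (used in the previous post)
library(sentiment)     # classify_emotion(), classify_polarity()
library(tm)            # removeWords(), Corpus(), TermDocumentMatrix()
library(stringr)       # string helpers
library(ggplot2)       # bar charts
library(wordcloud)     # comparison.cloud()
library(RColorBrewer)  # brewer.pal() colour palettes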

Let's run the analysis on the ObamaInIndia tweets, as in my previous sentiment analysis post. I reuse that post's code to pull and clean the tweets (the cleaned text is in tweet_txt), so we pick up from there with the sentiment analysis.
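The cleaning code from the previous post is not repeated here. For readers starting fresh, the sketch below shows one common way to turn raw tweet texts into a cleaned vector like tweet_txt using only base R. The `clean_tweets()` helper and its exact steps are assumptions, not necessarily what the earlier post did.

```r
# Hypothetical cleaning helper: input is a character vector of raw tweets
clean_tweets <- function(x) {
  x <- gsub("\\bRT\\b", "", x, perl = TRUE)   # retweet marker
  x <- gsub("@\\w+", "", x, perl = TRUE)      # @mentions
  x <- gsub("http\\S+", "", x, perl = TRUE)   # URLs
  x <- gsub("[[:punct:]]", "", x)             # punctuation (keeps hashtag words)
  x <- gsub("[[:digit:]]", "", x)             # digits
  x <- gsub("\\s+", " ", x, perl = TRUE)      # collapse runs of whitespace
  tolower(trimws(x))                          # trim ends, lower-case
}

raw <- c("RT @PMOIndia: Welcome #ObamaInIndia http://t.co/abc123",
         "Great speech!!! 2015")
clean_tweets(raw)
# → "welcome obamainindia" "great speech"
```

URLs and mentions are removed before punctuation on purpose: stripping punctuation first would glue URL fragments into meaningless tokens.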


# classify emotion (naive Bayes, prior = 1.0)
class_emotion = classify_emotion(tweet_txt, algorithm="bayes", prior=1.0)
# get emotion best fit (column 7 holds the best-fitting label)
emotion = class_emotion[,7]
# substitute NA's by "unknown"
emotion[is.na(emotion)] = "unknown"

# classify polarity (naive Bayes)
class_polarity = classify_polarity(tweet_txt, algorithm="bayes")
# get polarity best fit (column 4 holds the best-fitting label)
polarity = class_polarity[,4]
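Both functions return a matrix with one row per tweet, and the code above picks the best-fit column by position. The sketch below mimics that step on a small hand-built matrix and selects the column by name instead, which does not silently break if the column order changes. The column names (ANGER through BEST_FIT) and all values are assumptions made for illustration, so check colnames(class_emotion) on your own run.

```r
# Hand-built stand-in for classify_emotion() output
# (values are made up; column names are an assumed layout)
mock_emotion <- matrix(
  c("1.5", "2.0", "1.0", "7.3", "1.2", "2.1", "joy",
    "1.5", "2.0", "1.0", "1.0", "1.2", "2.1", NA),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("ANGER", "DISGUST", "FEAR", "JOY",
                          "SADNESS", "SURPRISE", "BEST_FIT"))
)

# select the best-fit label by name, then replace missing labels
emotion <- mock_emotion[, "BEST_FIT"]
emotion[is.na(emotion)] <- "unknown"
emotion
# → "joy" "unknown"
```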

We now have an emotion and a polarity label for each tweet. Let's combine the tweets, emotions, and polarities into a data frame.

# data frame with results
tweet_df = data.frame(text=tweet_txt, emotion=emotion,
polarity=polarity, stringsAsFactors=FALSE)

# sort emotion levels by frequency (most common first)
tweet_df = within(tweet_df,
emotion <- factor(emotion, levels=names(sort(table(emotion), decreasing=TRUE))))
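The within()/factor() step reorders the emotion levels by frequency, which is what makes the emotion bar chart list the most common emotion first (ggplot2 places discrete bars in factor-level order). On a small hypothetical vector, the idiom behaves like this:

```r
# Toy emotion labels standing in for tweet_df$emotion
emotion <- c("joy", "unknown", "joy", "anger", "unknown", "joy")

counts <- table(emotion)                  # alphabetical: anger 1, joy 3, unknown 2
by_freq <- names(sort(counts, decreasing = TRUE))
by_freq
# → "joy" "unknown" "anger"

emotion_f <- factor(emotion, levels = by_freq)
levels(emotion_f)
# → "joy" "unknown" "anger"
```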

Let's generate some plots from this data frame, starting with the distribution of tweets across emotions.


# geom_bar() counts the tweets in each emotion category
ggplot(tweet_df, aes(x=emotion)) +
geom_bar(aes(fill=emotion)) +
xlab("Emotion Categories") + ylab("Tweet Count") +
ggtitle("Sentiment Analysis of Tweets on Emotions")

[Figure: bar chart of tweet counts per emotion category]

Next, plot the distribution of tweets by polarity.


# geom_bar() counts the tweets in each polarity class
ggplot(tweet_df, aes(x=polarity)) +
geom_bar(aes(fill=polarity)) +
xlab("Polarities") + ylab("Tweet Count") +
ggtitle("Sentiment Analysis of Tweets on Polarity")

[Figure: bar chart of tweet counts per polarity]
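geom_bar() simply counts rows per category. When you also want the numbers behind the polarity chart, for example to report percentages, table() and prop.table() give them directly. A minimal sketch on a hypothetical label vector (in the post's workflow you would pass tweet_df$polarity instead):

```r
# Hypothetical polarity labels standing in for tweet_df$polarity
polarity <- c("positive", "neutral", "positive", "negative", "positive")

counts <- table(polarity)                  # the bar heights geom_bar() draws
pct <- round(100 * prop.table(counts), 1)  # share of tweets, in percent
counts
pct
```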

Finally, group the tweets by emotion and compare the words in each group with a comparison cloud.


# build one "document" per emotion: all tweets with that label pasted together
emos = levels(factor(tweet_df$emotion))
nemo = length(emos)
emo.docs = rep("", nemo)
for (i in seq_len(nemo))
{
  tmp = tweet_txt[emotion == emos[i]]
  emo.docs[i] = paste(tmp, collapse=" ")
}
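The loop above builds one long document per emotion. The same result can be written without an explicit loop using split() and vapply(). The sketch below uses a small hypothetical set of tweets; because split() groups by factor levels, the names come out in the same alphabetical order as levels(factor(...)), so they line up with emos.

```r
# Hypothetical cleaned tweets and their emotion labels
tweet_txt <- c("love the visit", "hate the traffic", "great speech",
               "long queues")
emotion   <- c("joy", "anger", "joy", "unknown")

# split() groups the texts by label; paste(collapse = " ") joins each group
emo.docs <- vapply(split(tweet_txt, emotion), paste, character(1),
                   collapse = " ")
emo.docs
```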

# remove stopwords
emo.docs = removeWords(emo.docs, stopwords("english"))
# create corpus: one document per emotion
corpus = Corpus(VectorSource(emo.docs))
# term-document matrix: rows are words, columns are emotions
tdm = TermDocumentMatrix(corpus)
tdm = as.matrix(tdm)
colnames(tdm) = emos

# comparison word cloud
# (brewer.pal() returns at least 3 colours, and "Dark2" has at most 8)
comparison.cloud(tdm, colors = brewer.pal(nemo, "Dark2"),
scale = c(3,.5), random.order = FALSE, title.size = 1.5)

[Figure: comparison word cloud of tweet words by emotion]
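comparison.cloud() works from a matrix with one row per term and one column per emotion, holding word counts; TermDocumentMatrix() followed by as.matrix() builds that matrix from the corpus. To show what it contains, here is a base-R sketch on two hypothetical, already stopword-free documents. It only approximates tm's result, since tm applies its own tokenization and default filtering.

```r
# Hypothetical per-emotion documents (stopwords already removed)
emo.docs <- c(anger = "hate traffic traffic",
              joy   = "love visit great visit")

# tokenize each document on whitespace
tokens <- strsplit(emo.docs, "\\s+")
names(tokens) <- names(emo.docs)
terms <- sort(unique(unlist(tokens)))

# rows are terms, columns are emotions, cells are raw counts
tdm <- sapply(tokens, function(tk) table(factor(tk, levels = terms)))
tdm
```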

