Basic Sentiment Analysis using NLTK

Samira Munir · Mar 15

“Your most unhappy customers are your greatest source of learning.” — Bill Gates

So what does the customer say? In today’s context, it turns out a LOT.

Social media has opened the floodgates of customer opinions and it is now free-flowing in mammoth proportions for businesses to analyze.

Today, using machine learning companies are able to extract these opinions in the form of text or audio and then analyze the emotions behind them on an unprecedented scale.

Call it sentiment analysis or opinion mining; either way, if you have a product or service to sell, you need to be on it.

“When captured electronically, customer sentiment — expressions beyond facts, that convey mood, opinion, and emotion — carries immense business value. We’re talking the voice of the customer, and of the prospect, patient, voter, and opinion leader.”

From user reviews in the media to the analysis of stock prices, sentiment analysis has become a ubiquitous tool in almost all industries.

For example, the graph below shows the stock price movement of eBay with a sentiment index created based on an analysis of tweets that mention eBay.

Figure 1

What is Sentiment Analysis?

Techopedia defines sentiment analysis as follows:

Sentiment analysis is a type of data mining that measures the inclination of people’s opinions through natural language processing (NLP), computational linguistics and text analysis, which are used to extract and analyze subjective information from the Web — mostly social media and similar sources.

The analyzed data quantifies the general public’s sentiments or reactions toward certain products, people or ideas and reveals the contextual polarity of the information.

Sentiment analysis is also known as opinion mining.

There are two broad approaches to sentiment analysis.

Pure statistics: These algorithms treat texts as bags of words, ignoring word order and, with it, context.

The original text is filtered down to only the words that are thought to carry sentiment.

For this blog, I will be attempting this approach.

Such models make no use of an understanding of the language itself and rely only on statistical measures to classify a text.

A mix of statistics and linguistics: These algorithms attempt to incorporate grammar principles, various natural language processing techniques and statistics to train the machine to truly ‘understand’ the language.

Sentiment analysis can also be broadly categorized into two kinds, based on the type of output the analysis generates.


Categorical/Polarity — was that bit of text “positive,” “neutral,” or “negative”? In this process, you try to label a piece of text as positive, negative, or neutral.


Scalar/Degree — give a score on a predefined scale that ranges from highly positive to highly negative.

For example, the figure below shows an analysis of sentiment based on tweets about various election candidates.

In this instance the sentiment is being measured in a scalar form.

Figure 2

During my data science boot camp, I took a crack at building a basic sentiment analysis tool using the NLTK library.

I found a nifty YouTube tutorial and followed the steps listed to learn how to do basic sentiment analysis.

While the tutorial focuses on analyzing Twitter sentiments, I wanted to see if I could label movie reviews into either positive or negative.

I found a labeled dataset of 25,000 IMDB reviews in the form of .txt files separated into two folders for negative and positive reviews.

I imported the following libraries on my Jupyter notebook and read the positive and negative reviews from their respective folders.
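The folder-reading step can be sketched as follows. This is a minimal stand-in, not the post's exact code: since the real IMDB folders aren't available here, the sketch builds a tiny pos/neg directory tree in a temp folder (the file names and contents are invented for illustration) and then loads every .txt file with its label.

```python
import os
import tempfile

# Build a tiny stand-in for the IMDB layout: one .txt file per review,
# split into pos/ and neg/ folders (contents are invented toy examples).
root = tempfile.mkdtemp()
for label, texts in [("pos", ["great movie", "excellent plot"]),
                     ("neg", ["horrible acting"])]:
    os.makedirs(os.path.join(root, label))
    for i, text in enumerate(texts):
        with open(os.path.join(root, label, f"{i}.txt"), "w") as f:
            f.write(text)

def load_reviews(folder, label):
    """Read every .txt file in `folder` and pair its text with `label`."""
    reviews = []
    for name in sorted(os.listdir(folder)):
        if name.endswith(".txt"):
            with open(os.path.join(folder, name)) as f:
                reviews.append((f.read(), label))
    return reviews

documents = (load_reviews(os.path.join(root, "pos"), "pos")
             + load_reviews(os.path.join(root, "neg"), "neg"))
print(len(documents))  # 3 reviews in the toy dataset
```

The same `load_reviews` pattern works unchanged on the real 25,000-review download; only `root` changes.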

Making the bag of words: Our bag of words (BOW) could, technically, include every unique word in the corpus.

However, it can be computationally expensive and even unnecessary to use all words.

For example, the name of an actress would not give information about the sentiment of the review.

It would be a waste of resources to include them in our bag of words.

So, as data scientists, we need to be smart and select the most informative words.

For the small scope of the project, I selected only adjectives from the features based on the assumption that adjectives are highly informative of positive and negative sentiments.

For each review, I removed punctuation, tokenized the string, and removed stop words.

Please check out my blog on how to perform these basic preprocessing tasks.

Next, to get a list of all adjectives, I performed part-of-speech tagging (also discussed in the blog mentioned above) and created our BOW, in this case a bag of adjectives.
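A rough sketch of these two steps, under stated assumptions: NLTK's stopwords corpus and `nltk.pos_tag` require downloaded data, so a tiny hand-rolled stop-word list and a pre-tagged token list stand in for them here. With `nltk.pos_tag`, adjectives are the Penn Treebank tags beginning with 'JJ' (JJ, JJR, JJS), which is the filter shown.

```python
import string

# Tiny stand-in for NLTK's stopwords corpus (illustration only).
STOPWORDS = {"the", "a", "was", "and", "of"}

def preprocess(review):
    """Lowercase, strip punctuation, split on whitespace, drop stop words."""
    cleaned = review.lower().translate(str.maketrans("", "", string.punctuation))
    return [w for w in cleaned.split() if w not in STOPWORDS]

tokens = preprocess("The plot was great, and the acting was excellent!")

# Pre-tagged tokens stand in for nltk.pos_tag(tokens); keep only adjectives.
tagged = [("plot", "NN"), ("great", "JJ"), ("acting", "NN"), ("excellent", "JJ")]
all_words = [word for word, tag in tagged if tag.startswith("JJ")]
print(tokens)
print(all_words)
```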

I called this list ‘all_words’, and it still needs another round of filtering.

Next, to pick the most informative adjectives, I created a frequency distribution of the words in all_words using the nltk.FreqDist() method.

I made a list of the top 5000 most frequently appearing adjectives from all_words.

At this point, all_words, trimmed to these 5000 words, was ready to serve as our final BOW for the model.

Create Features for Each Review: For each review, I created a tuple.

The first element of the tuple is a dictionary whose keys are the 5000 words of the BOW, and the value for each key is True if the word appears in the review and False otherwise.

The second element is the label for that review: ‘pos’ for positive reviews and ‘neg’ for negative reviews.

#example of a tuple feature set for a given review
({'great': True, 'excellent': False, 'horrible': False}, 'pos')

I then split the list of tuples (called feature_set from here on) into a training set (20,000) and a testing set (5,000).

The fun part: Machine Learning!!!

Now that I had my features and the training and testing sets ready, my next step was to try a vanilla base model.
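The feature-extraction step described above can be sketched like this (the helper name `find_features` is an assumption; a three-word BOW stands in for the real 5000-word one):

```python
# BOW words; in the post this list holds the top 5000 adjectives.
word_features = ["great", "excellent", "horrible"]

def find_features(review_words):
    """Map every BOW word to True/False depending on presence in the review."""
    words = set(review_words)
    return {w: (w in words) for w in word_features}

feature_set = [
    (find_features(["great", "plot"]), "pos"),
    (find_features(["horrible", "acting"]), "neg"),
]
print(feature_set[0])
# ({'great': True, 'excellent': False, 'horrible': False}, 'pos')
```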

For my base model, I used the Naive Bayes classifier module from NLTK.

The model had an accuracy of 84%, which was pretty good for a base model and not surprising given the size of the training data.
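Training the NLTK Naive Bayes base model looks roughly like this. The sketch uses a four-example toy feature set in place of the real 20,000-review training set, so the printed accuracy is not the 84% from the post:

```python
import nltk

# Toy feature set shaped like the tuples described above (illustration only).
train_set = [
    ({"great": True, "horrible": False}, "pos"),
    ({"great": True, "horrible": False}, "pos"),
    ({"great": False, "horrible": True}, "neg"),
    ({"great": False, "horrible": True}, "neg"),
]

classifier = nltk.NaiveBayesClassifier.train(train_set)
print(classifier.classify({"great": True, "horrible": False}))  # pos
print(nltk.classify.accuracy(classifier, train_set))
classifier.show_most_informative_features(2)  # prints likelihood ratios
```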

The figure on the right shows the confusion matrix for the predictions, both without and with normalization.

The list above (left) shows the 15 most informative features from the model.

The ratio shown next to each word indicates how much more often that word appears in one class of text than in the other.

These ratios are known as likelihood ratios.

For example, the word ‘lousy’ is 13 times more likely to occur in a negative review than in a positive review.

To further evaluate the model, I calculated the f1_score using scikit-learn and created a confusion matrix.

The f1_score was 84.36%.

Next, I tried training other classification algorithms on the training set to find the model with the best score.
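The evaluation step uses scikit-learn's metrics; a minimal sketch with invented toy labels (not the post's actual predictions):

```python
from sklearn.metrics import confusion_matrix, f1_score

# Toy true/predicted labels (illustration only).
y_true = ["pos", "pos", "neg", "neg"]
y_pred = ["pos", "neg", "neg", "neg"]

score = f1_score(y_true, y_pred, pos_label="pos")
cm = confusion_matrix(y_true, y_pred, labels=["pos", "neg"])
print(score)  # 0.666...: precision 1.0, recall 0.5 on the 'pos' class
print(cm)     # rows = true labels, columns = predicted labels
```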

I used Multinomial Naive Bayes, Bernoulli Naive Bayes, Logistic Regression, Stochastic Gradient Descent, and a Support Vector Classifier.

NLTK has a built-in scikit-learn wrapper that can train all of these classifiers on the same dict-style feature sets.
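That wrapper is `nltk.classify.scikitlearn.SklearnClassifier`; a sketch with a toy feature set (only three of the five estimators are shown to keep it short):

```python
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Toy training data, repeated so each estimator sees a few examples per class.
train_set = [
    ({"great": True, "horrible": False}, "pos"),
    ({"great": False, "horrible": True}, "neg"),
] * 3

models = {}
for name, estimator in [("MNB", MultinomialNB()),
                        ("BernoulliNB", BernoulliNB()),
                        ("LogReg", LogisticRegression())]:
    # SklearnClassifier converts the dict features internally (DictVectorizer).
    models[name] = SklearnClassifier(estimator).train(train_set)

print(models["MNB"].classify({"great": True, "horrible": False}))  # pos
```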

The model metrics.

The figure above shows the f1 scores of the models; more or less all of them did a fairly good job of predicting.

However, both of the Naive Bayes models did slightly better.

Final Steps: Build an ensemble model to make future predictions

In order to get a better f1_score, I tried building an ensemble model.

An ensemble model combines the predictions (takes votes) from each of the above models for each review and uses the majority vote as the final prediction.

To avoid having to re-train the models (since each one took about 8 to 12 minutes to train), I stored all of the models using the pickle module.

Pickle is a super useful Python module that lets you close out your current kernel and still retain Python objects that took a long time to create.
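The save/reload pattern is just a few lines; a plain dict stands in for a trained classifier here, but the calls are identical for the NLTK and scikit-learn models above:

```python
import os
import pickle
import tempfile

# Stand-in for a trained classifier object (illustration only).
model = {"name": "naive_bayes", "f1": 0.84}

path = os.path.join(tempfile.mkdtemp(), "naive_bayes.pickle")
with open(path, "wb") as f:
    pickle.dump(model, f)   # serialize to disk once, after training

with open(path, "rb") as f:
    restored = pickle.load(f)  # reload in a fresh kernel, no retraining
print(restored == model)  # True
```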

To build the ensemble model I created an EnsembleClassifier class that is initialized with a list of classifiers.

It is important that an odd number of classifiers is used in the ensemble to avoid the chance of a tie. The class has two main methods: classify, which returns a predicted label, and confidence, which returns the degree of confidence in the prediction.

This degree is measured as (Number of winning votes)/Total Votes.
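The class described above can be sketched like this. Stub classifiers stand in for the trained models (any object with a `.classify(features)` method works, which the NLTK and SklearnClassifier models all provide):

```python
from collections import Counter

class EnsembleClassifier:
    """Majority-vote ensemble over an odd number of trained classifiers."""

    def __init__(self, *classifiers):
        assert len(classifiers) % 2 == 1, "use an odd number to avoid ties"
        self.classifiers = classifiers

    def classify(self, features):
        """Return the majority-vote label."""
        votes = [c.classify(features) for c in self.classifiers]
        return Counter(votes).most_common(1)[0][0]

    def confidence(self, features):
        """Return (number of winning votes) / (total votes)."""
        votes = [c.classify(features) for c in self.classifiers]
        _, count = Counter(votes).most_common(1)[0]
        return count / len(votes)

# Stubs stand in for the pickled models from the earlier steps.
class Stub:
    def __init__(self, label):
        self.label = label
    def classify(self, features):
        return self.label

ensemble = EnsembleClassifier(Stub("pos"), Stub("pos"), Stub("neg"))
print(ensemble.classify({}), ensemble.confidence({}))  # pos 0.666...
```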

Next, I loaded all the models using pickle, initialized an ensemble model object and fed the list of features from the testing sets to the model.

The f1-score of the ensemble model, as shown below, was 85%: a slight increase over the highest f1-score of 84.5% that we achieved earlier with our original Naive Bayes model.

The same class can be used to do a live classification of a single review as well.

The function below takes in a single review, creates a feature set for that review and then spits out a prediction using the ensemble method.

The output gives you a label and the degree of confidence in that labeling.
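A sketch of that live-classification helper, matching the `sentiment(text)` call shown later in the post. The feature-building step assumes the `word_features` list from earlier; a stub ensemble replaces the real trained one so the sketch runs standalone:

```python
word_features = ["great", "excellent", "horrible"]  # toy BOW (see earlier steps)

class StubEnsemble:
    """Stand-in for the trained EnsembleClassifier (illustration only)."""
    def classify(self, feats):
        return "pos" if feats.get("great") else "neg"
    def confidence(self, feats):
        return 1.0

ensemble = StubEnsemble()

def sentiment(text):
    """Build the feature dict for a raw review, return (label, confidence)."""
    words = set(text.lower().split())
    feats = {w: (w in words) for w in word_features}
    return ensemble.classify(feats), ensemble.confidence(feats)

rev_class, conf = sentiment("A great and moving film")
print(rev_class, conf)  # pos 1.0
```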

To demonstrate, I collected reviews of Captain Marvel from Rotten Tomatoes.

I intentionally took two reviews that were not as polarizing and two that were very polarizing to see how the model performs.

And it turned out the model did pretty well! It was not so sure about the less polarizing reviews, text_a and text_c, but it identified the polarizing text_b and text_d with a much higher degree of confidence.

That’s it! You can find the example code for this project in my GitHub repository and in the original tutorial that I used as a guideline.

I am now interested to see if I can identify sarcasm in a text.

Sarcasm is subtle and hard to pick up even by humans.

I tested the ensemble model on a quote by my favorite author Kurt Vonnegut, who is known for his satire.

The model labels the quote “Everything was beautiful and nothing hurt” as positive with 100% confidence.

But knowing Kurt Vonnegut and the context of that quote you could argue that this quote had a deeper sentiment of loss.

The model understandably failed to capture this, because it was not trained to identify sarcasm in the first place.

>>> text = "Everything was beautiful and nothing hurt"
>>> rev_class, conf = sentiment(text)
>>> print(f'Review Classification: {rev_class}, with {round(conf*100)}% confidence')
Review Classification: pos, with 100% confidence

I would like to end with the following quote about the nuances of sentiment analysis and its reliability.

Will Sentiment Analysis ever be 100% accurate, or close?

Probably not, but that is not meant to be a bad thing.

This will not be because people aren’t smart enough to eventually make computers that really understand language.

Instead, this is really just plain impossible, seeing as how it’s rarely the case that 80% of people agree on the sentiment of text.

 — http://sentdex.com/sentiment-analysis/

Sources:

Figure 1 — eBay Stock Prices — http://sentdex.com/

Figure 2 — How Twitter Feels about The 2016 Election Candidates — https://moderndata.plot.ly/elections-analysis-in-r-python-and-ggplot2-9-charts-from-4-countries/

Sentiment Analysis Tutorial — https://pythonprogramming.net/sentiment-analysis-module-nltk-tutorial/
