Sentiment Analysis: Beyond Words

First, I split each review into sentences and used spaCy and gensim to identify the distinct topics reviewers mentioned in each sentence (namely food quality, service, wait times, atmosphere, and menu variety).
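The post does the sentence splitting with spaCy. As a dependency-free illustration of what that step produces, here is a naive regex-based splitter. split_sentences is a hypothetical helper, not part of spaCy, and it is a much rougher stand-in for spaCy's segmentation (for example, it will split after abbreviations like "Dr.").

```python
import re
from typing import List


def split_sentences(review: str) -> List[str]:
    """Naively split a review after ., ! or ? followed by whitespace.

    Hypothetical stand-in for spaCy's sentence segmentation: it does not
    handle abbreviations such as "e.g." or "Dr." correctly.
    """
    parts = re.split(r"(?<=[.!?])\s+", review.strip())
    # Drop empty strings (e.g. from an empty review)
    return [part for part in parts if part]


print(split_sentences("The pizza was great. Service was slow!"))
# ['The pizza was great.', 'Service was slow!']
```

Each sentence can then be tagged with a topic and scored on its own.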

Once I had my topics (I’ll leave topic modeling for another post), I needed to figure out whether a reviewer felt positively or negatively about each of those aspects of the restaurant.

This post compares two ways to model reviewer sentiment: VADER and Stanford CoreNLP.

Sentiment scoring with VADER

First, I tried the VADER sentiment package and defined a function, sentiment_analyzer_scores(), that prints VADER’s scores for a piece of text. The most useful of these is the compound score, an overall sentiment rating from -1 (very negative) to 1 (very positive).

    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()

    def sentiment_analyzer_scores(text):
        # polarity_scores() returns a dict with 'neg', 'neu', 'pos', and 'compound'
        score = analyzer.polarity_scores(text)
        print(text)
        print(score)
        return score

The first sentence I tried was pretty straightforward: “this place was amazing great food and atmosphere”.

This review is clearly positive and, sure enough, the VADER compound sentiment score was 0.84.

So far so good.
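A compound score like 0.84 is continuous, so turning it into a positive/neutral/negative label needs cut-offs. The vaderSentiment README suggests treating compound >= 0.05 as positive, compound <= -0.05 as negative, and anything in between as neutral. A minimal sketch applying those thresholds (label_compound is a hypothetical helper, not part of the package):

```python
def label_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label.

    The default cut-off follows the +/-0.05 thresholds suggested in the
    vaderSentiment README; tune it for your own data.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


print(label_compound(0.84))  # positive
print(label_compound(0.0))   # neutral
```

In practice you would pass in the 'compound' value from the dict that polarity_scores() returns.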

    text_pos = 'this place was amazing great food and atmosphere'
    sentiment_analyzer_scores(text_pos)

VADER also did well on this straightforward negative review, returning a compound score of -0.66:

    text_neg = 'i didnt like their italian sub though just seemed like lower quality meats on it and american cheese'
    sentiment_analyzer_scores(text_neg)

However, on this more nuanced example, it stumbles.

Take the review “everything tastes like garbage to me but we keep coming back because my wife loves the pasta”.

This reviewer clearly does NOT like this restaurant, despite the fact that his or her wife “loves” the pasta (side note: this reviewer should win spouse of the year for continuing to eat garbage to please their wife!). Any food review containing the word “garbage” should be an immediate negative, but VADER returns a very positive compound score of 0.7.

    text_amb = "everything tastes like garbage to me but we keep coming back because my wife loves the pasta"
    sentiment_analyzer_scores(text_amb)

So what happened? The get_word_sentiment() function below prints the words that VADER categorizes as positive, neutral, and negative.

According to the README, “VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media.” As such, it relies on the polarity of individual words to determine the overall sentiment.
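Concretely, VADER looks up a valence for each word, adjusts it with rules (negation, intensifiers, capitalization, and so on), sums the results, and then squashes the sum into the compound score’s -1 to 1 range. The squashing step in the vaderSentiment source has the form score / sqrt(score² + α) with α = 15. The sketch below re-implements just that step for illustration; it is not imported from the package.

```python
import math


def normalize(score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded sum of word valences into [-1, 1].

    Re-implements the form of vaderSentiment's normalization:
    score / sqrt(score**2 + alpha), clamped to [-1, 1] as a safeguard.
    """
    norm_score = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm_score))


print(normalize(1.0))  # 0.25
print(normalize(7.0))  # 0.875
```

Because the curve flattens quickly, a handful of strongly positive words can push the compound score close to 1.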

    import nltk
    from nltk.tokenize import word_tokenize

    nltk.download('punkt')
    nltk.download('vader_lexicon')
    # Newer NLTK releases may also need: nltk.download('punkt_tab')

    def get_word_sentiment(text):
        tokenized_text = word_tokenize(text)
        pos_word_list = []
        neu_word_list = []
        neg_word_list = []

        for word in tokenized_text:
            # Score each token in isolation with the VADER analyzer defined earlier
            compound = analyzer.polarity_scores(word)['compound']
            if compound >= 0.1:
                pos_word_list.append(word)
            elif compound <= -0.1:
                neg_word_list.append(word)
            else:
                neu_word_list.append(word)

        print('Positive:', pos_word_list)
        print('Neutral:', neu_word_list)
        print('Negative:', neg_word_list)

The output of get_word_sentiment(text_amb) suggests that the positive polarity of the words “loves” and “like” is quite high.

Further, without a broader syntactic understanding of the sentence, the only word that could register it as negative is “garbage”. But VADER treats “garbage” as neutral, so the text as a whole is scored as rather positive.
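Word polarity is not the only factor. VADER also has a rule for the contrastive conjunction “but”: in the vaderSentiment source, sentiment on words before “but” is multiplied by 0.5 and sentiment on words after it by 1.5, on the theory that the clause after “but” carries the writer’s real opinion. In this review, that favors the “my wife loves the pasta” clause. The sketch below isolates just that rule; the valences in TOY_VALENCES are hypothetical (not VADER’s lexicon values), and VADER’s other rules are left out.

```python
# Hypothetical valences for illustration only; NOT VADER's lexicon values.
TOY_VALENCES = {"like": 1.5, "loves": 2.5}


def but_weighted_sum(text: str, valences: dict) -> float:
    """Sum word valences, halving those before "but" and scaling those after it by 1.5.

    Simplified sketch of VADER's "but" heuristic only; negation, boosters,
    capitalization, and punctuation rules are omitted.
    """
    tokens = text.lower().split()
    but_index = tokens.index("but") if "but" in tokens else None
    total = 0.0
    for i, token in enumerate(tokens):
        valence = valences.get(token, 0.0)
        if but_index is not None:
            if i < but_index:
                valence *= 0.5  # clause before "but" is down-weighted
            elif i > but_index:
                valence *= 1.5  # clause after "but" is up-weighted
        total += valence
    return total


text_amb = ("everything tastes like garbage to me but we keep coming back "
            "because my wife loves the pasta")

# "garbage" treated as neutral, as the word-level check above found
print(but_weighted_sum(text_amb, TOY_VALENCES))  # 4.5
# Even with a hypothetical valence of -2.0 for "garbage", the total stays positive
print(but_weighted_sum(text_amb, {**TOY_VALENCES, "garbage": -2.0}))  # 3.5
```

Even this crude version lands positive for the ambiguous review, which is consistent with the strongly positive compound score VADER returned.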

    get_word_sentiment(text_amb)

Enter Stanford CoreNLP

Stanford CoreNLP has just the solution to this problem: its sentiment model was trained on movie reviews, in which a reviewer might discuss both positive and negative aspects of a movie in the same sentence (e.g., “the plot was slow but the acting was great”).
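To picture what scoring phrases, rather than individual words, looks like, the sketch below brackets that movie-review example by hand into a binary tree and scores it bottom-up, so that every phrase gets its own score. It is a toy, not CoreNLP’s model: CoreNLP’s sentiment model (a recursive neural tensor network) learns how to combine phrases from labeled parse trees, whereas TOY_LEXICON, the “not” flip, and the clipped addition below are hypothetical hand-written rules.

```python
from typing import List, Tuple, Union

# A parse tree is either a single word (leaf) or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]

# Hypothetical word scores in [-1, 1], for illustration only.
TOY_LEXICON = {"slow": -0.6, "great": 0.8}


def words(tree: Tree) -> List[str]:
    """Return the words covered by a (sub)tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return words(left) + words(right)


def compose(tree: Tree, phrases: List[Tuple[str, float]]) -> float:
    """Score a tree bottom-up and record a score for every phrase (node).

    Hand-written placeholder rules: a left leaf "not" flips its sibling's
    score; otherwise child scores are added and clipped to [-1, 1].
    """
    if isinstance(tree, str):
        score = TOY_LEXICON.get(tree, 0.0)
    else:
        left, right = tree
        left_score = compose(left, phrases)
        right_score = compose(right, phrases)
        if left == "not":
            score = -right_score
        else:
            score = max(-1.0, min(1.0, left_score + right_score))
    phrases.append((" ".join(words(tree)), score))
    return score


# "the plot was slow but the acting was great", bracketed by hand
sentence = ((("the", "plot"), ("was", "slow")),
            ("but", (("the", "acting"), ("was", "great"))))
phrases: List[Tuple[str, float]] = []
compose(sentence, phrases)
for phrase, score in phrases:
    if " " in phrase:  # skip single words for brevity
        print(f"{score:+.1f}  {phrase}")
```

The phrase “was slow” stays negative and “the acting was great” stays positive even though the whole sentence nets out mildly positive (+0.2). Keeping a score at every node is what lets a tree-based model see both halves of a mixed review.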

According to the site, rather than looking at the sentiment of individual words, the model “actually builds up a representation of whole sentences based on the sentence structure. It computes the sentiment based on how words compose the meaning of longer phrases. This way, the model is not as easily fooled as previous models.” Perfect!

Luckily, too, there’s a Python wrapper, pycorenlp, that lets you make calls to the CoreNLP server (which returns results surprisingly quickly). To make the calls, pip install pycorenlp and import StanfordCoreNLP from pycorenlp.

Then, in the terminal, cd into the Stanford CoreNLP folder and start the server with:

    cd stanford-corenlp-full-2018-10-05
    java -mx5g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -timeout 10000

Great, now let’s see how it did.

    # pip install pycorenlp
    from pycorenlp import StanfordCoreNLP

    nlp = StanfordCoreNLP('http://localhost:9000')

    def get_sentiment(text):
        res = nlp.annotate(text,
                           properties={
                               'annotators': 'sentiment',
                               'outputFormat': 'json',
                               'timeout': 1000,
                           })
        # Report the first sentence only (each input here is a single sentence)
        sentence = res['sentences'][0]
        print(text)
        print('Sentiment:', sentence['sentiment'])
        print('Sentiment score:', sentence['sentimentValue'])
        print('Sentiment distribution (0 = very negative, 4 = very positive):',
              sentence['sentimentDistribution'])

Passing in the review that calls the food garbage, the model classifies the overall sentence as fairly negative (0 is most negative, 4 is most positive).

The sentiment distribution assigns some weight to neutral and even positive, reflecting the mixed content of this sentence, but overall this is not a favorable assessment of the restaurant.

    get_sentiment(text_amb)

There’s also a cool live demo that shows how the model parses different parts of the sentence into positive and negative aspects: http://nlp.stanford.edu:8080/sentiment/rntnDemo.html

For good measure, I’ll pass in the positive and negative sentences from above:

    get_sentiment(text_pos)
    get_sentiment(text_neg)

So there you have it: a nuanced sentiment analysis package perfect for reviews of movies, books, consumer goods, and… pizza!

I should point out that this post is by no means a critique of VADER: it has some great features, such as its ability to recognize social media colloquialisms (“LOL”, emojis) and to pick up on emphasis from all caps and punctuation.

Rather, my aim is to highlight a sentiment analysis tool that is well-suited for customer reviews containing a combination of positive and negative aspects.

I hope you find this post helpful, and I welcome any feedback or questions in the comments!
