Challenges in sentiment analysis: a case for word clouds (for now)

For example, if I take only the list of positive tweets and get an aggregate sentiment score for the group using a different classification method, I find that some negative sentiments are hiding within the group of positive tweets.

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer  # requires nltk.download('vader_lexicon') once

sid = SentimentIntensityAnalyzer()
sentiment_summary = dict()

# for readme in readmes:
#     sentences = nltk.sent_tokenize(readme)
#     for sentence in sentences:
#         sentiment_score = sid.polarity_scores(sentence)

messages = pos_list
summary = {"positive": 0, "neutral": 0, "negative": 0}
for x in messages:
    ss = sid.polarity_scores(x)
    if ss["compound"] == 0.0:
        summary["neutral"] += 1
    elif ss["compound"] > 0.0:
        summary["positive"] += 1
    else:
        summary["negative"] += 1

print(summary)
# {'positive': 108, 'neutral': 189, 'negative': 89}
```

Code above by: Thomas Barrasso

I can compare algorithms that give me positive and negative categorization.

In this case I looked at: A. Bag of words (Simple vectorization), and B. TF-IDF (Term Frequency-Inverse Document Frequency).
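The comparison code below assumes feature matrices named `bow_word_feature` and `tfidf_word_feature` without showing how they were built. Here is a minimal sketch of how they could be produced with scikit-learn's vectorizers; the tweet strings are made-up placeholders standing in for the cleaned Uber IPO tweets:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Placeholder tweets standing in for the cleaned Uber IPO tweet text
tweets = [
    "uber ipo looks strong",
    "uber ipo disappoints investors",
    "strong debut for uber",
]

# A. Bag of words: raw term counts per tweet
bow_word_feature = CountVectorizer().fit_transform(tweets)

# B. TF-IDF: term counts reweighted by how rare each term is across tweets
tfidf_word_feature = TfidfVectorizer().fit_transform(tweets)

print(bow_word_feature.shape)  # one row per tweet, one column per unique term
```

Both vectorizers produce sparse matrices of the same shape; only the cell values differ, which is what lets the same classifier code run on either feature set.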

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

def naive_model(X_train, X_test, y_train, y_test):
    naive_classifier = GaussianNB()
    naive_classifier.fit(X_train.toarray(), y_train)
    # predictions over test set
    predictions = naive_classifier.predict(X_test.toarray())
    # calculating f1 score
    print(f'F1 Score - {f1_score(y_test, predictions)}')

X_train, X_test, y_train, y_test = train_test_split(bow_word_feature, target_variable, test_size=0.3, random_state=870)
naive_model(X_train, X_test, y_train, y_test)

X_train, X_test, y_train, y_test = train_test_split(tfidf_word_feature, target_variable, test_size=0.3, random_state=870)
naive_model(X_train, X_test, y_train, y_test)

# Bag of words (Simple vectorization) produces an F1 score of 0.7745,
# while TF-IDF produces an F1 score of 0.
```

Code above by Amardeep Chauhan
Not too different! Next, I took a look at the most frequently used words with bar charts.
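The bar-chart code below relies on a `top_15_pos` variable that isn't defined in the article. It can be built with `collections.Counter`, assuming the positive tweets have already been tokenized into one flat word list; the tokens here are placeholders:

```python
from collections import Counter

# Placeholder token list standing in for all words from the positive tweets
pos_words = ["uber", "ipo", "stock", "uber", "launch", "stock", "uber"]

# (word, count) pairs, most frequent first
top_15_pos = Counter(pos_words).most_common(15)
print(top_15_pos[0])  # ('uber', 3)
```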

```python
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

x = list(dict(top_15_pos).keys())
y = list(dict(top_15_pos).values())

plt.barh(x, y, align='center', alpha=0.5)
plt.title('Top 15 words used in positive Uber IPO tweets')
```

There were a lot of options in the seaborn library, and I iterated through many that I thought might be more intuitive than a word cloud.

Although there are plenty of "cool looking" visualizations, I had to accept at long last that most of them were no better than a bar chart or word cloud for sharing content meaning.

None of these charts actually produced any further insights.

I asked the writer of the sentiment analysis tutorial I was using which types of visualizations he would use. Ultimately, I used this tutorial with the code below to create two word clouds: one of the terms associated with positive sentiment and one of the terms associated with negative sentiment.

```python
from wordcloud import WordCloud

all_words = ' '.join([text for text in pos_df['absolute_tidy_tweets']])

wordcloud = WordCloud(width=800, height=500, random_state=21, max_font_size=110).generate(all_words)

plt.figure(figsize=(10, 7))
plt.imshow(wordcloud, interpolation="bilinear")
plt.show()
```

The words that occurred with a positive sentiment value for the text.

What is useful about the word cloud is that it starts to convey the relative scale of some terms over others.

I can also use the word cloud to show why an algorithm is useful here.

There are common words on both the negative and positive visualizations, showing that the same phrases can mean different things in new contexts, and that algorithms can consume information from an entire phrase.
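That context effect is easy to see with n-grams: the single words (unigrams) from two phrases with opposite sentiment can overlap, while the word pairs (bigrams) keep the contexts apart. A small stdlib-only sketch with made-up phrases:

```python
def unigrams(text):
    return text.lower().split()

def bigrams(text):
    words = unigrams(text)
    return [f"{a} {b}" for a, b in zip(words, words[1:])]

positive = "not bad at all"
negative = "bad news for drivers"

# The single word "bad" appears in both phrases...
print(set(unigrams(positive)) & set(unigrams(negative)))  # {'bad'}
# ...but no two-word phrase is shared, so bigrams preserve the distinction
print(set(bigrams(positive)) & set(bigrams(negative)))    # set()
```

This is why a word cloud built from single words can show "bad" in both the positive and negative clouds, while a classifier consuming the whole phrase can still tell them apart.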

Words that occurred with the negative labeling of text.

But say I don't just need to know whether some news is perceived positively; I need to get some sense of what it all means.

With word clouds, I can't get a summary of all the ideas in longer phrases, like the notes you'd get from the minutes of a meeting.

I lose the context of the entire sentences these phrases appeared in, and can only bring my own interpretation and outside research to these word clusters.

In the end, it may be more useful to read something like this New York Times piece to get a true understanding of what we mean when we say what we say.

