How Do People Feel About Saving Sea Turtles?

Sentiment Analysis of #savetheturtles tweets using…github.com

Hypothesis

Overall, I thought I’d see a decline in interest over time.

The screenshot below shows a Google Trends comparison of searches for “plastic straws” and “sea turtles” over the past year.

However, I believed the overall sentiment within the tweets would be positive and supportive of the movement.

As you can see, “sea turtles” generated pretty consistent searches across the year, but “plastic straws” spiked around June and July, which is when the video went viral.

Data & Exploratory Analysis

I used the extremely helpful TwitterScraper to scrape the hashtag #savetheturtles.

I was able to get around 1500 tweets from January 2018 to January 2019.
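For reference, the scraping step might have looked something like this minimal sketch, assuming the query_tweets interface of the twitterscraper package (the exact argument and attribute names vary between versions):

import datetime as dt
import pandas as pd
from twitterscraper import query_tweets

# Scrape tweets containing the hashtag over the one-year window
tweets = query_tweets('#savetheturtles',
                      begindate=dt.date(2018, 1, 1),
                      enddate=dt.date(2019, 1, 1))

# Collect the text and timestamp of each tweet into a DataFrame
df = pd.DataFrame({'text': [t.text for t in tweets],
                   'timestamp': [t.timestamp for t in tweets]})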

I began my analysis in a Jupyter notebook by preprocessing the text.

Using NLTK, I converted text to lowercase, stripped punctuation, and removed stopwords.
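Here’s a minimal sketch of what that preprocessing might look like, assuming the scraped tweets are in a DataFrame df with a 'text' column (filtered_stopwords and filtered_pos are the variable names the later snippets rely on):

import string
import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

stop_words = set(stopwords.words('english'))

# Lowercase all tweets, strip punctuation, and tokenize
all_text = ' '.join(df['text']).lower()
all_text = all_text.translate(str.maketrans('', '', string.punctuation))
tokens = nltk.word_tokenize(all_text)

# Remove stopwords
filtered_stopwords = [w for w in tokens if w not in stop_words]

# POS-tag the remaining tokens for the lemmatization step later on
filtered_pos = nltk.pos_tag(filtered_stopwords)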

From there, I found the most frequent words used in these tweets.

# Calculate word frequencies and show the 10 most common words
fdist = nltk.FreqDist(filtered_stopwords)
fdist.most_common(10)

However, there are some redundancies, such as “straw” and “straws”.

Therefore, I lemmatized the words to reduce each one to its root (e.g. “running” and “runs” both become “run”).

# Try again with lemmatized words
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

# Map a Treebank POS tag to the WordNet POS constants (a, n, r, v)
# that WordNetLemmatizer expects
def get_wordnet_pos(treebank_tag):
    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    elif treebank_tag.startswith('V'):
        return wordnet.VERB
    elif treebank_tag.startswith('N'):
        return wordnet.NOUN
    elif treebank_tag.startswith('R'):
        return wordnet.ADV
    else:
        # The default POS in lemmatization is noun
        return wordnet.NOUN

wnl = WordNetLemmatizer()

# Create an empty list to store the lemmatized words
des_lem = []

def wn_pos(filtered_pos):
    for word, pos in filtered_pos:
        des_lem.append(wnl.lemmatize(word, get_wordnet_pos(pos)))
    return des_lem

# Get the 10 most common lemmatized words
fdist_2 = nltk.FreqDist(wn_pos(filtered_pos))
fdist_2.most_common(10)

As we can see, lemmatizing reduces words to their root form, allowing us to bypass the repetition of words like “straw” and “straws”.

This also reveals more of the general themes around these tweets, which include “save” and “help”.

# Find the 10 most common bigrams
bigrm = nltk.bigrams(filtered_stopwords)
fdist = nltk.FreqDist(bigrm)
fdist.most_common(10)

Looking at the most common bigrams even reveals a bit of a market of businesses that support the movement, such as Deep Blue Decals and Salty Girl!

In addition, I wanted to see whether the viral impact of the video was still affecting tweets six months later.

To do this, I plotted the number of tweets per day from the beginning of the year to the end.
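Here’s a rough sketch of how that plot could be produced, assuming a 'timestamp' column holds each tweet’s posting time (this column name is an assumption; it depends on how the scraped data was saved):

import pandas as pd
import matplotlib.pyplot as plt

# Count the tweets posted on each calendar day
df['date'] = pd.to_datetime(df['timestamp']).dt.date
tweets_per_day = df.groupby('date').size()

tweets_per_day.plot(figsize=(12, 4))
plt.xlabel('Date')
plt.ylabel('Number of tweets')
plt.title('#savetheturtles tweets per day')
plt.show()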

The number of tweets spikes significantly in June, July, and August, reflecting the impact of the viral video.

However, we can also see a definite upward trend afterwards, as the overall number of tweets increases in the later months up to today!

Now onto the Sentiment Analysis…

Based on various articles, I decided to try the NLTK module VADER to analyze the positive, negative, and neutral sentiment of each individual tweet.

I found these articles particularly helpful:

http://www.nltk.org/howto/sentiment.html
https://medium.com/@sharonwoo/sentiment-analysis-with-nltk-422e0f794b8
http://t-redactyl.io/blog/2017/04/using-vader-to-handle-sentiment-analysis-with-social-media-text.html
http://datameetsmedia.com/vader-sentiment-analysis-explained/

# Download the VADER lexicon and set up the analyzer
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()

# Show the sentiment scores for each tweet
for tweet in df['text']:
    print(tweet)
    ss = sid.polarity_scores(tweet)
    for k in sorted(ss):
        print('{0}: {1}, '.format(k, ss[k]), end='')
    print()

The resulting output looks like this.

It’s really cool to see this output.

The VADER sentiment analyzer outputs four scores:

neg: Negative
neu: Neutral
pos: Positive
compound: Compound (i.e. aggregated score)

The neg, neu, and pos scores return a float for sentiment strength based on the input text. VADER also returns a compound sentiment score for each individual tweet, in the range -1 to 1 from most negative to most positive.

By looking at the compound score, we can classify each tweet as “positive” (> 0.0), “negative” (< 0.0), or “neutral” (== 0.0).

We can see the overall distribution of scores with the following code.

summary = {"positive": 0, "neutral": 0, "negative": 0}

# Tally each tweet by the sign of its compound score
for tweet in df['text']:
    ss = sid.polarity_scores(tweet)
    if ss["compound"] == 0.0:
        summary["neutral"] += 1
    elif ss["compound"] > 0.0:
        summary["positive"] += 1
    else:
        summary["negative"] += 1

# Plot the distribution as a pie chart
import matplotlib.pyplot as pyplot

keys = list(summary.keys())
values = list(summary.values())

# Add colors
colors = ['#99ff99', '#66b3ff', '#ff9999']
pyplot.axis("equal")  # Equal aspect ratio ensures that the pie is drawn as a circle
pyplot.pie(values, labels=keys, colors=colors, autopct='%1.1f%%',
           shadow=True, startangle=90)
pyplot.show()

Positive: 788; Neutral: 413; Negative: 287

By looking at the total distribution of the compound scores of all tweets, we can see that overall, over 50% of the tweets are positive, with 28% neutral and 19% negative.

Although VADER is a powerful package, it is not perfect. There are still classification mistakes when you look closely. I believe this is because of the dictionaries that many sentiment analyzers use to track positivity/negativity, and because of the subtleties of the English language.

VADER has many features that are useful for reviews. It can successfully interpret the intensity of a positive sentiment, treating “excellent” as more positive than just “good”.
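As a quick illustration, using the sid analyzer from above (the exact scores depend on the lexicon version, so none are shown here):

# "excellent" carries a stronger positive valence than "good" in
# VADER's lexicon, so its compound score comes out higher
print(sid.polarity_scores('This is good'))
print(sid.polarity_scores('This is excellent'))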

However, these tweets are not explicitly “reviews”, and positive tweets may not use such words. For example:

The misspelling of “MAJOR” and the use of slang such as “props” meant that the excited approval in this tweet was lost on our analyzer.

Or in this case… Sarcasm, unfortunately, not detected!

In conclusion…

Overall, this project showed that the #savetheturtles movement is still going strong! The overall amount of awareness and number of tweets have increased over time, with an overwhelming majority of positive tweets.

Admittedly, there are a couple of caveats.

Scraping the #savetheturtles hashtag means most people are retweeting/tweeting to show support for sustainability; it wouldn’t make sense for someone who hates the straw movement to use this hashtag.

In retrospect, it might have been better practice to combine tweets from several hashtags, or to look at occurrences of the hashtag on the main Twitter page.

More to try for next time!

Perhaps in the future, I could also incorporate the retweets and likes columns into the analysis as a measure of positive support.

I would greatly appreciate any input or advice on how to do so, or any other suggestions you may have!

Liked this article? Please leave a comment and let me know what you think about straws, text analysis, and otherwise!
