Basic NLP on the Texts of Harry Potter: Sentiment Analysis

The legend has been reversed in this plot. That isn’t necessary for readability here, but I did it for consistency with the area and column charts coming up next. Here’s how I made it:

    import numpy as np
    import matplotlib.pyplot as plt

    # hp, hp_df, emotions, book_indices and movingaverage are assumed
    # to be defined earlier in the post

    # use the Tableau color scheme of 10 colors
    # (plt.get_cmap replaces matplotlib.cm.get_cmap, which newer matplotlib releases removed)
    tab10 = plt.get_cmap('tab10')
    length = sum([len(hp[book]) for book in hp])
    window = 20
    # use index slicing to remove data points outside the window
    x = np.linspace(0, length - 1, num=length)[int(window / 2): -int(window / 2)]
    fig = plt.figure(figsize=(15, 15))
    ax = fig.add_subplot(1, 1, 1)
    # Loop over the emotions with enumerate in order to track colors
    for c, emotion in enumerate(emotions):
        y = movingaverage(
            [hp_df.loc[book].loc[hp[book][chapter][0]][emotion]
             for book in hp for chapter in hp[book]],
            window)[int(window / 2): -int(window / 2)]
        plt.plot(x, y, linewidth=5, label=emotion, color=(tab10(c)))
    # Plot vertical lines marking the start of each book
    for book in book_indices:
        plt.axvline(x=book_indices[book][0], color='black', linewidth=2, linestyle=':')
    # After the loop, `book` is the last book: mark where the series ends
    plt.axvline(x=book_indices[book][1], color='black', linewidth=2, linestyle=':')
    plt.title('Emotional Sentiment of the Harry Potter series', fontsize=20)
    plt.ylabel('Relative Sentiment', fontsize=15)
    # Use the book titles for X ticks, rotate them, center the left edge
    plt.xticks([(book_indices[book][0] + book_indices[book][1]) / 2 for book in book_indices],
               list(hp), rotation=-30, fontsize=15, ha='left')
    plt.yticks([])
    # Reverse the order of the legend
    handles, labels = ax.get_legend_handles_labels()
    ax.legend(handles[::-1], labels[::-1], loc='best', fontsize=15, bbox_to_anchor=(1.2, 1))
    ax.grid(False)
    plt.show()

I also made an area plot to show the overall emotive qualities of each chapter. This is again a moving average, which smooths out the more extreme spikes and shows the story arc better across all books:

The books seem to start with a bit of trailing emotion from the previous story but quickly calm down during the middle chapters, only to pick back up again at the end.

    length = sum([len(hp[book]) for book in hp])
    window = 10
    x = np.linspace(0, length - 1, num=length)[int(window / 2): -int(window / 2)]
    fig = plt.figure(figsize=(15, 15))
    ax = fig.add_subplot(1, 1, 1)
    # One smoothed, trimmed series per emotion
    y = [movingaverage(hp_df[emotion].tolist(), window)[int(window / 2): -int(window / 2)]
         for emotion in emotions]
    # The ten tab10 colors, in the same order as the line plot
    plt.stackplot(x, y, colors=[tab10(c) for c in range(10)], labels=emotions)
    # Plot vertical lines marking the start of each book
    for book in book_indices:
        plt.axvline(x=book_indices[book][0], color='black', linewidth=3, linestyle=':')
    # After the loop, `book` is the last book: mark where the series ends
    plt.axvline(x=book_indices[book][1], color='black', linewidth=3, linestyle=':')
    plt.title('Emotional Sentiment of the Harry Potter series', fontsize=20)
    plt.xticks([(book_indices[book][0] + book_indices[book][1]) / 2 for book in book_indices],
               list(hp), rotation=-30, fontsize=15, ha='left')
    plt.yticks([])
    plt.ylabel('Relative Sentiment', fontsize=15)
    # Reverse the legend
    handles, labels = ax.get_legend_handles_labels()
    ax.legend(handles[::-1], labels[::-1], loc='best', fontsize=15, bbox_to_anchor=(1.2, 1))
    ax.grid(False)
    plt.show()

Note how in this chart, reversing the legend became necessary for readability. By default, legend entries are listed from top to bottom in the order they were plotted (alphabetical here), but the data is stacked from the bottom up. So the colors of the legend and the area plot run in opposite directions, which to my eye is quite confusing and difficult to follow. With ‘Anger’ plotted at the bottom, I also wanted it to be at the bottom of the legend, and likewise with ‘Trust’ at the top.

And lastly, a stacked bar chart to show the weights of the various sentiments across the books:

Naturally, words associated with any of the positive emotions would also be associated with the ‘Positive’ sentiment, and likewise for ‘Negative’, so it shouldn’t come as a surprise that those two sentiments carry the bulk of the emotive quality of the books. I find it notable that the emotions are relatively consistent from book to book, with slight differences in magnitude but consistent weights. The exception is the ‘Fear’ emotion in red, which seems to exhibit the most variance across the series. I also would have expected the cumulative magnitude of sentiments to increase throughout the series as the stakes became higher and higher. However, although the final book is indeed the highest, the other 6 books don’t show this gradual increase but almost the opposite, with a steady decline starting with book 2.

    books = list(hp)
    # Running height of each column so far; one entry per book
    margin_bottom = np.zeros(len(books))
    fig = plt.figure(figsize=(15, 15))
    ax = fig.add_subplot(1, 1, 1)
    for c, emotion in enumerate(emotions):
        y = np.array(hp_df2[emotion])
        # Draw this emotion's bars on top of everything plotted so far
        plt.bar(books, y, bottom=margin_bottom, label=emotion, color=(tab10(c)))
        margin_bottom += y
    # Reverse the legend
    handles, labels = ax.get_legend_handles_labels()
    ax.legend(handles[::-1], labels[::-1], loc='best', fontsize=15, bbox_to_anchor=(1.2, 1))
    plt.title('Emotional Sentiment of the Harry Potter series', fontsize=20)
    plt.xticks(books, books, rotation=-30, ha='left', fontsize=15)
    plt.ylabel('Relative Sentiment Score', fontsize=15)
    plt.yticks([])
    ax.grid(False)
    plt.show()

The tricky bit in this plot is using the margin_bottom variable to stack each of the columns. Other than that, it just uses a couple of tricks from the previous plots.
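To make the margin_bottom trick a bit more concrete, here is a minimal sketch of the bookkeeping on its own, without matplotlib. It uses plain Python lists in place of NumPy arrays, and both the helper name stack_bottoms and the scores are made up for illustration; neither comes from the plotting code in this post.

```python
def stack_bottoms(layers):
    """Return the baseline for every bar in every layer, plus the final column totals.

    `layers` is a list of equal-length lists: one list of bar heights per
    emotion, with one height per book. (Hypothetical helper, illustration only.)
    """
    n_bars = len(layers[0])
    margin_bottom = [0.0] * n_bars  # running column heights, like np.zeros(len(books))
    bottoms = []
    for heights in layers:
        # This layer's bars start where the previous layers ended
        bottoms.append(list(margin_bottom))
        # Then the running total grows by this layer's heights
        margin_bottom = [b + h for b, h in zip(margin_bottom, heights)]
    return bottoms, margin_bottom


# Made-up scores for three emotions across two books
layers = [
    [1.0, 2.0],  # first emotion
    [0.5, 0.5],  # second emotion
    [2.0, 1.0],  # third emotion
]
bottoms, totals = stack_bottoms(layers)
print(bottoms)  # baselines for each layer
print(totals)   # full height of each column
```

Each entry of bottoms plays the role of the bottom= argument in the matching plt.bar call. That is why the running total is updated after a layer is drawn rather than before: the first layer has to start at zero.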
