The Sentiment of the Union

Analyzing Presidential State of the Union Addresses using Sentiment Analysis and Python tools

Daniel Bashir, Jan 20

Photo from 271277 on Pixabay

In Article II, Section 3 of the Constitution, the President of the United States is directed to “give to the Congress information of the State of the Union, and recommend to their consideration such measures as he shall judge necessary and expedient.”

With the news surrounding Trump’s 2019 State of the Union address, it might be interesting to look at these addresses over time and see if we can note any interesting trends and changes since George Washington gave the first one.

In this article, we’ll adopt a data-driven approach in Python and leverage tools like sentiment analysis to better understand the development of these speeches over time.

In a series of Python notebooks, I’ve looked at sentiment values of the different speeches, performed topic modeling, created WordClouds, and finally constructed a rudimentary measure of “how much is being said” by looking at the entropy values of each speech.

Sentiment analysis is a popular tool in Natural Language Processing that helps us understand a text by identifying the opinions expressed in it.

Since we’re considering a few different methods here, we won’t dive too deep.

First, we’ll consider a simple representation of each speech’s sentiment (“positive” or “negative”) denoted by a single number, then see if topic modeling can help us derive any further insights.
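To make the idea of a single-number sentiment score concrete, here is a minimal sketch of a lexicon-based scorer. The tiny `LEXICON` and the function name `sentiment_score` are illustrative assumptions, not the word list or method used in the notebooks; real lexicons such as AFINN assign integer valences to thousands of words.

```python
import re

# Hypothetical toy lexicon (word -> valence), made up for illustration only.
LEXICON = {
    "proud": 2, "progress": 2, "stronger": 2, "hopeful": 2, "peace": 2,
    "war": -2, "crisis": -3, "losses": -2, "evil": -3, "enemy": -2,
}


def sentiment_score(text: str) -> int:
    """Sum the valence of every lexicon word found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


positive = "I am proud of the progress we have made."
negative = "The enemy brought war and evil rumors."
print(sentiment_score(positive))  # 4
print(sentiment_score(negative))  # -7
```

Because a score like this is a plain sum, a longer speech has more chances to accumulate points in either direction, which is worth keeping in mind when comparing addresses of very different lengths.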

Before diving in, let’s consider what we might expect from the addresses we’re about to analyze.

When Washington gave the first State of the Union Address in 1790, the scrutiny he faced as the first POTUS led him to take a cautious and deferential tone, making recommendations as opposed to the calls for action that we would have seen from George W. Bush after 9/11.

As a result, we might expect the sentiment value of Washington’s addresses to be relatively “neutral” (without assigning a numerical value to it).

That’s enough speculation.

Let’s look at our results.

I’ve graphed the sentiment values produced by the first notebook below:

You’ll notice a spike near the right side of the graph that sticks out like a sore thumb: that’s the 1981 State of the Union Address given by President Jimmy Carter.

If we were to ignore this piece of data, the sentiment values of the rest of the addresses would appear to be much more closely related.

Regardless, we’ll take a look at this particular address when we begin to look at changes in sentiment over individual presidencies.

Let’s push a little bit further — since we know the sentiment values for speeches given by each individual president, we can look at the change in sentiment from their first to their last State of the Union address.

Perhaps this will have some correlation with the success of their presidency.
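The first-to-last comparison can be sketched in a few lines. The president names and scores below are made-up placeholders, and `sentiment_change` is a hypothetical helper; the notebook may organize its data differently.

```python
# Hypothetical (year, score) pairs per president; values are invented for illustration.
scores = {
    "President A": [(1901, 10.0), (1902, 14.5), (1903, 22.0)],
    "President B": [(1950, 30.0), (1951, 12.0)],
}


def sentiment_change(addresses):
    """Return the last score minus the first score, ordering addresses by year."""
    ordered = sorted(addresses)
    return ordered[-1][1] - ordered[0][1]


changes = {name: sentiment_change(addrs) for name, addrs in scores.items()}

# Print the largest increase first; rounding keeps floating-point noise out of the report.
for name, delta in sorted(changes.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {delta:+.2f}")
```

Note that this only compares two endpoints, so a presidency with a dip in the middle and a recovery at the end would look flat.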

Running the numbers, we have the following observations:

Largest increase is Carter with 106.


This is somewhat of a surprise and we’ll consider it below.

Next highest lags this by quite a bit: McKinley with 33.


While the beginning of his presidency involved the Cuba crisis and a war with Spain, McKinley managed to achieve peace with Spain and some territorial gain.

The end of his presidency was an optimistic one before his assassination, so the value seems to align with historical events.

Largest decrease is Truman with -35.


Truman assumed office in the closing months of WWII and oversaw the Allied victory; his first term involved initiatives such as the Marshall Plan and uncertainty about China.

The Korean War during Truman’s second term was a frustrating stalemate for the US, and by 1952 Truman had hit the all-time lowest approval rating for a sitting US president.

Again, the data seems to align with history.

Roosevelt closely follows Truman with -31.


We might recall a contradictory beginning: Roosevelt won a landslide victory over Herbert Hoover, but assumed the presidency in the midst of the Great Depression. He went on to be elected to a record four terms.

He had a prolific beginning, spearheading major legislation and instituting the New Deal.

Interestingly, the 1934 address did not “present to [Americans] a picture of complete optimism regarding world affairs.”

While Roosevelt’s 1945 address painted a more hopeful picture, looking ahead to the end of the Second World War, a great year of achievement, and the ending of Nazi-Fascist reign in Europe, his lengthy speech also touched on “considerable losses”, “desperate attempts” by the enemy, and “evil and baseless rumors” that amounted to “divisive propaganda” by Germany.

This might be the most interesting result we’re considering, because while the 1945 speech was certainly positive and hopeful in many ways, our model likely picked up on the plethora of unsavory and negative words used to describe the enemy during WWII.

Trump is fairly high in the negatives with a change of roughly -8.77 by the second year of his term.

It’s too early to draw a full trajectory and explain everything, but the continuing trend of accusations and evidence regarding the Russia investigation suggests that this trend of negative sentiment might continue into the 2019 State of the Union Address, if it indeed happens.

Returning to Carter: he began his presidency during a year of stagflation, so it might make sense that, if he succeeded in curbing the phenomenon, there would be a marked increase in the sentiment expressed in his addresses.

Interestingly, Carter’s last fifteen months as president were marred by crises including the Iran hostage crisis and the Soviet invasion of Afghanistan, while Carter himself is usually evaluated as a below-average president.

On a similarly interesting note, despite the agreement that Polk was a generally successful (although often overlooked) president, the change in sentiment for his speeches was roughly 7.

At the same time, it wouldn’t be a stretch to imagine that a president such as Polk would adopt a measured tone in giving speeches.

At the other end, our sentiment model might be picking up on positive and hopeful statements from Carter’s 1981 speech, such as in the paragraph below:

“However, I firmly believe that, as a result of the progress made in so many domestic and international areas over the past four years, our Nation is stronger, wealthier, more compassionate and freer than it was four years ago.

I am proud of that fact.

And I believe the Congress should be proud as well, for so much of what has been accomplished over the past four years has been due to the hard work, insights and cooperation of Congress.

I applaud the Congress for its efforts and its achievements.”

The language itself isn’t anomalous, but the overall length of Carter’s address, combined with the decidedly hopeful and positive language, would certainly have contributed to the score our model gave the speech.

While Carter did not avoid mentioning difficulties, the intention of his speech was to paint a portrait of four years of progress.

It would be natural that Carter wanted to sum up what he saw as the defining achievements of his administration and set a positive note for the future.

This resulted in a more long-winded and hopeful speech than usual.

In the second notebook, I performed topic modeling, a type of statistical modeling that helps us discover the abstract topics that appear in documents and texts.

I used two models, Latent Dirichlet Allocation (LDA) and Latent Semantic Indexing (LSI) on a subset of the speeches, holding out the last two addresses given by Trump.

For those looking for technical explanations of the two models, Edward Ma’s article offers a nice explanation of LDA and LSI.

Here, I’ll primarily focus on what they can tell us about the speeches we’re interested in.

The LDA model gave us the following topic clusters (the output of the LSI model can be found in notebook 2):

Topics produced by the LDA Model

If we go past the words that are pretty clearly ubiquitous across all topics (“united”, “states”, “congress”, “year”), then the topics can give us some insight into the type of language used during the speeches.

Unfortunately, it’s worth noting that the model doesn’t give us much in the way of topics that integrate policy-specific language like “agriculture” or “relations”.

On the other hand, words like “must”, “great”, and “people” can tell us something more about the tone of the speeches, and how they have historically been used both as updates and as calls to action, drawing on Americans’ strong sense of pride for their country.
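One common way to keep ubiquitous words from dominating every topic is to drop any word that appears in too large a share of the documents before fitting the model. The sketch below shows that filtering step only, using the standard library; `drop_ubiquitous` and its `max_doc_fraction` threshold are hypothetical names, and this is not necessarily the preprocessing used in the notebook. Topic-modeling libraries such as gensim offer a comparable filter (`Dictionary.filter_extremes`).

```python
from collections import Counter


def drop_ubiquitous(docs, max_doc_fraction=0.8):
    """Remove words that occur in more than max_doc_fraction of the documents."""
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each word once per document
    limit = max_doc_fraction * len(docs)
    return [[w for w in tokens if doc_freq[w] <= limit] for tokens in docs]


# Three toy "speeches", already tokenized.
docs = [
    "united states congress must act".split(),
    "united states congress great people".split(),
    "united states congress agriculture relations".split(),
]
filtered = drop_ubiquitous(docs)
print(filtered)
# [['must', 'act'], ['great', 'people'], ['agriculture', 'relations']]
```

With the shared vocabulary removed, whatever remains is more likely to distinguish one group of speeches from another, though the right threshold is a judgment call.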

In the next part of our exploration, we’ll try to construct a (very) rudimentary measure of “how much the president is saying” by calculating the entropy of each address individually.

In information theory, entropy is defined to be the average rate at which information is produced by a stochastic (random) source of data.

In this context, the entropy expresses our expectation of the information content of a speech, or how much uncertainty it resolves.

For our purposes, we can think of the entropy measure as how compressible the text is.

The higher the entropy of a speech, the less compressible (that is, the less redundant) it is and therefore the more it is “saying”.

We’ll use this thinking as a starting point and go from there.
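As a rough illustration, here is a minimal sketch of Shannon entropy computed over the word distribution of a text, H = -Σ p(w) log2 p(w). Treating words (rather than characters) as the symbols is an assumption on my part, and `word_entropy` is a hypothetical helper rather than the notebook’s exact function.

```python
import math
from collections import Counter


def word_entropy(text: str) -> float:
    """Shannon entropy, in bits per word, of the text's word distribution."""
    words = text.lower().split()
    if not words:
        return 0.0
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


repetitive = "we must we must we must we must"
varied = "we must build a stronger and freer nation"
print(word_entropy(repetitive))  # 1.0
print(word_entropy(varied))      # 3.0
```

Both samples have eight words, but the repetitive one uses only two distinct words and scores 1 bit per word, while the varied one uses eight distinct words and scores 3 bits. Since this measure can never exceed log2 of the number of distinct words, texts with larger vocabularies have more room to score high.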

Below is a graph of the entropy of each speech, beginning from Washington’s and ending at Trump’s:

Entropy of State of the Union Addresses

Without going too deep, it’s worth noting that on average, the entropy values seem to increase as time goes on with a fair number of dips.

However, the values dip significantly enough that we would probably be remiss in saying that entropy has increased significantly from earlier to later speeches.

One characteristic of higher-entropy speeches was longer length, but some further exploration in this notebook shows that this doesn’t explain the whole picture.

If you’re interested in seeing some further analysis and perhaps taking this forward yourself, please check out the notebook entitled “Entropy”.

In the final step of exploration, I generated a few WordClouds to visualize the following: a compilation of all the SOTU addresses, a comparison of the 10 earliest and the 10 latest addresses, and a comparison of Obama’s addresses with Trump’s two addresses.

WordCloud of compilation of SOTU addresses

If we consider the WordCloud of all addresses, we shouldn’t be too surprised that the most prominent words are the most general and widely applicable to the American people: “american”, “america”, “nation”, “new” and “people” are featured.

Words we saw in our topic modeling such as “must” and “will” also make an appearance.

At the same time, while they are given less prominence, topics that have prompted lots of debate in recent years and that also have historical roots show up as well; these include “drug”, “immigration”, and “terrorist”.

Next, we’ll look at the ten oldest and ten newest addresses.

Comparing the oldest to the newest addresses, there are some notable changes.

In the WordCloud for the newest speeches, “united states” is dropped almost entirely, while the words “must”, “great”, and “country” make a far larger appearance than they did previously.

The words “interest” and “state” also disappear, indicating some interesting rhetorical shifts in the speeches of early to contemporary presidents.
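A WordCloud sizes each word by how often it occurs, so the comparison above boils down to comparing word frequencies between two groups of texts. Here is a minimal sketch of that counting step (not the rendering); the sample sentences, the tiny `STOPWORDS` set, and the `top_words` helper are all made up for illustration.

```python
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "of", "and", "to", "a", "in", "our", "is"}


def top_words(texts, n=2):
    """Return the set of the n most frequent non-stopword words across the texts."""
    counts = Counter(
        word
        for text in texts
        for word in text.lower().split()
        if word not in STOPWORDS
    )
    return {word for word, _ in counts.most_common(n)}


# Invented stand-ins for the earliest and latest addresses.
early = ["the interest of the state and the united states", "the state of our interest"]
late = ["our great country must act", "great people lead the country"]

early_top = top_words(early)
late_top = top_words(late)
print(sorted(early_top - late_top))  # ['interest', 'state']
print(sorted(late_top - early_top))  # ['country', 'great']
```

The two set differences list the words that fall out of the top ranks and the words that replace them, which is the same shift the clouds show visually.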

If we remember from earlier the precarious nature of the presidency in its first few years, it makes sense that calls to action would be much more the norm in today’s world than they ever were at that time — further, contemporary discourse has sparked a fair amount of rhetoric drawing on Americans’ sense of pride, explaining the word “great”.

The disappearance of words like “interest” and “state” might mark a transition away from more theoretical discourse about the function of a nation-state that the Founding Fathers considered so deeply.

Finally, let’s compare WordClouds of the State of the Union Addresses given by our two most recent presidents: Barack Obama and Donald Trump.

Of note is the larger presence in Obama’s cloud of the words “us”, “make”, “new”, “help”, “job”, and “every”.

Trump’s cloud features the words “america” and “nation”, as well as “people” and “country” with greater emphasis.

We can also spot the word “drug” with some ease, although we can’t in Obama’s cloud.

While some of the largest words are generic, the differences between the two clouds and the rhetorical suggestions they imply indicate differences in Obama’s and Trump’s interests and policies.

Obama’s record as a proponent of positive change and as a politician who often used speeches (as he did in his campaign) to engender a sense of community among his listeners explains the presence of words like “us” and “every”, while his focus on domestic healthcare policy might explain words like “help”.

The words we noted in Trump’s cloud are reflective of the considerable doses of nationalism present in his speeches.

The dominating size of the word “will” in both clouds suggests promises from both presidents.

As a recap, we’ve analyzed the texts of the State of the Union Addresses given by presidents since George Washington, first using a basic sentiment measure to look at the positivity and negativity in speeches and the shifts in these values for each president.

We then considered topic modeling to see if we could draw inferences about topics of interest in State of the Union Addresses.

Next, we considered entropy as a measure of compressibility and a rudimentary measure of the “content” of a speech.

Finally, we used WordClouds to look at topics germane to all presidents as expressed in these addresses, and at differences between a few interesting groups and individuals.

Tools like NLP can help us analyze interesting trends in and develop new insights about historical documents, from speeches to books.

The presidential State of the Union addresses are important in the sense that they can tell us about the state of current events throughout American history and paint portraits of optimism and pessimism over time.

While helpful, a purely technological analysis can lead us to potentially misleading conclusions, as it did with Roosevelt, if we don’t take the time to examine the documents ourselves.

As we go forward, we should continue to use tools such as sentiment analysis for the powerful insights they can provide, but remember that when analyzing text and speeches in particular, it’s necessary to take into account numerous contextual and rhetorical factors that models often can’t understand as well as humans can.

Sources:

[1] G. Washington, First Annual Address to Congress (1790).

[2] J. Carter, State of the Union Address (1981).

[3] F. Roosevelt, State of the Union Address (1934).

[4] F. Roosevelt, State of the Union Address (1945).

[5] William McKinley, Wikipedia.

[6] Harry S. Truman, Wikipedia.

[7] Presidency of Franklin D. Roosevelt, Wikipedia.

[8] E. Ma, 2 latent methods for dimension reduction and topic modeling (2018), Towards Data Science.
