Visualizing Recessions with Pandas and Three.js

Just a gif. Scroll down to the CodePen for the interactive version.

How have U.S. economic recessions compared in severity? And how can severity be measured? These were the questions I was considering after identifying recessions for the final project of Introduction to Data Science in Python with Christopher Brooks. The following article outlines some key lessons learned in the process of analyzing the data in pandas and visualizing it with three.js. I will focus on:

- Using vector computations to cut down on opaque for loops
- Using feature scaling to better understand differences between data points that are otherwise hard to distinguish
- The challenges of visualizing data in three dimensions and some unresolved questions

But before you have to read too much, here is the visualization so you can see for yourself. Drag around to change perspective. Click on the spheres to show their tooltips. They are draggable too since the chart can easily become cluttered.

Vectors for Simplicity

The first step was to identify the recessions between 1947 and 2016. Naturally, I said to myself, “Let’s start with a for loop and see how far I get.” But this got out of hand. Fast.

Since finding recessions depends on quarterly differences, I thought why not work with the differences directly? This was where vectors came in handy. First, I started with a vector that began with a quarter’s GDP and then added the GDPs from the next several quarters. Here’s what Q3 of 1981 looked like:

Next, I created a vector of the quarter over quarter differences between the GDPs from the first vector:

Now we’re in business! Iterating over the differences became more straightforward, which was particularly useful when handling recessions that were not merely two quarterly declines followed by two quarterly increases.
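
To make the differencing step concrete, here is a minimal sketch in pandas. It is not the code from the repo: the GDP numbers are made up, the labels `q0` to `q7` are placeholders, and I am assuming the rule that a recession starts with two consecutive quarterly declines and ends with two consecutive quarterly increases.

```python
import pandas as pd

# Hypothetical GDP levels for eight consecutive quarters.
# These values are invented for illustration; they are not real GDP data.
gdp = pd.Series(
    [100.0, 101.0, 100.5, 99.8, 100.2, 99.5, 100.1, 100.9],
    index=["q0", "q1", "q2", "q3", "q4", "q5", "q6", "q7"],
)

# Quarter-over-quarter differences in one vectorized call.
# The first entry is NaN (no previous quarter), so drop it.
diffs = gdp.diff().dropna()

# Boolean vectors marking which quarters fell and which rose.
declined = diffs < 0
increased = diffs > 0

# True where a quarter declined AND the following quarter declined too.
two_declines = declined & declined.shift(-1, fill_value=False)

# True where a quarter rose AND the previous quarter rose too.
two_increases = increased & increased.shift(1, fill_value=False)

# Assumed rule: start at the first of two consecutive declines,
# end at the second of the next two consecutive increases.
start = two_declines[two_declines].index[0]
after_start = two_increases.loc[start:]
end = after_start[after_start].index[0]

print(diffs)
print("start:", start, "end:", end)
```

The made-up series falls twice, rises once, falls again, and only then rises twice in a row, so a single increase in the middle does not end the recession. That is the kind of case that is awkward to track with counters in a loop but falls out of the shifted boolean vectors.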
The recession that began in Q3 of 1981, as shown above, is an example of one of those roller coasters.

Feature Scaling

Visualizing the data in three dimensions with three.js seemed promising, but as soon as the data was loaded I hit a snag. Each dimension had values over a different range, which made comparisons difficult. To use an admittedly rough analogy, how could I hold up a magnifying glass to each dimension and illuminate the microcosm of interactions without obscuring the view through the other magnifying glasses that were needed?

For each feature, I needed to stretch the values over a range that would be easier to comprehend. Here is an example:

One measure of recession severity was how far the fall was from start to bottom. Since I was less interested in the raw values than in each value’s place in the set, I scaled them to a range between 0 and 100. And, to my luck, scikit-learn had a tool for that. This ended up being a double win: it helped with the analysis, and the same scaled range (0 to 100) could be used for multiple dimensions, providing a consistent method of feeding data into three.js.

It raises the following question, though: is it even helpful to scale data like this? On one hand, I lose context. After scaling, I have no idea what the original values are or over what range they sit. The raw data from the table above has values spanning a range of 4%, which is very different from 40% or 400%. So this approach could certainly be used to obfuscate the larger picture and distort the salience of a particular relationship. On the other hand, I do find it helpful to zoom in and highlight the various comparisons. In retrospect, the visualization above would probably be better suited as a companion alongside more straightforward displays of each metric.
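
For readers who want to see the arithmetic behind the 0-to-100 scaling, here is a small sketch written directly in pandas. The article does not say which scikit-learn tool was used; `MinMaxScaler(feature_range=(0, 100))` performs this same min-max transformation, so that is my assumption. The function name `scale_0_100` and the recession labels and values below are hypothetical.

```python
import pandas as pd

# Hypothetical start-to-bottom GDP falls, in percent, for four recessions.
# Invented values for illustration only.
falls = pd.Series([0.5, 1.5, 2.5, 4.5], index=["A", "B", "C", "D"])


def scale_0_100(values: pd.Series) -> pd.Series:
    """Min-max scale a series so its minimum maps to 0 and its maximum to 100."""
    span = values.max() - values.min()
    if span == 0:
        # All values identical: there is no range to stretch.
        raise ValueError("cannot scale a constant series")
    return (values - values.min()) / span * 100


scaled = scale_0_100(falls)
print(scaled)
```

Note what the output hides: the shallowest fall becomes 0 and the deepest becomes 100 regardless of whether the raw values span 4 points or 400, which is exactly the loss of context discussed above.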
But, live and learn.

Many Priorities, One UI

Switching gears to the UI, I found that working in three dimensions created some competing priorities, particularly with the tooltips. Here is what I tried to accomplish:

- Pan the camera around to see the data points and other chart elements from different angles.
- Identify each data point with its year and quarter.
- Connect a tooltip with its data point so that your eye does not have to look at a disconnected list somewhere else on the page.
- Keep the UI uncluttered while still providing useful insights.
- Use color to pair elements and communicate priority.

The visualization that I created only had to accommodate 10 data points, and I feel like my approach just barely works if all 10 tooltips are open. So what could be done if the chart contained several hundred data points? How would it be possible to see what year each one was from? I don’t have answers to these questions, but my guess is that there can be no one visualization to rule them all. The multiple perspectives needed to truly shed light on even just 10 data points may require a variety of charts to get the job done.

Conclusion

Thanks for reading, and feel free to take a look at the repo if you would like to see the code for yourself or get more details on my methodology.
