A Beginner’s Guide to Data Visualization Using Matplotlib

All you have to do is call plt.plot() twice, passing the two different series you’d like to plot as the y-values (the rank is the x-value in both calls), as shown here:

gdp = df['GDP_Per_Capita']
lifeExp = df['Healthy_Life_Expectancy']
plt.plot(rank, gdp)
plt.plot(rank, lifeExp)
plt.show()

This code outputs the following visual:

Like the first graph we made, we don’t really know what this graph is telling us.

Additionally, we don’t know which line represents which variable we passed in.

There are two possible ways to handle this issue.

The first relies on Matplotlib’s default colors, while the second sets each line’s color explicitly; both add a legend to tell us what color line represents which variable.

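# Option 1
plt.plot(rank, gdp, label='GDP per Capita')
plt.plot(rank, lifeExp, label='Healthy Life Expectancy')
plt.title('World Rank vs GDP and Life Expectancy')
plt.xlabel('Country Rank')
plt.legend()
plt.show()

# Option 2
plt.plot(rank, gdp, color='green', label='GDP per Capita')
plt.plot(rank, lifeExp, color='blue', label='Healthy Life Expectancy')
plt.title('World Rank vs GDP and Life Expectancy')
plt.xlabel('Country Rank')
plt.legend()
plt.show()

Note: These options should be run separately from each other in order to get both outputs. Each line needs a label= argument, because plt.legend() builds its entries from those labels and leaves out any line that doesn’t have one.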

Now we know which color line represents which variable.

Whether or not you choose to set the colors for each variable, it is almost always a good idea to include a legend on your visualization so that you can quickly identify which line represents which variable.
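As a self-contained aside, both options can also be drawn side by side in one figure instead of being run separately. This minimal sketch uses Matplotlib’s object-oriented interface (plt.subplots) and hypothetical placeholder values for rank, gdp, and lifeExp, since the article’s DataFrame isn’t reproduced here. The Agg backend line only lets the sketch run without a display; drop it in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical placeholder values, not the article's data
rank = [1, 2, 3, 4, 5]
gdp = [1.5, 1.4, 1.2, 0.9, 0.6]
lifeExp = [1.0, 0.95, 0.8, 0.7, 0.5]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Option 1: default colors, with a label on each line for the legend
ax1.plot(rank, gdp, label="GDP per Capita")
ax1.plot(rank, lifeExp, label="Healthy Life Expectancy")

# Option 2: explicit colors, again with labels
ax2.plot(rank, gdp, color="green", label="GDP per Capita")
ax2.plot(rank, lifeExp, color="blue", label="Healthy Life Expectancy")

for ax in (ax1, ax2):
    ax.set_title("World Rank vs GDP and Life Expectancy")
    ax.set_xlabel("Country Rank")
    ax.legend()  # entries are built from each line's label=

fig.tight_layout()
fig.savefig("rank_vs_gdp_lifeexp.png")  # or plt.show() in an interactive session
```

Because each line carries a label=, ax.legend() can build its entries automatically; a line plotted without a label is simply left out of the legend.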

From this graph, we can also visually identify a trend.

As we move down to lower country ranks, both GDP per capita and life expectancy take lower values, so they contribute less to a country’s overall happiness score.

We can also see that, quite often, when a country has a spike in GDP per capita, it also has a spike in life expectancy.

The same can also be said for dips in GDP per capita and life expectancy.
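The “spike together, dip together” pattern can also be checked numerically. A minimal sketch, assuming the two columns live in a pandas DataFrame like the article’s df: Series.corr computes the Pearson correlation coefficient, which ranges from -1 to +1 and is close to +1 when two series tend to rise and fall together. The values below are made-up placeholders, not the article’s data.

```python
import pandas as pd

# Hypothetical placeholder values; substitute the article's df
df = pd.DataFrame({
    "GDP_Per_Capita":          [1.5, 1.1, 1.3, 0.8, 0.9, 0.4],
    "Healthy_Life_Expectancy": [1.0, 0.7, 0.9, 0.6, 0.7, 0.3],
})

# Pearson correlation between the two columns (ranges from -1 to +1)
r = df["GDP_Per_Capita"].corr(df["Healthy_Life_Expectancy"])
print(f"Correlation: {r:.2f}")
```

A single coefficient summarizes the overall tendency across all countries; it doesn’t replace looking at the graph, and it says nothing about why the two move together.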

Scatter Plots

Scatterplots are a great way to visualize a relationship between two variables without the potential for a crazy trend line that we might get from a line graph.

Just like a line graph, creating a scatterplot in Matplotlib only requires a few lines of code, as shown here.

plt.scatter(gdp, score)
plt.title('GDP vs Happiness Score')
plt.xlabel('GDP per Capita')
plt.ylabel('Happiness Score')
plt.show()

Without a title and axis labels, it would take only two lines of code to create a scatterplot.

This code creates the following scatterplot:

As we would expect, the higher a country’s GDP per capita score, the higher its happiness score tends to be.

However, there is a small problem with this graph.

By convention, graph axes should start from 0, with a few exceptions.

As we can see here, the lowest y-tick on this graph is 3, which is misleading.

Luckily, this is an easy fix.

All we have to do is add the line plt.ylim(0, 8) right before we call plt.show(), and this problem will be fixed, as shown here:

This graph gives us a slightly different understanding than the first scatterplot we constructed.

A country can have a happiness score of 3 with a GDP per capita score of 0.

If you didn’t pay attention to the y-axis on the first graph, you would probably be led to believe that a GDP per capita score close to 0 meant a happiness score close to 0, which is simply not the case.

This tells us that there are other factors that affect a country’s happiness score, and they should also be investigated.
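Putting the scatterplot and the y-axis fix together, here is a minimal self-contained sketch using Matplotlib’s object-oriented interface, where ax.set_ylim is the counterpart of plt.ylim. The gdp and score values are hypothetical placeholders standing in for the article’s columns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical placeholder values, not the article's data
gdp = [0.0, 0.3, 0.6, 0.9, 1.2, 1.5]
score = [3.1, 3.6, 4.4, 5.0, 6.1, 7.2]

fig, ax = plt.subplots()
ax.scatter(gdp, score)
ax.set_title("GDP vs Happiness Score")
ax.set_xlabel("GDP per Capita")
ax.set_ylabel("Happiness Score")
ax.set_ylim(0, 8)  # start the y-axis at 0 instead of Matplotlib's automatic lower limit
fig.savefig("gdp_vs_score.png")  # or plt.show() in an interactive session
```

If you only want to pin the bottom of the axis and let Matplotlib choose the top, ax.set_ylim(bottom=0) does that.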

Scatterplots can be helpful in identifying linear relationships that are present in data.

However, Matplotlib has no built-in function for fitting a regression line over a scatterplot; you have to compute the line yourself.

My next article will tell you how to easily do this using Seaborn.
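Until then, a least-squares line can be added with plain NumPy and Matplotlib: np.polyfit with degree 1 returns the slope and intercept of the best-fit line, which can then be plotted over the scatter. This is a minimal sketch with hypothetical placeholder data, not the Seaborn approach mentioned above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical placeholder values, not the article's data
gdp = np.array([0.0, 0.3, 0.6, 0.9, 1.2, 1.5])
score = np.array([3.1, 3.6, 4.4, 5.0, 6.1, 7.2])

# Fit score ≈ slope * gdp + intercept by ordinary least squares
slope, intercept = np.polyfit(gdp, score, 1)

fig, ax = plt.subplots()
ax.scatter(gdp, score, label="Countries")

# Evaluate the fitted line across the observed GDP range and draw it
xs = np.linspace(gdp.min(), gdp.max(), 100)
ax.plot(xs, slope * xs + intercept, color="red", label="Least-squares fit")

ax.set_xlabel("GDP per Capita")
ax.set_ylabel("Happiness Score")
ax.set_ylim(0, 8)
ax.legend()
fig.savefig("gdp_vs_score_fit.png")  # or plt.show() in an interactive session
```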

Histograms

A histogram shows the distribution of a particular feature of the data.

Put more simply, it shows us how many observations fall within each range of values, called a bin.

Just like line graphs and scatterplots, basic histograms are very easy to create.

plt.hist(score)
plt.title('Happiness Score Distribution')
plt.xlabel('Happiness Score')
plt.ylabel('Frequency')
plt.show()

This histogram was created in five simple lines of code.

It tells us how many countries have a happiness score within each bin.

Because the happiness score takes a continuous range of values, we can’t get exact figures just by looking at the histogram, but we can get a general idea.

For example, there are about 15 countries that have a happiness score between 3 and 4, and the largest number of countries (about 25) have a happiness score around 4.5.

Said another way, the most common happiness score is a value around 4.5.
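If you need the exact counts rather than estimates read off the chart, plt.hist returns them: the number of observations in each bin, the bin edges, and the drawn bar patches. A minimal sketch with hypothetical placeholder scores; bins=10 is Matplotlib’s default and is written out only to make it visible.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical placeholder scores, not the article's data
score = [2.9, 3.4, 3.8, 4.1, 4.4, 4.5, 4.6, 4.9, 5.2, 5.6, 6.0, 6.3, 7.1, 7.6]

counts, edges, patches = plt.hist(score, bins=10)
plt.title("Happiness Score Distribution")
plt.xlabel("Happiness Score")
plt.ylabel("Frequency")

# Print each bin's range and how many scores fall inside it
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.2f} to {right:.2f}: {int(n)}")
```

Every bin except the last includes its left edge but not its right edge; the last bin includes both, so the maximum score is still counted.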

Bar Graphs

Constructing bar graphs in Matplotlib is a little bit more difficult than you would think.

It can be done in a few lines of code, but it is important to understand what this code is doing.

roundedHappinessScore = score.apply(int)
count = roundedHappinessScore.value_counts()
hapScore = count.index
plt.bar(hapScore, count)
plt.title('Happiness Scores')
plt.xlabel('Score')
plt.ylabel('Count')
plt.show()

The last five lines of code are pretty self-explanatory, but what’s happening in the first three lines? The first line converts all of the happiness scores to integers, so that there are only a few discrete values the happiness score can take.

The second line gets the number of times each score occurs.

This count will be used as the height for our bar graph.

The third line, then, gets the score associated with each count, which is needed as the x-axis of our graph.

When run, this code produces the following bar graph:

This graph tells a slightly different story than the histogram we created above.

It is much easier to interpret, and we can see here that the most common rounded happiness score is 5.

Because we “rounded” using the int() function, which simply drops the decimal part, a score of 5 can be any value in the range 5 ≤ x < 6.
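To see what each of the three preparation lines of the bar-graph code produces, and how int() truncation differs from conventional rounding, here is a small sketch on a hypothetical pandas Series of made-up scores. The Series.round().astype(int) alternative is an illustration, not what the article uses; note that pandas rounds exact .5 ties to the nearest even number, which is why the placeholder values avoid them.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical placeholder scores, not the article's data
score = pd.Series([3.2, 4.4, 4.9, 5.1, 5.8, 6.3, 7.0])

roundedHappinessScore = score.apply(int)      # truncates: [3, 4, 4, 5, 5, 6, 7]
count = roundedHappinessScore.value_counts()  # occurrences of each integer score
hapScore = count.index                        # the integer scores, aligned with count

# value_counts orders by frequency, so sort by score for display
print(count.sort_index())

fig, ax = plt.subplots()
ax.bar(hapScore, count)  # each bar sits at its numeric score, so order doesn't matter here
ax.set_title("Happiness Scores")
ax.set_xlabel("Score")
ax.set_ylabel("Count")
fig.savefig("happiness_bar.png")  # or plt.show() in an interactive session

# Alternative: round to the nearest integer instead of truncating
nearest = score.round().astype(int)           # [3, 4, 5, 5, 6, 6, 7]
print(nearest.tolist())
```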

Conclusion

As you can see, Matplotlib can be a great way to create simple visualizations pretty quickly.

Most graphics take only a few lines of code to create, and can be aesthetically modified to make them even better.

For more information on Matplotlib, check out the official Matplotlib API documentation.

All code used in this article can be found on my GitHub.

This article is the first in a short series on creating data visualizations, so stay tuned for tutorials on creating visualizations using Seaborn and Plotly.
