Interactive Data Visualization with Python Using Bokeh

We start by importing the necessary packages and the data itself (very important :D).

Then we perform some EDA (exploratory data analysis) to understand what we are dealing with, and after that we clean and transform the data into the format needed for analysis.

Pretty straightforward.

As the article doesn’t focus on these steps, I will just insert the code below with all the transformations I have made.

    import pandas as pd
    import numpy as np

    # Data cleaning and preparation
    data = pd.read_csv('data/co2_emissions_tonnes_per_person.csv')
    data.head()

    gapminder = pd.read_csv('data/gapminder_tidy.csv')
    gapminder.head()

    df = gapminder[['Country', 'region']].drop_duplicates()
    data_with_regions = pd.merge(data, df, left_on='country', right_on='Country', how='inner')
    data_with_regions = data_with_regions.drop('Country', axis='columns')
    data_with_regions.head()

    new_df = pd.melt(data_with_regions, id_vars=['country', 'region'])
    new_df.head()

    columns = ['country', 'region', 'year', 'co2']
    new_df.columns = columns

    upd_new_df = new_df[new_df['year'].astype('int64') > 1963]
    upd_new_df.info()
    upd_new_df = upd_new_df.sort_values(by=['country', 'year'])
    upd_new_df['year'] = upd_new_df['year'].astype('int64')

    df_gdp = gapminder[['Country', 'Year', 'gdp']]
    df_gdp.columns = ['country', 'year', 'gdp']
    df_gdp.info()

    final_df = pd.merge(upd_new_df, df_gdp, on=['country', 'year'], how='left')
    final_df = final_df.dropna()
    final_df.head()

    np_co2 = np.array(final_df['co2'])
    np_gdp = np.array(final_df['gdp'])
    np.corrcoef(np_co2, np_gdp)

By the way, CO2 emissions and GDP correlate, and quite significantly: 0.78.

    np.corrcoef(np_co2, np_gdp)
    Out[138]:
    array([[1.        , 0.78219731],
           [0.78219731, 1.        ]])

And now let’s get to the visualization part.

Again, we start with the necessary imports.

I will explain all of them later on.

Now, just relax and import.

    from bokeh.io import curdoc
    from bokeh.plotting import figure
    from bokeh.models import HoverTool, ColumnDataSource, CategoricalColorMapper, Slider
    from bokeh.palettes import Spectral6
    from bokeh.layouts import widgetbox, row

We will start by preparing a few details for our interactive visualization app.

First, we create a color mapper for the different regions of the world, so every country gets a color that depends on the region it is situated in.

We select the unique regions and convert them to a list.

Then we use CategoricalColorMapper to assign a different color to each region.

    regions_list = final_df.region.unique().tolist()
    color_mapper = CategoricalColorMapper(factors=regions_list, palette=Spectral6)

Next, we will prepare a data source for our application.

Bokeh accepts many different types of data as the source for graphs and visuals: lists of values provided directly, pandas DataFrames and Series, NumPy arrays and so on.

But the core of most Bokeh plots is ColumnDataSource.

At the most basic level, a ColumnDataSource is simply a mapping between column names and lists of data.

The ColumnDataSource takes a data parameter which is a dictionary, with string column names as keys and lists (or arrays) of data values as values.

If one positional argument is passed in to the ColumnDataSource initializer, it will be taken as data.

(from the official website).
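A minimal sketch of that mapping, using made-up values rather than the real dataset. The dictionary keys become the column names that glyphs later refer to, and passing the dictionary positionally should be equivalent to passing it as data.

```python
from bokeh.models import ColumnDataSource

# Made-up values for illustration only (not taken from the real dataset).
toy_data = {
    'x': [1000.0, 2500.0, 4000.0],   # stands in for GDP per person
    'y': [0.5, 1.2, 3.4],            # stands in for CO2 per person
    'country': ['A', 'B', 'C'],
}

# Keyword form: the dictionary is passed as the `data` parameter.
source = ColumnDataSource(data=toy_data)

# Positional form: a single positional argument is taken as data too.
same_source = ColumnDataSource(toy_data)

# The dictionary keys are the column names.
print(sorted(source.column_names))
print(sorted(same_source.column_names))
```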

    # Make the ColumnDataSource: source
    source = ColumnDataSource(data={
        'x': final_df.gdp[final_df['year'] == 1964],
        'y': final_df.co2[final_df['year'] == 1964],
        'country': final_df.country[final_df['year'] == 1964],
        'region': final_df.region[final_df['year'] == 1964],
    })

We start with a sample of our data for one year only.

We basically create a dictionary of values for x, y, country and region.
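Since the same four columns are selected again later for other years, the selection can be wrapped in a small helper. This is only a sketch: data_for_year is a hypothetical name that does not appear in the original code, and toy_df is a made-up stand-in for final_df.

```python
import pandas as pd

def data_for_year(df, year):
    """Build the x/y/country/region dictionary for a single year.

    Hypothetical helper (not part of the original article's code).
    """
    rows = df[df['year'] == year]   # filter once instead of once per column
    return {
        'x': rows['gdp'],
        'y': rows['co2'],
        'country': rows['country'],
        'region': rows['region'],
    }

# Made-up stand-in for final_df, with the same column names.
toy_df = pd.DataFrame({
    'country': ['A', 'B', 'A', 'B'],
    'region': ['Europe', 'Asia', 'Europe', 'Asia'],
    'year': [1964, 1964, 1965, 1965],
    'co2': [0.5, 1.2, 0.6, 1.3],
    'gdp': [1000.0, 2500.0, 1100.0, 2600.0],
})

data_1964 = data_for_year(toy_df, 1964)
print(data_1964['country'].tolist())
```

With such a helper, the source could be built as ColumnDataSource(data=data_for_year(final_df, 1964)).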

The next step is to set up the limits for our axes.

We can do that by finding the minimum and maximum values for ‘X’ and ‘Y’.

    # Save the minimum and maximum values of the gdp column: xmin, xmax
    xmin, xmax = min(final_df.gdp), max(final_df.gdp)

    # Save the minimum and maximum values of the co2 column: ymin, ymax
    ymin, ymax = min(final_df.co2), max(final_df.co2)

After that we create our figure, where we will place all our visualization objects.

We give it a title, set the width and height, and also set the axes.

(The ‘Y’ axis is set to log type just for a better view: a few types were tried and this one gave the best result.)

    # Create the figure: plot
    plot = figure(title='Gapminder Data for 1964', plot_height=600, plot_width=1000,
                  x_range=(xmin, xmax), y_range=(ymin, ymax), y_axis_type='log')

Bokeh uses the term glyph for all the visual shapes that can appear on the plot.

The full list of glyphs built into Bokeh is given below (not inventing anything, all info is from the official page): AnnularWedge, Annulus, Arc, Bezier, Ellipse, HBar, HexTile, Image, ImageRGBA, ImageURL, Line, MultiLine, MultiPolygons, Oval, Patch, Patches, Quad, Quadratic, Ray, Rect, Segment, Step, Text, VBar, Wedge.

All these glyphs share a minimal common interface through their base class Glyph.

We won’t go too deep into all these shapes and will use circles as one of the most basic ones.

If you would like to play more with other glyphs, you have all the necessary documentation and links.

    # Add circle glyphs to the plot
    plot.circle(x='x', y='y', fill_alpha=0.8, source=source, legend='region',
                color=dict(field='region', transform=color_mapper), size=7)

So how do we add these circles? We assign our source to the “source” parameter of the circle glyph, specify the data for ‘X’ and ‘Y’, add a legend for the colors and apply the previously created color mapper to the “color” parameter. “fill_alpha” adds a little transparency and “size” is the size of the circles that will appear on the plot.
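The snippet above uses the argument names from the Bokeh release available when this article was written. As far as I know, Bokeh 3.x dropped the legend, plot_height and plot_width arguments in favor of legend_field, height and width. Below is a sketch of the same idea under that assumption, using made-up data and the generic scatter method instead of circle. It has not been run against the article's dataset.

```python
from bokeh.models import CategoricalColorMapper, ColumnDataSource
from bokeh.palettes import Spectral6
from bokeh.plotting import figure

# Made-up values for illustration only.
source = ColumnDataSource(data={
    'x': [1000.0, 2500.0, 4000.0],
    'y': [0.5, 1.2, 3.4],
    'country': ['A', 'B', 'C'],
    'region': ['Europe', 'Asia', 'Europe'],
})

# One color per distinct region.
regions = sorted(set(source.data['region']))
color_mapper = CategoricalColorMapper(factors=regions, palette=Spectral6)

# Assumes Bokeh 3.x argument names: height/width instead of plot_height/plot_width.
plot = figure(title='Toy data', height=300, width=500, y_axis_type='log')

# legend_field groups the legend by the values of the 'region' column.
renderer = plot.scatter(
    x='x', y='y', source=source, size=7, fill_alpha=0.8,
    color={'field': 'region', 'transform': color_mapper},
    legend_field='region',
)
```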

Next we improve the appearance of our plot by setting the location of the legend and labeling our axes.

    # Set the legend.location attribute of the plot
    plot.legend.location = 'bottom_right'

    # Set the x-axis label
    plot.xaxis.axis_label = 'Income per person (Gross domestic product per person adjusted for differences in purchasing power in international dollars, fixed 2011 prices, PPP based on 2011 ICP)'

    # Set the y-axis label
    plot.yaxis.axis_label = 'CO2 emissions (tonnes per person)'

As of now we have a basic and static plot for the year 1964, but the title of the article has a word that doesn’t fit this situation: “Interactive” O_O.

So let’s add some interactivity! To do that we will add a slider with years, so in the end we will have a visualization for every available year.

Cool, isn’t it? Previously we imported the Slider class, now it’s time to use it! We create an object of this class with start being the minimum year, end the maximum year, the default value the minimum year again, step (the increment between values on the slider) 1 year, and the title.

We also create a callback for any change that happens on this slider.

Property-change callbacks registered with on_change always have the same input parameters: attr, old, new.

We are going to update our data source based on the value of the slider.

So we create a new dictionary that corresponds to the year from the slider, and based on this we update our plot.

We also update the title accordingly.
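To see the attr, old, new signature in isolation, here is a small sketch with a slider that is not attached to any plot. The year range and the record_change name are made up for the example. Setting the value from Python should invoke the callback directly when no server is involved.

```python
from bokeh.models import Slider

# Made-up year range, only for the demonstration.
slider = Slider(start=1964, end=2013, step=1, value=1964, title='Year')

seen = []

def record_change(attr, old, new):
    # attr is the property name; old and new are its previous and current values.
    seen.append((attr, old, new))

# Register the callback on the 'value' property.
slider.on_change('value', record_change)

# Changing the property from Python stands in for dragging the slider.
slider.value = 1970
print(seen)
```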

    # Make a slider object: slider
    slider = Slider(start=min(final_df.year), end=max(final_df.year), step=1,
                    value=min(final_df.year), title='Year')

    def update_plot(attr, old, new):
        # Set the `yr` name to `slider.value` and `source.data = new_data`
        yr = slider.value
        new_data = {
            'x': final_df.gdp[final_df['year'] == yr],
            'y': final_df.co2[final_df['year'] == yr],
            'country': final_df.country[final_df['year'] == yr],
            'region': final_df.region[final_df['year'] == yr],
        }
        source.data = new_data

        # Add title to figure: plot.title.text
        plot.title.text = 'Gapminder data for %d' % yr

    # Attach the callback to the 'value' property of slider
    slider.on_change('value', update_plot)

With this amount of data points the plot becomes messy very quickly.

So to add more clarity to every little circle presented here, I decided to also include a HoverTool in this figure.

    # Create a HoverTool: hover
    hover = HoverTool(tooltips=[('Country', '@country'), ('GDP', '@x'), ('CO2 emission', '@y')])

    # Add the HoverTool to the plot
    plot.add_tools(hover)

HoverTool accepts a list of tuples, with the first value being the label and the second being the value to show from the data source.
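The value part of each tuple can also carry a format in curly braces after the column name. A sketch, assuming Bokeh's default numeral-style formatting for tooltip fields. The particular formats are my choice here, not something from the original app.

```python
from bokeh.models import HoverTool

# '@name' refers to a column of the data source; '{...}' is an optional format.
hover = HoverTool(tooltips=[
    ('Country', '@country'),
    ('GDP', '@x{0,0}'),             # thousands separator, no decimals
    ('CO2 emission', '@y{0.00}'),   # two decimal places
])
```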

We are done with all the components of our little app; just a few final lines of code remain to create a layout and add it to the current document.

    # Make a row layout of widgetbox(slider) and plot and add it to the current document
    layout = row(widgetbox(slider), plot)
    curdoc().add_root(layout)

And we are done! Congratulations! We run this code and… nothing.

No errors (or maybe some errors, but then you fix them and there are no errors) and no app, no visualization O_o.

Why the hell did I spend all that time creating a cool plot only to get nothing? Not even an explanation of what I did wrong? Those were my first thoughts when I tried to run the app.

But then I remembered the trick: you first have to start a server that will be the backend for this visualization.

So the next and last thing you have to do is run the code below from your command line:

    bokeh serve --show my_python_file.py

And it will automatically open your visualization in a new browser tab.
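If the import of widgetbox fails, you are probably on a newer Bokeh release: as far as I know it was removed in Bokeh 3.0, with column as the replacement. A sketch of the same layout under that assumption, with a placeholder plot and a made-up year range:

```python
from bokeh.io import curdoc
from bokeh.layouts import column, row
from bokeh.models import Slider
from bokeh.plotting import figure

# Placeholder objects standing in for the real plot and slider.
plot = figure(title='Placeholder plot', height=300, width=500)
slider = Slider(start=1964, end=2013, step=1, value=1964, title='Year')

# column(slider) takes the place of widgetbox(slider).
layout = row(column(slider), plot)
curdoc().add_root(layout)
```

The file is started the same way, through bokeh serve.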

Despite being the most popular, matplotlib is not the most user-friendly data visualization tool and has its own limitations, aaaaand I don’t really like it.

So Bokeh is one possible solution if you belong to that same cohort of people as I do.

Try it and let me know your thoughts.

Thank you for your attention, hope this little introduction to Bokeh is useful and have a great day! (or night if you are reading this before going to bed :D)

P.S. Want to try plotly as well, saw a lot of positive feedback about it.

P.S.S. Code on Github.

Originally published in 4 languages (EN, ES, UA, RU) at sergilehkyi.com on January 31, 2019.
