Interactive Histograms with Bokeh

My intention is to provide instructions for building a functional Python class that can be expanded and customized to suit individual needs.

First, we will write three plotting functions that will help us get familiar with Bokeh syntax and conventions.

Then we will wrap these three functions in a class to make creating various plots quicker and easier.

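As with any Python script, we begin with the necessary imports: NumPy and pandas for the data, plus Bokeh's figure, show, ColumnDataSource, HoverTool, Panel, and Tabs.

Function One: Hist Hover

This is a function for creating a single histogram with cursor-hover interactivity.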

When the user hovers over a bar of the histogram, the bar's lower and upper bounds and its count are displayed.

The bar's color also changes to highlight it.

The first step is to create the data for the histogram with the NumPy function np.histogram.

Here we pass in the source dataframe, the column we want to plot, and the number of bins. I'm leaving the arguments generic here because we are going to build this out into a function that will take these arguments when it is called.
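A minimal sketch of this step, using a small hypothetical dataframe (the column name score and the toy values are illustrative, not from the original data). np.histogram returns the count in each bin together with the bin edges, and there is always one more edge than there are bins.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the source dataframe
df = pd.DataFrame({"score": [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]})
column = "score"  # the column we want to plot
bins = 4          # the number of bins

# hist holds the count per bin; edges holds bins + 1 boundary values
hist, edges = np.histogram(df[column], bins=bins)
print(hist)   # [1 2 3 4]
print(edges)
```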

We then turn this histogram data into a dataframe. The first part creates a dataframe from the histogram data containing the counts for the target variable and the left and right edges of each bin.

The second part creates a new column describing each bin's interval.

This will be useful when we call the HoverTool in Bokeh.
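Continuing the same toy example, the two parts of this step might look like the sketch below. Naming the count column after the target variable and formatting the interval labels to two decimals are my assumptions; the original code may label intervals differently.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, as in the previous step
df = pd.DataFrame({"score": [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]})
column, bins = "score", 4
hist, edges = np.histogram(df[column], bins=bins)

# Part one: counts (named after the target column) plus left/right bin edges
hist_df = pd.DataFrame({column: hist, "left": edges[:-1], "right": edges[1:]})

# Part two: a readable interval label that the hover tooltip can display
hist_df["interval"] = [f"{lo:.2f} to {hi:.2f}"
                       for lo, hi in zip(hist_df["left"], hist_df["right"])]
print(hist_df)
```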

Next, we create a Bokeh ColumnDataSource object.

This is the data structure behind most of Bokeh's plotting: glyphs and tools refer to its columns by name.

src = ColumnDataSource(hist_df)

Then we actually create the plot with two separate calls. The first call creates a Bokeh figure object where we specify the size (this will later be an attribute of the class), title, and labels.

The second call creates a “glyph” on top of the figure.

This is Bokeh’s way of actually putting data on the canvas.

Here we tell Bokeh to plot rectangles with the heights and edges specified in our ColumnDataSource object (remember, the data comes from our NumPy histogram dataframe).
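Here is a hedged sketch of the ColumnDataSource and the two plotting calls, assuming a recent Bokeh release (width and height are the figure size arguments in Bokeh 3.x; older releases used plot_width and plot_height). The toy data, colors, and sizes are placeholders. The quad glyph draws one rectangle per row of the source, reading its top, left, and right values from the named columns, and hover_fill_color is what recolors a bar under the cursor once a hover tool is present.

```python
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Rebuild the toy histogram dataframe (hypothetical data)
df = pd.DataFrame({"score": [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]})
column, bins = "score", 4
hist, edges = np.histogram(df[column], bins=bins)
hist_df = pd.DataFrame({column: hist, "left": edges[:-1], "right": edges[1:]})
hist_df["interval"] = [f"{lo:.2f} to {hi:.2f}"
                       for lo, hi in zip(hist_df["left"], hist_df["right"])]

# Wrap the dataframe so glyphs and tools can refer to its columns by name
src = ColumnDataSource(hist_df)

# Call one: the figure (the canvas) with its size, title, and axis labels
plot = figure(width=600, height=400, title=f"Histogram of {column}",
              x_axis_label=column, y_axis_label="Count")

# Call two: the glyph; one rectangle per row, spanning left..right and 0..count
plot.quad(bottom=0, top=column, left="left", right="right", source=src,
          fill_color="steelblue", line_color="black", fill_alpha=0.7,
          hover_fill_color="tan", hover_fill_alpha=1.0)
```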

It might seem a little verbose right now (especially compared to Seaborn), but the results will be worth it!

Then we add the hover tool. Here we reference the “Interval” column we created above, as well as the count.

These will be displayed when the user hovers over a bin.
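A sketch of the hover step under the same assumptions. Each tooltip entry is a pair of (label shown to the user, "@" plus a source column name); wrapping the column name in braces keeps names that contain spaces working. The setup lines just rebuild the toy plot so the block runs on its own.

```python
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Setup: the same toy plot as before (hypothetical data and styling)
df = pd.DataFrame({"score": [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]})
column = "score"
hist, edges = np.histogram(df[column], bins=4)
hist_df = pd.DataFrame({column: hist, "left": edges[:-1], "right": edges[1:]})
hist_df["interval"] = [f"{lo:.2f} to {hi:.2f}"
                       for lo, hi in zip(hist_df["left"], hist_df["right"])]
plot = figure(width=600, height=400, title=f"Histogram of {column}")
plot.quad(bottom=0, top=column, left="left", right="right",
          source=ColumnDataSource(hist_df), fill_color="steelblue",
          line_color="black", hover_fill_color="tan")

# The hover tool: show the interval label and the raw count for the bar
hover = HoverTool(tooltips=[("Interval", "@interval"),
                            ("Count", f"@{{{column}}}")])  # -> "@{score}"
plot.add_tools(hover)
```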

Lastly, we ask Bokeh to show us the plot. We also need the option of returning the plot so that this function can be called as a helper in the next two functions.

And voila! We have our function for creating a histogram with hover-tool interactivity. The last bit we will add is an option for plotting on a log scale, which requires an if/else branch.

Note that if log_scale=True then we add another column to the histogram dataframe.
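Putting these steps together, a complete hist_hover might look like the sketch below. The parameter names (bins, log_scale, colors, width, height, show_plot, return_plot) are my assumptions rather than the author's exact signature. For the log branch I add a log column computed with np.log1p, that is log(1 + count), so empty bins plot at zero instead of negative infinity; that transform is my choice, not necessarily the original one.

```python
import numpy as np
import pandas as pd
from bokeh.io import show
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure


def hist_hover(dataframe, column, bins=30, log_scale=False,
               colors=("steelblue", "tan"), width=600, height=400,
               show_plot=True, return_plot=False):
    # Bin the data and build the histogram dataframe with interval labels
    hist, edges = np.histogram(dataframe[column], bins=bins)
    hist_df = pd.DataFrame({column: hist, "left": edges[:-1], "right": edges[1:]})
    hist_df["interval"] = [f"{lo:.2f} to {hi:.2f}"
                           for lo, hi in zip(hist_df["left"], hist_df["right"])]

    # The if/else branch: a log-scaled plot adds a column and plots that instead
    if log_scale:
        # Assumption: log1p (log of 1 + count) keeps empty bins at 0 rather than -inf
        hist_df["log"] = np.log1p(hist_df[column])
        top, y_label = "log", "Log(1 + Count)"
    else:
        top, y_label = column, "Count"

    src = ColumnDataSource(hist_df)
    plot = figure(width=width, height=height, title=f"Histogram of {column}",
                  x_axis_label=column, y_axis_label=y_label)
    plot.quad(bottom=0, top=top, left="left", right="right", source=src,
              fill_color=colors[0], line_color="black", fill_alpha=0.7,
              hover_fill_color=colors[1], hover_fill_alpha=1.0)
    # The tooltip reports the raw count even when the bars are log-scaled
    plot.add_tools(HoverTool(tooltips=[("Interval", "@interval"),
                                       ("Count", f"@{{{column}}}")]))

    if show_plot:
        show(plot)  # browser tab, or inline after output_notebook() in Jupyter
    if return_plot:
        return plot
```

Calling hist_hover(df, "score", log_scale=True) would display the plot, while show_plot=False, return_plot=True hands the figure back so that other functions can reuse it.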

I have deliberately left the docstring blank and encourage you to write it on your own.

This will help you build your understanding of the code and explain any additional functionality you may have written.

Function Two: Histotabs for Plotting a Group of Numeric Variables

The basic hist_hover function above is the most complex of the three, and it serves as the basis for the two types of tabbed interfaces we will create next.

Here is the function to create a tabbed interface for a set of continuous variables.

When you call the function, the tabbed interface will appear in a new browser window, or in a Jupyter notebook cell if you have enabled notebook output (for example with Bokeh's output_notebook()).

Here we create an empty list to store our individual histograms (one for each variable).

Then we create the histograms one by one by calling hist_hover with the appropriate column.

Each histogram is stored in its own Panel object, which is added to our list of histograms.

We then create a Bokeh Tabs object with our list of histograms as its tabs, and ask Bokeh to show the Tabs object.
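A minimal sketch of histotabs. To keep the block self-contained, it defines a stripped-down stand-in for the article's hist_hover (no log scale or interval tooltip). Bokeh 3.x renamed Panel to TabPanel, so the import falls back to the older name on Bokeh 2.x; the function and parameter names are illustrative.

```python
import numpy as np
import pandas as pd
from bokeh.io import show
from bokeh.models import ColumnDataSource, HoverTool, Tabs
from bokeh.plotting import figure

try:  # Bokeh 3.x name for a single tab
    from bokeh.models import TabPanel
except ImportError:  # Bokeh 2.x called it Panel
    from bokeh.models import Panel as TabPanel


def hist_hover(dataframe, column, bins=30):
    # Stripped-down stand-in for the article's hist_hover: returns the figure
    hist, edges = np.histogram(dataframe[column], bins=bins)
    src = ColumnDataSource(pd.DataFrame(
        {"count": hist, "left": edges[:-1], "right": edges[1:]}))
    plot = figure(width=600, height=400, title=f"Histogram of {column}")
    plot.quad(bottom=0, top="count", left="left", right="right", source=src,
              fill_color="steelblue", line_color="black", hover_fill_color="tan")
    plot.add_tools(HoverTool(tooltips=[("Count", "@count")]))
    return plot


def histotabs(dataframe, features, bins=30, show_plot=True):
    hists = []  # one tab per numeric variable
    for feature in features:
        plot = hist_hover(dataframe, feature, bins=bins)
        hists.append(TabPanel(child=plot, title=feature))
    tabs = Tabs(tabs=hists)  # the list of panels becomes the tab content
    if show_plot:
        show(tabs)
    return tabs
```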

Function Three: Filtered Histotabs for Looking at a Single Numeric Variable Filtered by a Categorical Variable

Here’s our final function. This function is straightforward as well.

First, we filter the dataframe by each unique value in filter_feature.

Then we call hist_hover with this filtered dataframe and our target feature.

Again we create a unique Panel object for each histogram, add them to our list, and use that list as the content for our Tabs object.
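A sketch of the filtered version under the same assumptions, again with a stand-in hist_hover so the block runs on its own. Skipping rows whose filter value is missing is my own choice; the original may handle nulls differently.

```python
import numpy as np
import pandas as pd
from bokeh.io import show
from bokeh.models import ColumnDataSource, HoverTool, Tabs
from bokeh.plotting import figure

try:  # Bokeh 3.x name for a single tab
    from bokeh.models import TabPanel
except ImportError:  # Bokeh 2.x called it Panel
    from bokeh.models import Panel as TabPanel


def hist_hover(dataframe, column, bins=30):
    # Stripped-down stand-in for the article's hist_hover: returns the figure
    hist, edges = np.histogram(dataframe[column], bins=bins)
    src = ColumnDataSource(pd.DataFrame(
        {"count": hist, "left": edges[:-1], "right": edges[1:]}))
    plot = figure(width=600, height=400, title=f"Histogram of {column}")
    plot.quad(bottom=0, top="count", left="left", right="right", source=src,
              fill_color="steelblue", line_color="black", hover_fill_color="tan")
    plot.add_tools(HoverTool(tooltips=[("Count", "@count")]))
    return plot


def filtered_histotabs(dataframe, feature, filter_feature, bins=30, show_plot=True):
    hists = []
    # Assumption: rows with a missing filter value are skipped
    for value in dataframe[filter_feature].dropna().unique():
        subset = dataframe[dataframe[filter_feature] == value]  # one category
        plot = hist_hover(subset, feature, bins=bins)
        hists.append(TabPanel(child=plot, title=str(value)))
    tabs = Tabs(tabs=hists)
    if show_plot:
        show(tabs)
    return tabs
```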

And voila! We have a nice interactive set of histograms.

Putting It All Together

Now we take these three functions and wrap them in a BokehHistogram class, where they become methods.

Core visual attributes can be defined when we instantiate the class.

Then it’s easy to call the methods and create any histograms you need.
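One way the class could look, as a sketch rather than the author's exact code: colors, width, and height become instance attributes set in __init__, and the three functions become methods that read them. The method and parameter names, the default colors, and the log1p transform are assumptions. The two tabbed methods call hist_hover with show_plot=False and return_plot=True, which is the reason that option exists.

```python
import numpy as np
import pandas as pd
from bokeh.io import show
from bokeh.models import ColumnDataSource, HoverTool, Tabs
from bokeh.plotting import figure

try:  # Bokeh 3.x name for a single tab
    from bokeh.models import TabPanel
except ImportError:  # Bokeh 2.x called it Panel
    from bokeh.models import Panel as TabPanel


class BokehHistogram:
    def __init__(self, colors=("steelblue", "tan"), width=600, height=400):
        # Core visual attributes, fixed when the object is created
        self.colors = colors  # (bar fill color, hover fill color)
        self.width = width
        self.height = height

    def hist_hover(self, dataframe, column, bins=30, log_scale=False,
                   show_plot=True, return_plot=False):
        hist, edges = np.histogram(dataframe[column], bins=bins)
        hist_df = pd.DataFrame({column: hist, "left": edges[:-1], "right": edges[1:]})
        hist_df["interval"] = [f"{lo:.2f} to {hi:.2f}"
                               for lo, hi in zip(hist_df["left"], hist_df["right"])]
        if log_scale:
            # Assumption: log1p keeps empty bins at 0 instead of -inf
            hist_df["log"] = np.log1p(hist_df[column])
            top, y_label = "log", "Log(1 + Count)"
        else:
            top, y_label = column, "Count"

        plot = figure(width=self.width, height=self.height,
                      title=f"Histogram of {column}",
                      x_axis_label=column, y_axis_label=y_label)
        plot.quad(bottom=0, top=top, left="left", right="right",
                  source=ColumnDataSource(hist_df),
                  fill_color=self.colors[0], line_color="black", fill_alpha=0.7,
                  hover_fill_color=self.colors[1], hover_fill_alpha=1.0)
        plot.add_tools(HoverTool(tooltips=[("Interval", "@interval"),
                                           ("Count", f"@{{{column}}}")]))
        if show_plot:
            show(plot)
        if return_plot:
            return plot

    def _show_tabs(self, panels, show_plot):
        # Shared helper: the panels become the content of one Tabs object
        tabs = Tabs(tabs=panels)
        if show_plot:
            show(tabs)
        return tabs

    def histotabs(self, dataframe, features, bins=30, log_scale=False,
                  show_plot=True):
        panels = []
        for feature in features:
            plot = self.hist_hover(dataframe, feature, bins, log_scale,
                                   show_plot=False, return_plot=True)
            panels.append(TabPanel(child=plot, title=feature))
        return self._show_tabs(panels, show_plot)

    def filtered_histotabs(self, dataframe, feature, filter_feature, bins=30,
                           log_scale=False, show_plot=True):
        panels = []
        for value in dataframe[filter_feature].dropna().unique():
            subset = dataframe[dataframe[filter_feature] == value]
            plot = self.hist_hover(subset, feature, bins, log_scale,
                                   show_plot=False, return_plot=True)
            panels.append(TabPanel(child=plot, title=str(value)))
        return self._show_tabs(panels, show_plot)
```

With this sketch, h = BokehHistogram() uses the defaults, and h.histotabs(df, ["col_a", "col_b"], log_scale=True) (hypothetical column names) would open the tabbed view in the browser, or inline after calling output_notebook() in Jupyter.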

Next is an example of how you can use this class.

First, load some data into a pandas dataframe.

For demonstration, I am using the Harvard edX dataset, which is available for download.

df = pd.read_csv('path_to_your_data_file')

Create an instance of the BokehHistogram object.

Custom colors and sizes can be declared here, but I’m sticking with the defaults for now:

h = BokehHistogram()

Then simply call methods on your object with the appropriate parameters specified.

I chose to plot on a log scale because the data roughly follow an exponential distribution.

I also simply filled null values with zero for demonstration purposes (not always a good idea!).

And now you should have some nifty tabbed histograms to share! Please modify the code as you see fit, customize it to your needs, and write some docstrings and comments. My fully commented code can be found here.

Thanks for reading!

Originally published at gist.github.com.
