Superset as a bridge between devs and domain experts

Let me introduce you to… Superset!via GIPHYSuperset is an open-source product originally developed by AirBnB that, as stated in their docs:Apache Superset (incubating) is a modern, enterprise-ready business intelligence web application.

What’s more important, it manages to give tools for both technical and non-tech people to work with it.

You can easily build a plot using data from a table in your database, a csv file or a Druid cluster.

But in case you need to customize a plot, or to obtain data from a overly-complicated query, you still have the ways to do so, nice!So… let’s install it?Disclaimer: I’m going to install locally (Ubuntu 18.

04), as a first approach to working with Superset.

I do not recommend following the same steps in a production environment!First you need to install both Docker and docker-composeClone the Superset repogit clone https://github.

com/apache/incubator-superset/3.

Navigate to the folder in which the docker files are placedcd incubator-superset/contrib/docker4.

Run docker-compose specifying file docker-init.

sh as entrypointdocker-compose run –rm superset .

/docker-init.

shThis is going to download (the first time it may take a long time) and configure three containers: one with Superset, another one with a Redis instance (to handle cache) and a last one with Postgres (handles metadata).

You’ll be asked for an user and password in this process.

5.

Launch it!docker-compose up -dNow you should be able to access to your Superset service using http://localhost:8088/login/, congrats!As specified in the docker-compose.

yaml file, the containers will be automatically run when restarting docker (or the computer), so you don’t have to worry about rerunning docker-compose when restarting.

Quick Note: If the last step fails for you have to modify file incubator-superset/contrib/docker/docker-compose.

yaml and set the environment to production:SUPERSET_ENV: production#SUPERSET_ENV: developmentThis is an already documented problem (Link to Github issue) and hopefully there is a fix coming soon enough.

Ok, that was way too boring, let’s import our first datasource and build some cool graphics!via GIPHYThe data I’ve used includes information on all property sales in England and Wales that were sold for full market value in 2018 (source).

I’ve downloaded the csv and dumped it into a table in Postgres.

First of all, we need to create a connection from Superset to our database, which is done on section Sources > Databases.

Once in there, the connector is specified via SQLAlchemy url, which supports a wide range of databases.

Here is how I’ve done it with Postgres:The Superset connector uses SQLAlchemyThen, it is needed to add each of the tables that we want to use in our visualizations.

That action is performed in section Sources > Tables.

Now we have to play with Superset two main objects: dashboards and charts.

Each dashboard is composed of several charts that we can arrange as we please.

The process of building a basic chart and composing a dashboard is really simple, so I’m not going to explain how to do it, go and learn!In just about 15 minutes, I’ve managed to build this dashboard:A simple dashboard displaying some basic graphsThe magic of it is that, as we play with the different filters, the visualizations will be updated without refreshing the page.

Don’t believe me?.Just watch the following gif, in which first I apply a filter by city and later a temporal one.

Visualizations are re-rendered as soon as a filter is appliedTake into account that each visualizations needs to execute, at least, one query.

If your dataset is huge and you change filters very fast, you may block your database (sometimes patience is the word).

Ok, this is very fancy and good-looking, but it has a huge flaw.

The chart creator dialog only lets us choose one table, so we can’t join data from different tables… or can we?.Yes, we can!.(no Obama pun intended) Thanks to SQL Lab, which is a section in which we can query our database, save the results and use them as a visualizations’ data source.

For example, if I launch the following query:SQL Lab query composer dialogTo build a plot based on the results of this query, it is as simple as running the query and then clicking on the Explore button that appears on the left bottom corner.

Then, we will be redirect to the visualization creation view, nice!.This is the part where Dev has to work, because Biz is not supposed to know anything about SQL syntax (I mean, if they do, you should be happy).

So, at the end of the day:Dev’s workload is decreased.

Biz is able to create and modify its own visualizations, as they please.

There is no need to read a lot of documentation to start producing results.

Dashboard documents are stored in JSON files, so they can be added to your version control system.

Visualizations look nice ????.(even though sometimes are annoying)Of course, Superset is not perfect and have some flaws:Deployment in production can be a pain in the ass (we used k8s and kinda wanted to kill the developers at some point).

The user roles management tab is nearly impossible to understand.

You have to stick with the already implemented plots (they have a good PR policy so we can expect more to come soon).

You have to update frequently, as they launch releases in a high pace (hmm… maybe this is more of an advantage :D)That’s all!.Thank you for reading, hopefully you’ve been able to create your first dashboard with no difficulties.

I’ll leave below a short list of resources in case you need more information/help.

To learn more about Superset:AirBnB post introducing Superset.

A cool post explaining some basics by InDataLabs.

Superset docs.

Superset github project: if you found a issue, maybe it’s been already reported.

Stack Overflow.

.

. More details

Leave a Reply