Everything you need to know about Scatter Plots for Data Visualisation

We’re going to go through the main scatter plot options and see when and how to use each of them, with code.

You might just find a few nice surprises and tricks that you can add to your Data Science toolbox!

Regression plotting

When we first plot our data on a scatter plot, it already gives us a nice quick overview of our data.

In the far-left figure below, we can already see where most of the data bunches up, and we can quickly pick out the outliers.

But it’s also nice to be able to see how complicated our task might get; we can do that with regression plotting.

In the middle figure below, we’ve fitted a linear regression line.

It’s easy to see that a linear function won’t work, since many of the points lie far away from the line.

The far-right figure uses a polynomial of order 4 and looks much more promising.

So it looks like we’ll need something of at least order 4 to model this dataset.
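The comparison above can be sketched in a few lines. This is a minimal sketch, not the code behind the article’s figures: it assumes numpy and matplotlib, and it uses hypothetical synthetic data (a curved trend plus noise) instead of the original dataset. It fits polynomials of order 1 and order 4 with `np.polyfit` and reports each fit’s mean squared error, so you can check numerically what the plots suggest visually. If you use seaborn, its `regplot` function takes an `order` argument that draws a similar polynomial fit.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical synthetic data: a curved trend plus noise (not the article's dataset)
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = 0.5 * x**4 - 2 * x**2 + x + rng.normal(scale=1.0, size=x.size)

def fit_and_score(x, y, order):
    """Fit a least-squares polynomial of the given order; return coefficients, fitted values, MSE."""
    coeffs = np.polyfit(x, y, deg=order)
    y_hat = np.polyval(coeffs, x)
    mse = np.mean((y - y_hat) ** 2)
    return coeffs, y_hat, mse

fig, axes = plt.subplots(1, 3, figsize=(15, 4), sharey=True)

# Left panel: the raw scatter plot only
axes[0].scatter(x, y, s=10)
axes[0].set_title("Raw scatter")

# Middle and right panels: linear fit vs. order-4 polynomial fit
results = {}
for ax, order in zip(axes[1:], (1, 4)):
    coeffs, y_hat, mse = fit_and_score(x, y, order)
    results[order] = mse
    ax.scatter(x, y, s=10)
    ax.plot(x, y_hat, color="red")
    ax.set_title(f"Polynomial order {order} (MSE={mse:.2f})")

fig.savefig("regression_plots.png")
print(results)
```

Because the order-4 model contains the linear model as a special case, its least-squares error can never be higher; the useful signal is how much lower it is.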

Color and Shape

Color and shape can be used to visualise the different categories in your dataset.

Color and shape are both very intuitive to the human visual system.

When you look at a plot where groups of points have different colors or shapes, it’s obvious right away that the points belong to different groups.

It just naturally makes sense to us.

This natural intuition is what you always want to play on when creating clear and compelling data visualisations.

Make it so obvious that it’s self-explanatory.

The figure on the left below shows the classes being grouped by color; the figure on the right shows the classes separated by both color and shape.

In both cases it’s much easier to see the groupings than when all the points were blue!

We now know that it’ll probably be easy to separate the setosa class with low error, and that we should focus our attention on figuring out how to separate the other two classes from each other.

It’s also clear that a single straight line won’t be able to separate the green and orange points; we’ll need something more complex, such as a non-linear or higher-dimensional decision boundary.
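Here is a minimal sketch of the two encodings, assuming matplotlib and numpy. The article’s figures are based on the Iris dataset; to keep the example self-contained, this version generates three hypothetical classes (`class_a`, `class_b`, `class_c`) as synthetic clusters, one well separated and two overlapping. The left panel encodes class with color only; the right panel encodes it with both color and marker shape. In seaborn, the `hue` and `style` parameters of `scatterplot` play the same roles.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# Hypothetical three-class data: one isolated cluster and two overlapping ones
centers = {"class_a": (1.0, 3.5), "class_b": (4.0, 2.8), "class_c": (5.0, 3.0)}
colors = {"class_a": "tab:blue", "class_b": "tab:orange", "class_c": "tab:green"}
markers = {"class_a": "o", "class_b": "s", "class_c": "^"}

# 50 points per class, normally scattered around each center
data = {label: rng.normal(loc=c, scale=0.3, size=(50, 2)) for label, c in centers.items()}

fig, (ax_color, ax_both) = plt.subplots(1, 2, figsize=(12, 5), sharex=True, sharey=True)
for label, points in data.items():
    # Left: class encoded by color only (same default marker for every class)
    ax_color.scatter(points[:, 0], points[:, 1], color=colors[label], label=label, s=20)
    # Right: class encoded by both color and marker shape
    ax_both.scatter(points[:, 0], points[:, 1], color=colors[label],
                    marker=markers[label], label=label, s=20)

ax_color.set_title("Color only")
ax_both.set_title("Color and shape")
for ax in (ax_color, ax_both):
    ax.set_xlabel("feature 1")
    ax.set_ylabel("feature 2")
    ax.legend()

fig.savefig("color_and_shape.png")
```

Drawing one `scatter` call per class keeps the legend labels and the color/marker mapping explicit.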

Choosing between color and shape becomes a matter of preference.

Personally, I find color a bit clearer and more intuitive, but take your pick!

Marginal Histogram

Scatter plots with marginal histograms have histograms plotted along the top and side, showing the distribution of the points for the features on the x- and y-axes.

It’s a small addition, but great for seeing the exact distribution of our points and identifying our outliers more accurately.

For example, in the figure below we can see that the y-axis has a very heavy concentration of points around 3.0.

Just how concentrated? That’s most easily seen in the histogram on the far right, which shows at least three times as many points around 3.0 as in any other bin.

We also see that there are very few points above 3.75 compared with other ranges.

For the x-axis, on the other hand, things are a bit more evened out, except for the outliers on the far right.
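A marginal-histogram layout can be built directly in matplotlib with a grid of three axes: the main scatter plot, a histogram of x along the top, and a horizontal histogram of y on the right. This is a minimal sketch under assumptions: the data is hypothetical and only mimics the shape described above (x fairly spread out with a few far-right outliers, y concentrated near 3.0), and the bin count of 20 is an arbitrary choice. seaborn’s `jointplot` produces a similar layout in a single call.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)

# Hypothetical data: x mostly uniform with 5 far-right outliers, y concentrated near 3.0
x = np.concatenate([rng.uniform(4.5, 7.0, 145), rng.uniform(7.5, 8.0, 5)])
y = rng.normal(loc=3.0, scale=0.3, size=x.size)

# 2x2 grid: top-left = x histogram, bottom-left = scatter, bottom-right = y histogram
fig = plt.figure(figsize=(7, 7))
grid = fig.add_gridspec(2, 2, width_ratios=(4, 1), height_ratios=(1, 4),
                        wspace=0.05, hspace=0.05)
ax_main = fig.add_subplot(grid[1, 0])
ax_top = fig.add_subplot(grid[0, 0], sharex=ax_main)
ax_right = fig.add_subplot(grid[1, 1], sharey=ax_main)

ax_main.scatter(x, y, s=15)
ax_main.set_xlabel("x")
ax_main.set_ylabel("y")

bins = 20  # arbitrary bin count for this sketch
x_counts, x_edges, _ = ax_top.hist(x, bins=bins)
y_counts, y_edges, _ = ax_right.hist(y, bins=bins, orientation="horizontal")

# Hide tick labels that would duplicate the main plot's axes
ax_top.tick_params(labelbottom=False)
ax_right.tick_params(labelleft=False)
fig.savefig("marginal_histograms.png")

# Report the densest y bin, i.e. the tallest bar in the right-hand histogram
busiest = np.argmax(y_counts)
print(f"Densest y bin: {y_edges[busiest]:.2f} to {y_edges[busiest + 1]:.2f}")
```

Sharing the x-axis with the top histogram and the y-axis with the right one keeps the bars aligned with the scatter points when you zoom or change limits.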

Bubble Plots

With bubble plots, we can use several visual attributes to encode information.

The new one we will add here is size.

In the figure below we are plotting the number of french fries eaten by each person vs their height and weight.

Notice that a scatter plot is only a 2D visualisation tool, but by using different attributes we can represent more than two dimensions of information (four variables in this case).

Here we are using color, position, and size.

The position determines the person’s height and weight, the color their gender, and the size the number of french fries eaten!

The bubble plot lets us conveniently combine all of these attributes into one plot, so we can see high-dimensional information in a simple 2D view; nothing crazy complicated.
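A bubble plot is an ordinary scatter plot whose marker size varies per point. In matplotlib, the `s` argument of `scatter` sets each marker’s area in points squared, so the sketch below scales fries eaten linearly to area. All of the numbers are hypothetical, randomly generated stand-ins for the article’s survey data, and the scale factor of 10 is an arbitrary choice for visibility. seaborn’s `scatterplot` exposes the same idea through its `size` parameter.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
n = 40

# Hypothetical survey data; none of these numbers come from the article's figure
height_cm = rng.normal(170, 10, n)
weight_kg = 0.9 * height_cm - 85 + rng.normal(0, 6, n)
fries_eaten = rng.integers(5, 60, n)
gender = rng.choice(["female", "male"], n)

colors = {"female": "tab:pink", "male": "tab:blue"}
fig, ax = plt.subplots(figsize=(8, 6))
for g, color in colors.items():
    mask = gender == g
    # Position = height/weight, color = gender, marker area = fries eaten (scaled by 10)
    ax.scatter(height_cm[mask], weight_kg[mask], s=fries_eaten[mask] * 10,
               color=color, alpha=0.5, edgecolors="black", label=g)

ax.set_xlabel("Height (cm)")
ax.set_ylabel("Weight (kg)")
ax.legend(title="Gender")
fig.savefig("bubble_plot.png")
```

The semi-transparent fill (`alpha=0.5`) with dark edges helps when large bubbles overlap smaller ones.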
