Build Better and More Accurate Clusters with Gaussian Mixture Models

Introduction to Clustering

Before we kick things off and get into the nitty-gritty of Gaussian Mixture Models, let’s quickly refresh some basic concepts.

Note: If you are already familiar with the idea behind clustering and how the k-means clustering algorithm works, you can directly skip to the fourth section, ‘Introduction to Gaussian Mixture Models’.

So, let’s start by formally defining the core idea: Clustering refers to grouping similar data points together, based on their attributes or features.

For example, if we have the income and expenditure for a set of people, we can divide them into the following groups:

- Earn high, spend high
- Earn high, spend low
- Earn low, spend low
- Earn low, spend high

Each of these groups would hold a population with similar features and can be useful in pitching the relevant scheme/product to the group. 

Think of credit cards, car/property loans, and so on.

In simple words: The idea behind clustering is grouping data points together, such that each individual cluster holds the most similar points.

There are various clustering algorithms out there.

One of the most popular clustering algorithms is k-means.

Let us understand how the k-means algorithm works and look at the scenarios where it might come up short of expectations.

Introduction to k-means Clustering

k-means clustering is a distance-based algorithm.

This means that it tries to group the closest points to form a cluster.

Let’s take a closer look at how this algorithm works.

This will lay the foundational blocks to help you understand where Gaussian Mixture Models will come into play later in this article.

So, we first define the number of groups that we want to divide the population into – that’s the value of k.

Based on the number of clusters or groups we want, we then randomly initialize k centroids.

The data points are then assigned to the closest centroid and a cluster is formed.

The centroids are then updated and the data points are reassigned.

This process goes on iteratively until the location of centroids no longer changes.
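To make these steps concrete, here is a minimal sketch using scikit-learn’s KMeans. The income/expenditure data and all parameter values are illustrative assumptions, not from the article:

```python
# A minimal sketch of the k-means steps above using scikit-learn.
# The "income/expenditure" data is synthetic, purely for illustration.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Fake 2D data: one column for income, one for expenditure
X = rng.normal(loc=[50, 40], scale=10, size=(200, 2))

# k = 4 groups; centroids are initialized, points are assigned to the
# closest centroid, and centroids are re-computed until they stop moving
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)  # final centroid locations
print(labels[:10])              # cluster index of the first 10 points
```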

Check out the gif below, which represents the whole process of initializing and updating clusters. The number of clusters is set to 10.

Note: This was a brief overview of k-means clustering and is good enough for this article.

If you want to go deeper into the working of the k-means algorithm, here is an in-depth guide: The Most Comprehensive Guide to k-means You’ll Ever Need!

Drawbacks of k-means Clustering

The k-means clustering concept sounds pretty great, right? It’s simple to understand, relatively easy to implement, and can be applied in quite a number of use cases.

But there are certain drawbacks and limitations that we need to be aware of.

Let’s take the same income-expenditure example we saw above.

The k-means algorithm seems to be working pretty well, right? Hold on – if you look closely, you will notice that all the clusters created have a circular shape.

This is because k-means assigns each point to its nearest centroid using plain Euclidean distance, with the centroids updated iteratively as the mean of the assigned points – so every cluster ends up as a roughly circular region around its centroid.

Now, consider the following example where the distribution of points is not in a circular form.

What do you think will happen if we use k-means clustering on this data? It would still attempt to group the data points in a circular fashion.

That’s not great! k-means fails to identify the right clusters. Hence, we need a different way to assign clusters to the data points.
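Here is a short sketch of that failure mode, assuming synthetic, elongated clusters built with scikit-learn’s make_blobs and an arbitrary stretching transformation:

```python
# k-means applied to elongated (non-circular) clusters.
# The data and the stretching matrix are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, y_true = make_blobs(n_samples=300, centers=3, random_state=42)
# Stretch the blobs with a linear transformation so they are elongated
X = X @ np.array([[0.6, -0.6], [-0.4, 0.8]])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
# Plotting `labels` against `y_true` shows k-means carving the elongated
# clusters into roughly circular pieces instead of recovering them.
```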

So instead of using a distance-based model, we will now use a distribution-based model.

And that is where Gaussian Mixture Models come into this article!

Introduction to Gaussian Mixture Models (GMMs)

Gaussian Mixture Models (GMMs) assume that there are a certain number of Gaussian distributions, and each of these distributions represents a cluster.

Hence, a Gaussian Mixture Model tends to group the data points belonging to a single distribution together.

Let’s say we have three Gaussian distributions (more on that in the next section) – GD1, GD2, and GD3.

These have certain mean (μ1, μ2, μ3) and variance (σ1², σ2², σ3²) values, respectively.

For a given set of data points, our GMM would identify the probability of each data point belonging to each of these distributions.

Wait, probability? You read that right! Gaussian Mixture Models are probabilistic models and use the soft clustering approach for distributing the points in different clusters.

I’ll take another example that will make it easier to understand.

Here, we have three clusters that are denoted by three colors – Blue, Green, and Cyan.

Let’s take the data point highlighted in red.

The probability of this point being a part of the blue cluster is 1, while the probability of it being a part of the green or cyan clusters is 0.

Now, consider another point – somewhere in between the blue and cyan (highlighted in the below figure).

The probability that this point is a part of the green cluster is 0, right? And the probabilities that it belongs to the blue and cyan clusters are 0.2 and 0.8, respectively.

Gaussian Mixture Models use the soft clustering technique for assigning data points to Gaussian distributions.
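A minimal sketch of this soft assignment, using scikit-learn’s GaussianMixture on three synthetic blobs standing in for the blue/green/cyan clusters (all values are illustrative assumptions):

```python
# Soft clustering with scikit-learn's GaussianMixture.
# The three synthetic blobs stand in for the blue/green/cyan clusters.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0.0, 0.0], 1.0, size=(100, 2)),  # "blue" cluster
    rng.normal([6.0, 0.0], 1.0, size=(100, 2)),  # "green" cluster
    rng.normal([3.0, 5.0], 1.0, size=(100, 2)),  # "cyan" cluster
])

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

# Soft assignment: one membership probability per cluster for each point,
# rather than a single hard label
point = [[3.0, 2.5]]  # somewhere in between two clusters
print(gmm.predict_proba(point).round(3))
```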

I’m sure you’re wondering what these distributions are, so let me explain that in the next section.

The Gaussian Distribution

I’m sure you’re familiar with Gaussian distributions (or the normal distribution).

It has a bell-shaped curve, with the data points symmetrically distributed around the mean value.

The below image has a few Gaussian distributions with different mean (μ) and variance (σ²) values.

Remember: the higher the σ value, the greater the spread. (Source: Wikipedia)

In a one-dimensional space, the probability density function of a Gaussian distribution is given by:

$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

where μ is the mean and σ² is the variance.
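As a quick sanity check on this formula (using SciPy; the parameter values are illustrative):

```python
# Evaluating the 1D Gaussian density with SciPy
from scipy.stats import norm

mu, sigma = 0.0, 1.0
# Density at the mean of a standard normal: 1/sqrt(2*pi) ≈ 0.3989
print(norm.pdf(0.0, loc=mu, scale=sigma))
```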

But this would only be true for a single variable.

In the case of two variables, instead of a 2D bell-shaped curve, we will have a 3D bell curve, as shown below. The probability density function would be given by:

$$f(x \mid \mu, \Sigma) = \frac{1}{2\pi\sqrt{|\Sigma|}} \exp\left(-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right)$$

where x is the input vector, μ is the 2D mean vector, and Σ is the 2×2 covariance matrix.
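The same density can be evaluated with SciPy; the mean vector and covariance matrix below are illustrative assumptions:

```python
# Evaluating the 2D Gaussian density with SciPy
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.3],
                [0.3, 1.0]])  # off-diagonal terms tilt/stretch the bell curve

print(multivariate_normal.pdf([0.0, 0.0], mean=mu, cov=cov))
```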

The covariance would now define the shape of this curve.

We can generalize the same for d-dimensions.

Thus, this multivariate Gaussian model would have x and μ as vectors of length d, and Σ would be a d x d covariance matrix.
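Written out (a standard result, as the article’s original figure for it is not reproduced here), the d-dimensional density is:

$$f(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right)$$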

Hence, for a dataset with d features, we would have a mixture of k Gaussian distributions (where k is equivalent to the number of clusters), each having a certain mean vector and covariance matrix.
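To make “mixture” precise (a standard formulation, not spelled out in the original), the overall density is a weighted sum of the k component Gaussians, with mixing weights πᵢ that sum to 1:

$$p(x) = \sum_{i=1}^{k} \pi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i), \qquad \sum_{i=1}^{k} \pi_i = 1$$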

But wait – how are the mean and variance values for each Gaussian assigned? These values are determined using a technique called Expectation-Maximization (EM).
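In practice, scikit-learn’s GaussianMixture runs EM under the hood and exposes the fitted parameters; the data below is synthetic and illustrative:

```python
# GaussianMixture estimates its parameters with EM internally
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

gmm = GaussianMixture(n_components=3, max_iter=100, random_state=0).fit(X)
print(gmm.means_)        # estimated mean vector of each Gaussian
print(gmm.covariances_)  # estimated covariance matrix of each Gaussian
print(gmm.converged_)    # whether EM converged within max_iter
```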

We need to understand this technique before we dive deeper into the working of Gaussian Mixture Models.

What is Expectation-Maximization?

Excellent question!
