Spectral clustering

The intuition and math behind how it works!

Neerja Doshi
Feb 4

What is clustering?

Clustering is a widely used unsupervised learning method.

The grouping is such that points in a cluster are similar to each other, and less similar to points in other clusters.

Thus, it is up to the algorithm to find patterns in the data and group it for us and, depending on the algorithm used, we may end up with different clusters.

There are 2 broad approaches for clustering:

1. Compactness — Points that lie close to each other fall in the same cluster and are compact around the cluster center. The closeness can be measured by the distance between the observations. E.g.: K-Means Clustering.

2. Connectivity — Points that are connected or immediately next to each other are put in the same cluster. Even if the distance between 2 points is small, if they are not connected, they are not clustered together. Spectral clustering is a technique that follows this approach.
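To make the distinction concrete, here is a minimal numpy sketch (the two rings and their radii are made-up illustrative data): on two concentric rings, a point can be farther from points on its own ring than from points on the other ring, which breaks the compactness assumption, yet its immediate neighbor still lies on its own ring, which is what connectivity exploits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two concentric rings: a compactness-based method like K-Means would
# split them by distance to a center, while a connectivity-based method
# can recover each ring as one cluster.
theta = rng.uniform(0, 2 * np.pi, 100)
inner = np.c_[1.0 * np.cos(theta), 1.0 * np.sin(theta)]
outer = np.c_[5.0 * np.cos(theta), 5.0 * np.sin(theta)]

p = outer[0]
dist_to_outer = np.linalg.norm(outer[1:] - p, axis=1)
dist_to_inner = np.linalg.norm(inner - p, axis=1)

# Some points on p's own (outer) ring are farther than inner-ring points...
print(dist_to_outer.max() > dist_to_inner.min())   # True
# ...but p's nearest neighbor is still on its own ring (connectivity).
print(dist_to_outer.min() < dist_to_inner.min())   # True
```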

The difference between the 2 can easily be shown by this illustration:

Figure 1

How does Spectral Clustering work?

In spectral clustering, the data points are treated as nodes of a graph.

Thus, clustering is treated as a graph partitioning problem.

The nodes are then mapped to a low-dimensional space that can be easily segregated to form clusters.

An important point to note is that no assumption is made about the shape/form of the clusters.

What are the steps for Spectral Clustering?

Spectral clustering involves 3 steps:

1. Compute a similarity graph
2. Project the data onto a low-dimensional space
3. Create clusters

Step 1 — Compute a similarity graph:

We first create an undirected graph G = (V, E) with vertex set V = {v1, v2, …, vn}, where the n vertices correspond to the n observations in the data.

This graph can be represented by an adjacency matrix, whose elements are the pairwise similarities between the vertices.

To build this matrix, we can compute one of the following:

1) The ε-neighborhood graph: Here we connect all points whose pairwise distances are smaller than ε.

As the distances between all connected points are roughly of the same scale (at most ε), weighting the edges would not incorporate more information about the data to the graph.

Hence, the ε-neighborhood graph is usually considered as an unweighted graph.
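A minimal sketch of this construction (the points and the helper name `epsilon_graph` are illustrative assumptions, not from the post):

```python
import numpy as np

# ε-neighborhood graph: connect points whose pairwise Euclidean
# distance is below eps; edges are unweighted (0/1).
def epsilon_graph(X, eps):
    # pairwise distance matrix, shape (n, n)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    A = (d < eps).astype(float)
    np.fill_diagonal(A, 0)          # no self-loops
    return A

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
A = epsilon_graph(X, eps=1.0)
print(A)
# Points 0 and 1 are connected; point 2 is isolated.
```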

2) KNN Graph: Here we use K Nearest Neighbors to connect vertex vi with vertex vj if vj is among the k-nearest neighbors of vi. But we may have an issue if the nearest-neighbor relationship is not symmetric, i.e. if vertex vi has vj as a nearest neighbor, vi is not necessarily a nearest neighbor of vj. We thus end up with a directed graph, which is a problem, as we do not know what similarity between 2 points means in that case.

There are two ways of making this graph undirected.

The first way is to simply ignore the directions of the edges, i.e. we connect vi and vj with an undirected edge if vi is among the k-nearest neighbors of vj or if vj is among the k-nearest neighbors of vi.

The resulting graph is what is usually called the k-nearest neighbor graph.

The second choice is to connect vertices vi and vj if both vi is among the k-nearest neighbors of vj and vj is among the k-nearest neighbors of vi .

The resulting graph is called the mutual k-nearest neighbor graph.

In both cases, after connecting the appropriate vertices we weight the edges by the similarity of the adjacent points.
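The two symmetrization choices above can be sketched as follows (helper names and the unweighted 0/1 edges are illustrative assumptions; the post weights edges by similarity afterwards):

```python
import numpy as np

# Directed kNN relation: B[i, j] = 1 if vj is among the k nearest
# neighbors of vi.
def knn_directed(X, k):
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)             # exclude self
    B = np.zeros_like(d)
    idx = np.argsort(d, axis=1)[:, :k]      # k nearest per row
    np.put_along_axis(B, idx, 1.0, axis=1)
    return B

def knn_graph(X, k, mutual=False):
    B = knn_directed(X, k)
    # "OR" symmetrization -> k-nearest neighbor graph;
    # "AND" symmetrization -> mutual k-nearest neighbor graph.
    return np.minimum(B, B.T) if mutual else np.maximum(B, B.T)

X = np.array([[0.0], [1.0], [2.0], [10.0]])
G = knn_graph(X, k=1)               # OR: edges (0,1), (1,2), (2,3)
M = knn_graph(X, k=1, mutual=True)  # AND: only the mutual pair (0,1)
print(G.sum() / 2, M.sum() / 2)     # number of undirected edges: 3.0 1.0
```

Note that the mutual graph is always a subgraph of the OR-symmetrized graph, so it tends to be sparser.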

3) Fully connected graph: To construct this graph, we simply connect all points with each other, and we weight all edges by similarity sij.

This graph should model the local neighborhood relationships, thus similarity functions such as the Gaussian similarity function sij = exp(−||xi − xj||² / (2σ²)) are used.

Here the parameter σ controls the width of the neighborhoods, similar to the parameter ε in case of the ε-neighborhood graph.

Thus, when we create an adjacency matrix for any of these graphs, Aij ~ 1 when the points are close and Aij → 0 if the points are far apart.
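A short sketch of the fully connected graph with Gaussian similarity (the points and helper name are illustrative assumptions); it shows the Aij ~ 1 / Aij → 0 behavior directly:

```python
import numpy as np

# Fully connected graph with Gaussian (RBF) similarity
# s_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2)); sigma plays the role
# that eps plays for the ε-neighborhood graph.
def gaussian_affinity(X, sigma=1.0):
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(A, 0)      # no self-loops, by convention
    return A

X = np.array([[0.0], [0.1], [5.0]])
A = gaussian_affinity(X, sigma=1.0)
print(A.round(3))   # the close pair gets similarity ~1, far pairs ~0
```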

Consider the following graph with nodes 1 to 4, weights (or similarity) wij and its adjacency matrix:

L: Graph, R: n x n symmetric adjacency matrix

Step 2 — Project the data onto a low-dimensional space:

As we can see in Figure 1, data points in the same cluster may also be far apart, even farther apart than points in different clusters.

Our goal then is to transform the space so that when 2 points are close, they are always in the same cluster, and when they are far apart, they are in different clusters.

We need to project our observations into a low-dimensional space.

For this, we compute the Graph Laplacian, which is just another matrix representation of a graph and can be useful in finding interesting properties of a graph.

This can be computed as L = D − A, where D is the degree matrix, i.e. the diagonal matrix with the vertex degrees di = Σj Aij on its diagonal, and A is the adjacency matrix.

Graph Laplacian for our example above

The whole purpose of computing the Graph Laplacian L was to find eigenvalues and eigenvectors for it, in order to embed the data points into a low-dimensional space.
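A minimal sketch of the Laplacian computation L = D − A on a small assumed graph (these 0/1 weights are illustrative, not the ones from the figure):

```python
import numpy as np

# Unnormalized graph Laplacian L = D - A for a small example graph.
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
D = np.diag(A.sum(axis=1))      # degree matrix: row sums of A on the diagonal
L = D - A
print(L)
# Two quick sanity checks: every row of L sums to 0, and L is symmetric.
print(np.allclose(L.sum(axis=1), 0), np.allclose(L, L.T))  # True True
```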

So now, we can go ahead and find eigenvalues.

We know that L is symmetric and positive semi-definite, so all its eigenvalues are real and non-negative, with the smallest being 0.

Let us consider an example with numbers. We then compute eigenvalues and eigenvectors for L.

Step 3 — Create clusters:

For this step, we use the eigenvector corresponding to the 2nd eigenvalue to assign values to each node. On calculating, the 2nd eigenvalue is 0.189 and the corresponding eigenvector v2 = [0.41, 0.44, 0.37, -0.4, -0.45, -0.37].

To get bipartite clustering (2 distinct clusters), we first assign each element of v2 to the corresponding node, such that {node1: 0.41, node2: 0.44, …, node6: -0.37}.

We then split the nodes such that all nodes with value > 0 are in one cluster, and all other nodes are in the other cluster.

Thus, in this case, we get nodes 1, 2 & 3 in one cluster, and 4, 5 & 6 in the 2nd cluster.
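The whole of Step 3 can be sketched end-to-end on an assumed 6-node graph: two triangles (nodes 0-2 and 3-5) joined by one weak edge. The weights here are made up for illustration, so the eigenvalue and eigenvector differ from the numbers in the post, but the sign-based split works the same way.

```python
import numpy as np

# Assumed example: two triangles joined by a weak bridge edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
A[2, 3] = A[3, 2] = 0.1         # weak edge connecting the two triangles

L = np.diag(A.sum(axis=1)) - A
vals, vecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
v2 = vecs[:, 1]                 # eigenvector of the 2nd (smallest nonzero) eigenvalue

labels = (v2 > 0).astype(int)   # sign of v2 gives the bipartition
print(labels)
# Nodes 0-2 land in one cluster and nodes 3-5 in the other
# (which cluster is "0" depends on the eigenvector's arbitrary sign).
```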

It is important to note that the 2nd eigenvalue indicates how tightly connected the nodes are in the graph.

For good, clean partitioning, the lower the 2nd eigenvalue, the better the clusters.

Eigenvector v2 gives us bipartite clustering.

For k clusters, we have to modify our Laplacian to normalize it; we then use the eigenvectors of the first k eigenvalues as an embedding and run K-Means on the embedded points.

Thus we get:

L_norm = D^(-1/2) L D^(-1/2) = I − D^(-1/2) A D^(-1/2)

Normalized Laplacian — Ng, Jordan, Weiss

Advantages and Disadvantages of Spectral Clustering

Advantages:

- Does not make strong assumptions on the statistics of the clusters — Clustering techniques like K-Means assume that the points assigned to a cluster are spherical about the cluster center. This is a strong assumption to make, and may not always hold. In such cases, spectral clustering helps create more accurate clusters.

- Easy to implement and gives good clustering results. Thanks to the dimensionality reduction, it can correctly cluster observations that actually belong to the same cluster but are farther apart than observations in other clusters.

- Reasonably fast for sparse data sets of several thousand elements.

Disadvantages:

- Use of K-Means clustering in the final step implies that the clusters are not always the same; they may vary depending on the choice of initial centroids.

- Computationally expensive for large datasets — eigenvalues and eigenvectors need to be computed, and clustering is then performed on these vectors. For large, dense datasets, this can increase time complexity considerably.

In this blog I have explained the math behind spectral clustering. Any feedback or suggestions are welcome! Do check out my other blogs in the meantime!

References

https://calculatedcontent.com/2012/10/09/spectral-clustering/
http://ai.stanford.edu/~ang/papers/nips01-spectral.pdf
https://www.youtube.com/watch?v=zkgm0i77jQ8

About Me

A data scientist, currently protecting AWS customers from fraud. Prior work in building predictive and recommendation algorithms for businesses in the financial space.

LinkedIn: https://www.linkedin.com/in/neerja-doshi/