# How neuroscientists analyze data from transparent fish brains: part 2, clustering neural data

Finding ensembles of neurons with the same activity is an extremely important question for neuroscientists, since groups of synchronous neurons could underlie “building blocks” in the circuitry of the brain region.

The first step is to normalise the activity of each neuron in order to subtract the background activity.

A baseline activity value, called F0, is determined: for example, it can be the average activity during the first 3 min of the recording.

The normalisation is done by subtracting F0, then dividing by F0: (F - F0)/F0.

The final result is thus expressed in ΔF/F0 (shortened to “DFF”): this dimensionless value represents an increase or decrease of activity relative to the baseline, and can be read as a percentage.

For example, when F = F0, then ΔF/F0 = 0 (no increase compared to baseline); when F = 2*F0, then ΔF/F0 = 1 (an increase of 100% compared to baseline).
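As a minimal sketch of this normalisation, assuming the raw fluorescence is stored in a NumPy array with one row per neuron and one column per frame: the function name `delta_f_over_f0`, the parameter `n_baseline_frames` and the values in `F` are made up for illustration.

```python
import numpy as np

def delta_f_over_f0(F, n_baseline_frames):
    """Return (F - F0) / F0, with F0 the mean over the first baseline frames.

    F has shape (n_neurons, n_frames): one row of raw fluorescence per neuron.
    """
    F = np.asarray(F, dtype=float)
    # F0: one baseline value per neuron, kept as a column for broadcasting
    F0 = F[:, :n_baseline_frames].mean(axis=1, keepdims=True)
    return (F - F0) / F0

# Hypothetical recording: 2 neurons, 6 frames, baseline = first 3 frames
F = np.array([[100.0, 100.0, 100.0, 100.0, 200.0, 150.0],
              [ 50.0,  50.0,  50.0,  25.0,  50.0, 100.0]])
dff = delta_f_over_f0(F, n_baseline_frames=3)
print(dff)
```

Here the first neuron reaches ΔF/F0 = 1 when its fluorescence doubles, and the second one drops to -0.5 when its fluorescence is halved. For a recording at 20 Hz, a 3 min baseline would correspond to `n_baseline_frames = 3 * 60 * 20 = 3600`.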

(Note: F stands for “fluorescence”, since neural activity is indirectly measured by the intensity of the fluorescent light emitted by the neuron when it becomes active.)

Once the data is normalised, it is ready for clustering.

The goal is to group neurons together, so for N neurons recorded over T time frames, there will be N data points in a T-dimensional space.

For visualisation, let’s plot a dataset of 6 neurons recorded over 3 time frames in a 3D space.

Figure: an example dataset for visualisation, with 6 neurons recorded over 3 time frames = 6 data points in a 3D space.
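To make the “one neuron = one point” idea concrete, here is a small sketch with made-up ΔF/F0 values (these are not the values from the figure). Each row of `X` is one neuron, and the Euclidean distance between two rows tells us how similar the activity of two neurons is.

```python
import numpy as np

# Made-up ΔF/F0 values: 6 neurons (rows) x 3 time frames (columns)
X = np.array([[0.0, 0.1, 0.0],
              [0.1, 0.0, 0.1],
              [0.0, 0.0, 0.1],
              [1.0, 0.9, 1.0],
              [0.9, 1.0, 1.0],
              [1.0, 1.0, 0.9]])

n_neurons, n_frames = X.shape  # N data points in a T-dimensional space

# Euclidean distance between every pair of neurons, shape (6, 6)
diff = X[:, None, :] - X[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))
print(np.round(dist, 2))
```

In this toy example, the first three neurons are close to each other and far from the last three, which is the kind of structure a clustering algorithm is expected to pick up.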

In fact, neurons are not recorded over only 3 time frames. The real number can be around 50000 frames (for example, 45 minutes of recording at 20 Hz gives 54000 recorded frames), so this plot should have 50000 dimensions.

But then, I hear you saying: “50000 dimensions, that’s a lot! Do I really have to keep them all? I’m sure all the dimensions are not THAT useful anyway.”

And you would be right: imagine that for a while, nothing happens in the brain. (That’s not physiologically likely in a healthy brain, but let’s pretend.)

Or that at some point, ALL the neurons are suddenly super active: that could be possible, but it would not be very informative if our goal is to group neurons into several clusters.

So, before clustering, we can reduce the very high dimensionality of our dataset with PCA (Principal Component Analysis).

Dimensionality reduction has been described in many posts, so I’ll keep it short here.

Let’s look at a simple 2D dataset, displayed in panel a) of the figure below: the ages and heights of a dozen people.

We can see in b) that there is a general “trend” that the data follow, represented by the arrow.

If we project the data points on that arrow (as in c), we still have a decent idea regarding the spread of the data.

This “trend” is the first Principal Component (PC).

In panel d), we can choose to represent our data in only one dimension instead of two.

We reduced the number of dimensions but we still capture important information about the dataset.

The goal of PCA is to compute the Principal Components based on the “original” components of the dataset (here, the age and height).

Helped by the schematics above, you may have noticed that the arrow captures the direction along which the variance of the dataset is maximized or, in simpler words, that the 2 points that were the farthest apart in the original space (in a) are still the farthest apart in the PC space (in d).

The rationale behind PCA is thus to find the dimensions along which the variance in the dataset is maximal.

(Remark: this is a completely different rationale from that of t-SNE (t-Distributed Stochastic Neighbor Embedding), which is also a dimensionality reduction technique widely used in neuroscience for the visualization of high-dimensional datasets.

That will be described in a future post!)

## Diagonalization of the covariance matrix

One way to do this is to compute the covariance matrix of your dataset and to diagonalize it.

Thus, you find a new space where the covariance matrix is diagonal: the eigenvectors are the Principal Components, and each eigenvalue, divided by the sum of all eigenvalues, gives the percentage of variance explained by the corresponding Principal Component.

With PCA, the Principal Components are thus orthogonal.
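The diagonalization described above can be sketched in a few lines of NumPy on the age and height example. The numbers below are invented for illustration (they are not the values from the figure), and `np.linalg.eigh` is used because a covariance matrix is symmetric.

```python
import numpy as np

# Made-up ages (years) and heights (cm) for 12 people
ages = np.array([4, 6, 7, 9, 10, 12, 13, 14, 15, 16, 17, 18], dtype=float)
heights = np.array([102, 115, 121, 133, 138, 149, 156, 160, 166, 170, 173, 175],
                   dtype=float)
X = np.column_stack([ages, heights])       # shape (12, 2)

Xc = X - X.mean(axis=0)                    # center each original component
cov = np.cov(Xc, rowvar=False)             # 2 x 2 covariance matrix

# eigh handles symmetric matrices and returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()        # fraction of variance per PC
scores = Xc @ eigvecs                      # data expressed in the PC space
pc1 = scores[:, 0]                         # 1D representation, as in panel d)

print("explained variance ratio:", np.round(explained, 3))
```

The columns of `eigvecs` are the Principal Components, and the covariance matrix of `scores` is diagonal, with the eigenvalues on its diagonal. Note that age and height have different units, so in practice one may want to standardise each component before running PCA; this sketch skips that step.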

Since the Principal Components are obtained by a combination of the original components, they do not necessarily “mean” anything relevant regarding the data.

In my example, the value along the first PC would be a weighted sum of the (centered) age and height, with the weights given by the first eigenvector. This combination allows us to keep only one dimension.

In our neural dataset, the first dimension is the number of neurons (several hundred).

With PCA, we can reduce the second dimension from ~50000 (time frames) to 10 (Principal Components).
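Here is a hedged sketch of that reduction. With ~50000 time frames, the covariance matrix would be 50000 x 50000, so this example swaps the explicit diagonalization for a thin SVD of the centered data, which gives the same Principal Components (the right singular vectors are the eigenvectors of the covariance matrix). The function name `pca_reduce` and the random data are assumptions for illustration, not real recordings.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto their first n_components principal components.

    X has shape (n_neurons, n_frames). Returns the scores, shape
    (n_neurons, n_components), and the fraction of variance explained by each.
    """
    Xc = X - X.mean(axis=0)                       # center each time frame
    # Thin SVD: avoids building the n_frames x n_frames covariance matrix
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / (S ** 2).sum()
    return scores, explained[:n_components]

# Synthetic stand-in for a recording: 300 neurons, 2000 frames
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2000))
scores, explained = pca_reduce(X, n_components=10)
print(scores.shape)   # (300, 10)
```

Each neuron is now described by 10 numbers instead of one value per frame. On real data, it is worth checking how much variance the retained components explain before settling on a number such as 10.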

Once the data is normalised and the second dimension is reduced, we can use a simple and well-known algorithm for clustering: K-means.

Again, many TDS posts are already devoted to this algorithm (you can watch an animation here).

Hence, I will just briefly comment on the famous diagram from Bishop’s book.

In the following 2D dataset, I want to make 2 groups.

Figure: from C. Bishop’s book “Pattern Recognition and Machine Learning”, page 426.

Initialisation (a): the centers of the clusters are randomly chosen (red and blue crosses).

Expectation (b): each data point is assigned to the closest cluster center.

Maximisation (c): the cluster centers are re-computed (each one becomes the center of mass of the points of its color).

(d-e-f): the Expectation and Maximisation steps are repeated until convergence (i.e. the cluster centers no longer move).
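The Expectation and Maximisation steps described above can be sketched as a short NumPy function. This is a minimal illustration, not a production implementation: the function name `kmeans`, its parameters and the synthetic blobs are assumptions, and the initial centers are drawn among the data points, which is only one of several possible initialisation choices.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-means. X has shape (n_points, n_dims). Returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Initialisation: pick k distinct data points as starting centers
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # "Expectation" step: each point is assigned to its closest center
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # "Maximisation" step: each center moves to the mean of its points
        new_centers = centers.copy()
        for j in range(k):
            if np.any(labels == j):            # keep the old center if a cluster is empty
                new_centers[j] = X[labels == j].mean(axis=0)
        if np.allclose(new_centers, centers):  # convergence: centers stopped moving
            break
        centers = new_centers
    return labels, centers

# Two synthetic, well-separated 2D blobs of 50 points each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=(-2.0, -2.0), scale=0.5, size=(50, 2)),
               rng.normal(loc=(2.0, 2.0), scale=0.5, size=(50, 2))])
labels, centers = kmeans(X, k=2)
```

On neural data, `X` would be the matrix of PCA scores (one row per neuron), and `labels` would give the cluster of each neuron. Since the result depends on the random initialisation, it is common to run the algorithm several times with different seeds and compare the outcomes.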

(Remark: as discussed here, the number of clusters K has to be chosen by the user.

With simple datasets, this may be done by visual inspection.

But with high-dimensional datasets, choosing the right K can be tricky.

Several methods exist to help you find an appropriate K, as described in this post.)

Using PCA and K-means, neuroscientists can group together neurons with similar activity.

Let’s look at an example from this study by the Yaksi lab in Norway.

The researchers investigated the sense of smell in the zebrafish and how odors are processed in the brain.

They imaged the habenula (a brain region thought to integrate information from the sensory areas) and detected the neurons as I described in my first post.

In the first part of the experiment, the researchers did nothing specific, but some neurons were active anyway.

This is called “spontaneous activity”.

The scientists performed a PCA and a K-means (asking for K=6) and grouped the neurons into 6 clusters based on their activity over time.

The first thing we can see is that the spontaneous activity is not random, and that neurons with similar activity are spatially close!

The following figure shows the activity profile of the neurons, grouped in clusters, and the position of each neuron in the habenula: neurons represented by the same color belong to the same cluster.

In the second part of the experiment, the researchers delivered an odor to the fish through the water (times of the delivery indicated by the arrows in the following picture).

Again, they performed a PCA and a K-means with K=6 and noticed that the clusters during odor delivery were very similar to the ones during spontaneous activity.

Hence, the neurons that are synchronous spontaneously are also the most likely to be synchronous during odor processing!

It is as if some neurons were already “pre-wired” together, making them more likely to respond to an odor in a similar fashion.

How is spontaneous activity organized in the brain? What does that imply for information processing? Those are excellent questions that drive many neuroscientists!