10 Powerful Applications of Linear Algebra in Data Science (with Multiple Resources)

Overview

Linear algebra powers a wide range of data science algorithms and applications.
Here, we present 10 such applications where linear algebra will help you become a better data scientist.
We have categorized these applications into various fields: Basic Machine Learning, Dimensionality Reduction, Natural Language Processing, and Computer Vision.

Introduction

If Data Science was Batman, Linear Algebra would be Robin.

This faithful sidekick is often ignored.

But in reality, it powers major areas of Data Science including the hot fields of Natural Language Processing and Computer Vision.

I have personally seen a LOT of data science enthusiasts skip this subject because they find the math too difficult to understand.

When the programming languages for data science offer a plethora of packages for working with data, people don’t bother much with linear algebra.

That’s a mistake.

Linear algebra is behind all the powerful machine learning algorithms we are so familiar with.

It is a vital cog in a data scientist’s skillset.

As we will soon see, you should consider linear algebra as a must-know subject in data science.

And trust me, Linear Algebra really is all-pervasive! It will open up possibilities for working with and manipulating data that you would not have imagined before.

In this article, I have explained in detail ten awesome applications of Linear Algebra in Data Science.

I have broadly categorized the applications into four fields for your reference:

Machine Learning
Dimensionality Reduction
Natural Language Processing (NLP)
Computer Vision

I have also provided resources for each application so you can dive deeper into the one(s) that grab your attention.

Note: Before you read on, I recommend going through this superb article – Linear Algebra for Data Science.

It’s not mandatory for understanding what we will cover here but it’s a valuable article for your budding skillset.

Table of Contents

Why Study Linear Algebra?
Linear Algebra in Machine Learning
  Loss Functions
  Regularization
  Covariance Matrix
  Support Vector Machine Classification
Linear Algebra in Dimensionality Reduction
  Principal Component Analysis (PCA)
  Singular Value Decomposition (SVD)
Linear Algebra in Natural Language Processing
  Word Embeddings
  Latent Semantic Analysis
Linear Algebra in Computer Vision
  Image Representation as Tensors
  Convolution and Image Processing

Why Study Linear Algebra?

I have come across this question way too many times.

Why should you spend time learning Linear Algebra when you can simply import a package in Python and build your model? It’s a fair question.

So, let me present my point of view regarding this.

I consider Linear Algebra as one of the foundational blocks of Data Science.

You cannot build a skyscraper without a strong foundation, can you?

Think of this scenario: You want to reduce the dimensions of your data using Principal Component Analysis (PCA).

How would you decide how many Principal Components to preserve if you did not know how it would affect your data? Clearly, you need to know the mechanics of the algorithm to make this decision.

With an understanding of Linear Algebra, you will be able to develop a better intuition for machine learning and deep learning algorithms and not treat them as black boxes.

This would allow you to choose proper hyperparameters and develop a better model.

You would also be able to code algorithms from scratch and make your own variations to them as well.

Isn’t this why we love data science in the first place? The ability to experiment and play around with our models? Consider linear algebra as the key to unlock a whole new world.

Linear Algebra in Machine Learning

The big question: where does linear algebra fit in machine learning? Let’s look at four applications you will all be quite familiar with.


Loss Functions

You must be quite familiar with how a model, say a Linear Regression model, fits a given dataset:

1. Start with some arbitrary prediction function (a linear function for a Linear Regression model)
2. Use it on the independent features of the data to predict the output
3. Calculate how far off the predicted output is from the actual output
4. Use these calculated values to optimize your prediction function using some strategy like Gradient Descent

But wait: how can you calculate how different your prediction is from the expected output? Loss Functions, of course.
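Here is a minimal sketch of those four steps, assuming NumPy, a made-up toy dataset, and mean squared error as the measure of how far off the predictions are. The data, the learning rate, and the iteration count are illustrative choices of mine, not values from any particular library or tutorial.

```python
import numpy as np

# Hypothetical toy data that follows y = 2x + 1 exactly (no noise).
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * X + 1.0

# Step 1: start with an arbitrary linear prediction function y_hat = w * x + b.
w, b = 0.0, 0.0
learning_rate = 0.05  # assumed value, small enough for this toy problem to converge

for _ in range(2000):
    # Step 2: use the current function to predict the output.
    y_hat = w * X + b
    # Step 3: measure how far off the predictions are.
    error = y_hat - y
    # Step 4: gradient descent on the mean squared error.
    grad_w = 2.0 * np.mean(error * X)
    grad_b = 2.0 * np.mean(error)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 4), round(b, 4))  # should approach the true slope and intercept
```

On real data you would typically reach for a library implementation instead, but writing the loop out once makes it clear where the loss enters the picture.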

A loss function is an application of the Vector Norm in Linear Algebra.

The norm of a vector is simply a measure of its magnitude.

There are many types of vector norms.

I will quickly explain two of them.

L1 Norm: Also known as the Manhattan Distance or Taxicab Norm.

The L1 Norm is the distance you would travel from the origin to the vector if the only permitted directions were parallel to the axes of the space.

In this 2D space, you could reach the vector (3, 4) by traveling 3 units along the x-axis and then 4 units parallel to the y-axis (as shown).

Or you could travel 4 units along the y-axis first and then 3 units parallel to the x-axis.

In either case, you will travel a total of 7 units.

L2 Norm: Also known as the Euclidean Distance.

The L2 Norm is the shortest distance from the origin to the vector, as shown by the red path in the figure below.

This distance is calculated using the Pythagorean theorem (I can see the old math concepts flickering on in your mind!).

It is the square root of (3^2 + 4^2), which is equal to 5.
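To check both results for the vector (3, 4), here is a short sketch using NumPy's `np.linalg.norm`, where the `ord` argument selects which norm to compute. The variable names are my own.

```python
import numpy as np

v = np.array([3.0, 4.0])

# L1 norm: sum of absolute components, |3| + |4|
l1 = np.linalg.norm(v, ord=1)

# L2 norm: square root of the sum of squares, sqrt(3^2 + 4^2)
l2 = np.linalg.norm(v, ord=2)

print(l1, l2)  # 7.0 5.0
```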

But how is the norm used to find the difference between the predicted values and the expected values? Let’s say the predicted values are stored in a vector P and the expected values are stored in a vector E.

Then P-E is the difference vector.

And the norm of P-E is the total loss for the prediction.
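As a small illustration, assuming NumPy and some made-up values for P and E, the loss can be computed with either norm. Dividing the L1 norm by the number of samples gives the mean absolute error, and dividing the squared L2 norm by the number of samples gives the mean squared error.

```python
import numpy as np

# Hypothetical predicted and expected values, for illustration only.
P = np.array([2.5, 0.0, 2.0, 8.0])
E = np.array([3.0, -0.5, 2.0, 7.0])

diff = P - E  # the difference vector: [-0.5, 0.5, 0.0, 1.0]

l1_loss = np.linalg.norm(diff, ord=1)  # sum of absolute errors
l2_loss = np.linalg.norm(diff, ord=2)  # square root of the sum of squared errors

n = len(diff)
mae = l1_loss / n       # mean absolute error
mse = l2_loss ** 2 / n  # mean squared error

print(l1_loss, l2_loss, mae, mse)
```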


Regularization

Regularization is a very important concept in data science.

It’s a technique we use to prevent models from overfitting.

Regularization is actually another application of the Norm.

A model is said to overfit when it fits the training data too well.

Such a model does not perform well with new data because it has learned even the noise in the training data.

It will not be able to generalize on data that it has not seen before.

The illustration below sums up this idea really well.

Regularization penalizes overly complex models by adding the norm of the weight vector to the cost function.

Since we want to minimize the cost function, we will need to minimize this norm.

This pushes unneeded components of the weight vector toward zero (the L1 norm can drive them exactly to zero) and prevents the prediction function from being overly complex.

You can read the article below to learn about the complete mathematics behind regularization: How to Avoid Over-Fitting using Regularization

The L1 and L2 norms we discussed above are used in two types of regularization:

L1 regularization, used with Lasso Regression
L2 regularization, used with Ridge Regression

Refer to our complete tutorial on Ridge and Lasso Regression in Python to know more about these concepts.
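To make the penalty concrete, here is a rough sketch of a regularized cost for a linear model, assuming NumPy. The function name `regularized_cost`, the strength parameter `lam`, and the toy data are hypothetical. Note one assumption: for the L2 case I use the squared L2 norm, which is the form Ridge Regression usually takes.

```python
import numpy as np

def regularized_cost(X, y, w, lam, norm="l2"):
    """Mean squared error plus a norm penalty on the weight vector w.

    Hypothetical helper for illustration; lam controls the penalty strength.
    """
    residual = X @ w - y
    data_loss = np.mean(residual ** 2)
    if norm == "l1":
        penalty = np.sum(np.abs(w))  # L1 norm, as in Lasso
    else:
        penalty = np.sum(w ** 2)     # squared L2 norm, as in Ridge
    return data_loss + lam * penalty

# Toy data where w fits perfectly, so only the penalty contributes to the cost.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
w = np.array([1.0, 2.0])

lasso_cost = regularized_cost(X, y, w, lam=0.1, norm="l1")  # 0 + 0.1 * 3
ridge_cost = regularized_cost(X, y, w, lam=0.1, norm="l2")  # 0 + 0.1 * 5
print(lasso_cost, ridge_cost)
```

Because the penalty grows with the size of the weights, minimizing this cost trades off fitting the training data against keeping the weight vector small.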


Covariance Matrix

Bivariate analysis is an important step in data exploration.

We want to study the relationship between pairs of variables.

Covariance and correlation are measures used to study relationships between two continuous variables.

Covariance indicates the direction of the linear relationship between the variables.

A positive covariance indicates that an increase or decrease in one variable is accompanied by the same change in the other.

A negative covariance indicates that an increase or decrease in one is accompanied by the opposite in the other.

On the other hand, correlation is the standardized value of Covariance.

A correlation value tells us both the strength and direction of the linear relationship and has the range from -1 to 1.
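A quick sketch with NumPy shows both measures on made-up data. `np.cov` and `np.corrcoef` each return a 2x2 matrix for two variables, and the off-diagonal entry is the value for the pair. The arrays here are hypothetical.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = 2.0 * x    # rises as x rises
y_down = -2.0 * x # falls as x rises

# Off-diagonal entries of the 2x2 covariance matrix (sample covariance).
cov_up = np.cov(x, y_up)[0, 1]      # positive: the variables move together
cov_down = np.cov(x, y_down)[0, 1]  # negative: the variables move in opposite directions

# Correlation is covariance divided by the product of the standard deviations.
corr_up = np.corrcoef(x, y_up)[0, 1]
corr_manual = cov_up / (np.std(x, ddof=1) * np.std(y_up, ddof=1))

print(cov_up, cov_down, corr_up, corr_manual)
```

The covariance changes if you rescale the data, while the correlation stays within -1 to 1, which is why correlation is easier to compare across pairs of variables.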

Now, you might be thinking that this is a concept of Statistics and not Linear Algebra.

Well, remember I told you Linear Algebra is all-pervasive?
