Principal Component Analysis — Math and Intuition (Post 2)Shivangi PatelBlockedUnblockFollowFollowingApr 4This is Post 2 of a 3-part series on Principal Component Analysis — Math and intuition.

If you would like to read an intuitive explanation of PCA with a real world example, please head on to Post 1.

The math of PCA involves an intermediate level of Linear Algebra concepts that will be covered here.

Linear AlgebraLinear Algebra is a branch of mathematics which deals with linear relationships between numbers.

Consider the following equations,2x + 3y = 135x + 2y = 16Although you might be tempted to solve them manually which may not take more than 30 seconds; imagine a scenario where you have 10 variables and 10 equations.

This is where vector and matrix notations come into picture.

The equations above can be represented as,Vectors and Matrices are notational conveniences for compactly representing linear equations.

If a vector is a list, matrix is a list of lists.

An m x n matrix is a rectangular array of numbers with m rows and n columns.

A vector is also a matrix but with either 1 row or 1 column.

There are ways to solve the above representation and determine values of x and y that we will not delve into here.

The concept to understand in the above example is that the vector on left hand side of equation is transformed by the matrix, to give the vector on right hand side of the equation.

Special matricesWe have seen that a matrix is a 2D array of numbers that perform operations on vectors or other matrices.

Various matrices with special properties have been defined in Linear Algebra, which help in making these operations easier.

You will realise the importance of these special matrices when you encounter them again and again in machine learning applications.

Let’s get introduced to a few that are relevant to PCA (and several other machine learning applications).

Square matrixA matrix is square when the number of rows is equal to number of columns.

Simple?Square matricesDiagonal matrixThe main diagonal of a matrix is the one running from top left to bottom right.

A matrix is diagonal if all its non-diagonal elements are 0.

Non-zero entries are only found along the main diagonal.

The main diagonal may or may not have zeroes.

Diagonal matrixIdentity matrixIdentity matrix is a special square matrix.

Identity matrix does nothing to a vector when multiplied to it.

Consider this equivalent to multiplying a scalar number to 1 (ex, 5 x 1 = 5).

Similarly, multiplying a vector to an Identity matrix does nothing.

Sounds lame, but it makes some operations a whole lot easier.

An identity matrix is usually denoted as I.

Identity matrixInverse matrixReciprocal of a scalar number 5 is 1/5 or 5^-1.

Inverse matrix is the same idea applied to matrices.

Multiplying a matrix A with its Inverse matrix A^-1 gives the Identity matrix (which is like 1 for matrices), a concept widely used in Linear Algebra operations.

Inverse Matrix operationTranspose operationApplying a transpose operation to a matrix is turning all its rows into columns and vice-versa.

Transpose of a matrix A is denoted by A^T.

Matrix TransposeOrthogonal and Orthonormal matricesAn Orthogonal matrix is a type of square matrix whose columns and rows are orthonormal unit vectors, that is perpendicular and have a magnitude of 1.

Theorem 1: If A is an orthogonal matrix, its inverse is equal to the transpose i.

e.

A^T = A^-1Symmetric matrixSymmetric matrices are special square matrices.

Matrix A is symmetric if it is equal to its transpose A^T.

The axis of symmetry is always the main diagonal of a symmetric matrix.

Theorem 2: If A is any matrix, then (A^T)A and A(A^T) are symmetric.

Symmetric matrixYou may not have to memorise the different matrix types or the theorems.

Its a good idea to save this as a cheatsheet as you will encounter them often.

Basis changeConsider a vector r in a 2-dimensional space R².

The space R² can be defined by an arbitrary set of orthogonal, unit length vectors e1 and e2.

Vector r in R² spaceNow, looking at vector r with respect to coordinates e1 and e2, it can be represented as follows,Vector r defined with basis eWe initially denoted vectors e1 and e2 as an ‘arbitrary’ set of vectors.

By convention, e1 and e2 form a standard coordinate set as they are of unit length and orthogonal to each other.

When vector r is described with respect to a set of vectors e or basis e, it can be denoted as re (read vector r with basis e).

However, the same vector r can still be described with respect to another set of coordinates, example b1 and b2.

In this new coordinate system, the numbers of vector rb (read vector r with basis b) will be different; represented as [x,y] in the figure below.

Vectors b1 and b2 in the figure are described with respect to standard coordinates or basis e.

Note that vectors b1 and b2 do not have to be orthogonal to each other (in this particular example, they are orthogonal).

Vector r with basis e and basis bHow do we determine the numbers in rb ?Since we know the vectors of new coordinate set b with respect to basis e, we have the transformation matrix to change any vector from basis b to basis e.

Let us call this transformation matrix as B.

Thus, to determine numbers of vector r in basis b, all we have to do now is multiply the inverse of matrix B to vector r in basis e.

I have not gone into the details of calculations as it is trivial.

What we need to focus on here is that a vector isn’t tied to the coordinate axes set used to describe it originally.

We can re-describe it using a new set of coordinate axes or basis vectors.

This is an important concept in Linear Algebra.

It allows us to move basis of a vector or data item, which moves the numbers in the vector.

Let’s take this one step further.

You want to apply a transformation matrix R described with basis e on vector rb (vector r with basis b).

Since we do not know the transformation matrix R in basis b, we would first apply it to vector re and then transform it back to basis b.

This can be done in 3 steps as followsStep 1: Change basis of vector rb to reStep 2: Apply transformation matrix RStep 3: Transform back to basis bTo summarize, if you want to apply a transformation matrix described in one set of coordinates (matrix R in basis e) to a vector described in another set of coordinates (vector r in basis b), which you will encounter quite often in Linear Algebra; it can be done by wrapping R in B (inverse) and B.

This was probably a bit much to take in but its a very important concept in Linear Algebra and worth investing efforts into.

Eigenvectors and EigenvaluesThe word ‘eigen’ which is German derived, means ‘characteristic’.

It helps us identify a characteristic property of something.

Up until now we have been looking at how a vector changes upon applying a transformation matrix.

Now let’s try to visualize how a transformation matrix affects the entire vector space.

Consider the example vector subspace of R², highlighted by a square.

For ease of understanding, I have highlighted three vectors in this space (red, green and blue).

Let us apply a transformation matrix T that scales up this square subspace to a rectangular shape.

Notice how some vectors remain unchanged in their span, while others do not.

The horizontal vector in red points in the same direction and does not change in its magnitude.

The vertical green vector points in the same direction, but scales up in magnitude by a factor of 2.

The diagonal blue vector points to some other direction and has also scaled up in magnitude.

In fact, besides the horizontal and vertical vectors; all the vectors in our subspace have changed by applying this transformation of vertical scaling.

The horizontal and vertical vectors are therefore special and ‘characteristic’ of this particular transformation matrix T and are referred to as ‘eigenvectors’ of transformation matrix T.

Further, the magnitude of the horizontal vector remained unchanged, meaning it has a corresponding ‘eigenvalue’ of 1.

Eigenvalue of the vertical vector is 2 since it doubled up in magnitude upon transformation.

Note that although not applicable in this particular example, vectors that flip in direction upon transformation; but remain in the same span are also known as eigenvectors.

You just apply a (-) negative sign to its associated eigenvalue.

The concept of eigenvectors and eigenvalues that we saw by a geometric representation can be represented algebraically as follows,where vector v is an eigenvector associated with transformation matrix T.

it changes by a scalar number λ which is the eigenvalue associated with vector v.

So, what we have learnt is that eigenvectors are vectors that do not change much upon transformation.

They may get scaled or compressed, point in same direction or get reversed; but they always remain in the same span.

Eigenvectors are characteristics of a particular transformation matrix and the reason why they are important is that they make interesting basis vectors or coordinates, as we will see shortly.

Changing to EigenbasisConsider a scenario where you have to apply a 3 x 3 transformation matrix T to a vector x, then apply it again and again upto n times, where n = 1 million.

There are several real world applications where you may have to perform matrix multiplications at an even larger scale, making these tasks computationally demanding.

However, there is a neat exception to this problem and that is when T is a diagonal matrix.

A diagonal matrix is one having non-zero elements only in the diagonal running from upper left to the lower right.

Multiplying a diagonal matrix to itself n times is as good as raising the diagonal elements by a factor of n!Raising a Diagonal matrix to a power of nWhat if matrix T is not diagonal?.This is where the concepts of changing basis and eigenvectors that we learnt so far, can be applied in a real world scenario.

Let us revisit our problem here.

We want to apply a transformation matrix T to a vector n times, which requires multiplying T by itself n times or raising T to a power of n.

A simple and elegant solution to our problem is to change basis to eigen-basis; using an eigen-conversion matrix.

If the columns of matrix C are linearly independent eigenvectors of matrix T, we can apply matrix C(inverse) to change to eigen-basis.

This takes us to a world where marix T is represented by a diagonal matrix, let’s call it D.

The diagonal in matrix D is comprised of eigenvalues associated with corresponding eigenvectors.

Now, applying this transformation n times is just pure scaling or effectively raising the eigenvalues in the main diagonal to a power of n, making the computational process way more efficient.

All we need to do now is change the basis back to original coordinates, and voila!.We have the matrix T raised to the power of n.

Diagonalization of a matrixThe key take home message here is that if a matrix can be made diagonal by change of basis to eigenbasis, it is known as a diagonalizable matrix.

Spectral theoremEverything that we have learnt so far takes us to the Spectral theorem.

Let A be an m×n matrix of real numbers and A^T its transpose.

Theorem 3 : If A is symmetric (meaning A^T = A), then A is orthogonally diagonalizable and has only real eigenvalues.

In other words, there exist real numbers λ1, λ2 .

.

.

λn (the eigenvalues) and orthogonal, non-zero real vectors v1, v2, ….

vn (the eigenvectors) such that for each i = 1,2,…,n:This is Spectral theorem, one of the most important and powerful theorems in Linear Algebra.

You may have noted though, it is limited by the fact that it applies only to symmetric matrices.

However, if you recall Theorem 2 (stated under section of Symmetric matrix): If A is any matrix, then (A^T)A and A(A^T) are symmetric.

This implies that the spectral theorem can be applied to matrices (A^T)A and A(A^T).

Spectral theorem lies at the heart of PCA; which is perhaps one of its most important applications.

And that is it!.You have built a strong foundation in Linear Algebra and know enough math to understand the math of PCA.

I am afraid it was a gruelling post, but the concepts that you have learnt here will help you not only in unravelling PCA; but several other applications in data science.

Well done, stay tuned for the third and last post in this series on Connecting the dots to PCA.

ReferencesMathematics for Machine Learning — Linear AlgebraLinear AlgebraPrincipal component analysis with linear algebra.. More details