Linear Algebra explained in the context of deep learning

In this article, I explain linear algebra for deep learning in a top-down manner: first the applications and uses, then drilling down into the concepts.

Definition of linear algebra in wikipedia:
Linear algebra is the branch of mathematics concerning linear equations and linear functions and their representations through matrices and vector spaces.

Table of contents:
1. Introduction.
2. Mathematical perspective of Vectors and matrices.
3. Types of matrices.
4. Decomposition of matrices.
5. Norms.
6. Vectorization.
7. Broadcasting.
8. External resources.

Introduction:
If you start to learn deep learning, the first thing you will be exposed to is the feed forward neural network, which is the simplest and also a highly useful network in deep learning.

Let's break it down. Each column of the network is a vector. Vectors are arrays that hold a collection of data (or features). In the current neural network, the vector 'x' holds the input. It is not mandatory to represent inputs as vectors, but if you do so, it becomes much more convenient to perform operations on them in parallel.

Deep learning, and neural networks in particular, are computationally expensive, so they need this nice trick to compute faster. It's called vectorization. Vectorized operations make computations much faster. This is one of the main reasons why GPUs are used for deep learning, as they are specialized in vectorized operations like matrix multiplication. (We'll see this in depth at the end.)

The hidden layer H's output is calculated by performing H = f( W.x + b ). Here W is called the weight matrix, b is called the bias and f is the activation function. (This article does not explain feed forward neural networks; if you need a primer about the concept of FFNN, look here.)

Let's break down the equation. The first component is W.x; this is a matrix-vector product, because W is a matrix and x is a vector. Before getting into multiplying these, let's get some idea about the notation: usually vectors are denoted by small bold italic letters (like x) and matrices are denoted by capital bold italic letters (like X). If the letter is capital and bold but not italic, then it is a tensor (like X).

In a computer science perspective:
Scalar: A single number.
Vector: A list of values. (rank 1 tensor)
Matrix: A two dimensional list of values. (rank 2 tensor)
Tensor: A multi dimensional array with rank n.

Drilling down:

In a mathematical perspective:

Vector:
A vector is a quantity that has both magnitude and direction. It is an entity that exists in space; its existence is denoted by x ∈ ℝ² if it is a 2 dimensional vector that exists in real space. (Each element denotes a coordinate along a different axis.)

The red and blue vectors are the basis vectors. All vectors in 2D space can be obtained by a linear combination of these two vectors, called the basis vectors (denoted by i and j). (In general, a vector in N dimensions can be represented by N basis vectors.) They are orthonormal: each has magnitude one and they are perpendicular to each other.

The set of all points in the 2D space that can be obtained by a linear combination of these two vectors is said to be the span of these vectors. If a vector can be represented as a linear combination (scalar multiplication and addition) of a set of other vectors, then it is linearly dependent on that set of vectors. (There is no use in adding this new vector to the existing set.)

Any two vectors of the same dimension can be added together. They can also be multiplied together. Their multiplication is of two types, the dot product and the cross product (the cross product is defined for 3 dimensional vectors). Refer here.

Matrix:
A matrix is a 2D array of numbers.

Each column of a 2 × 2 matrix shows where one of the 2 basis vectors lands after that transformation is applied to the 2D space. A matrix with 3 rows and 2 columns is written as W ∈ ℝ^(3×2). A matrix-vector product is called a transformation of that vector, while a matrix-matrix product is called a composition of transformations. There is only one matrix which does not apply any transformation to the vector. It is the identity matrix (I).

But it's also good to know their implementations. One such library is numpy for the python programming language. There are lots of resources for learning numpy (which is very important for learning deep learning, if you use python); look here.

Here, np.array creates a numpy array. np.random is a package that contains methods for random number generation. The dot method computes the product between matrices. We can change the shape of the numpy array and also check it. Here you can see that the product W.x is a vector and it is added to b, which is a scalar (the scalar is added to every element of the vector).

For a square matrix A, an eigenvector v and its eigenvalue λ satisfy A.v = λ.v; here v is the eigen vector and lambda is the eigen value.

    # numpy program to find eigen vectors
    from numpy import array
    from numpy.linalg import eig

    # define matrix
    A = array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(A)

    # calculate eigendecomposition
    values, vectors = eig(A)
    print(values)
    print(vectors)

Eigen decomposition is very useful in machine learning.

There are several methods of regularization, most notably L1 regularization (Lasso) and L2 regularization (Ridge). The details of these are not provided here, but to understand them you must know what a norm is.

Norm:
A norm is the size of a vector. The general formula for the norm of a vector x is given by

    ‖x‖ₚ = ( Σᵢ |xᵢ|ᵖ )^(1/p)

The L² norm, with p=2, is called the euclidean norm because it is the euclidean distance between the origin and x. The L¹ norm is simply the sum of the absolute values of all the elements of the vector. It is used in machine learning when the system needs to differentiate clearly between a zero and a non zero element.

(The above content is extracted from fast.ai machine learning course.) For more details about the numpy version of broadcasting, look here.

Ok, that's enough. This article introduced a lot of new words and terminology to the beginner, and I have also skipped several in-depth concepts of vector algebra. This can be overwhelming, but I have tried to make the concepts as practical as possible (feedback welcome!). As I have just started in deep learning myself, I decided to help others who have started by providing them with intuitive articles about deep learning terminology and concepts.
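As a closing recap, the hidden layer equation H = f( W.x + b ) and the general norm formula from this article can be sketched in numpy. This is only a minimal illustration under stated assumptions: ReLU is an assumed choice for the activation f (the article does not name one), the helper names `relu`, `hidden_layer` and `p_norm` are hypothetical, and the values of W, x and b are made up for the example.

```python
import numpy as np

def relu(z):
    """Element-wise ReLU; an assumed choice for the activation f."""
    return np.maximum(z, 0.0)

def hidden_layer(W, x, b, f=relu):
    """Compute H = f(W.x + b) for weight matrix W, input vector x, bias b."""
    return f(np.dot(W, x) + b)

def p_norm(x, p):
    """General p-norm of a vector: (sum_i |x_i|^p)^(1/p)."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

# Illustrative values only: W has 3 rows and 2 columns,
# so it maps a 2 dimensional input to a 3 dimensional output.
W = np.array([[1.0, 2.0],
              [0.0, -1.0],
              [3.0, 1.0]])
x = np.array([1.0, 2.0])
b = 0.5  # scalar bias; numpy broadcasts it to every element of W.x

H = hidden_layer(W, x, b)
print(H)             # hidden layer output
print(p_norm(H, 1))  # L1 norm: sum of absolute values
print(p_norm(H, 2))  # L2 (euclidean) norm
```

The L² result can be cross-checked against `np.linalg.norm(H)`, which computes the euclidean norm by default for a 1D array.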
