A Comprehensive Guide to Convolutional Neural Networks — the ELI5 way

Advances in Computer Vision with Deep Learning have been built up and refined over time, primarily around one particular algorithm: the Convolutional Neural Network.

Introduction

Figure: A CNN sequence to classify handwritten digits

A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm that can take in an input image, assign importance to various aspects/objects in the image, and differentiate one from the other.

The Convolution Layer

Figure: Convolving a 5×5 image with a 3×3 kernel to get a 3×3 convolved feature

Image Dimensions = 5 (Height) x 5 (Breadth) x 1 (Number of channels; an RGB image would have 3)

In the above demonstration, the green section represents our 5x5x1 input image, I. The element that carries out the convolution operation in the first part of a Convolutional Layer is called the Kernel/Filter, K, shown in yellow. We have selected K as the 3x3x1 matrix [[1, 0, 1]; [0, 1, 0]; [1, 0, 1]].

The Kernel visits 9 positions because Stride Length = 1, each time performing an element-wise multiplication between K and the portion P of the image over which the kernel is hovering, then summing the products to produce one value of the convolved feature.

The objective of the Convolution Operation is to extract features, such as edges, from the input image. There are two types of results to the operation: one in which the convolved feature is reduced in dimensionality as compared to the input, and the other in which the dimensionality is either increased or remains the same. The former comes from applying Valid Padding, the latter from applying Same Padding.

Figure: SAME padding: a 5x5x1 image is padded with 0s to create a 7x7x1 image

When we pad the 5x5x1 image with a one-pixel border of zeros to make a 7x7x1 image and then apply the 3x3x1 kernel over it, the convolved matrix turns out to be of dimensions 5x5x1, the same as the input. Hence the name, Same Padding.

Pooling Layer

Figure: 3x3 pooling over 5×5 convolved feature

Similar to the Convolutional Layer, the Pooling layer is responsible for reducing the spatial size of the Convolved Feature. This decreases the computational power required to process the data, through dimensionality reduction. Furthermore, it is useful for extracting dominant features that are rotationally and positionally invariant, which helps the model train effectively.

There are two types of Pooling: Max Pooling and Average Pooling. Max Pooling returns the maximum value from the portion of the image covered by the Kernel. On the other hand, Average Pooling returns the average of all the values from the portion of the image covered by the Kernel.

Figure: Types of Pooling

The Convolutional Layer and the Pooling Layer together form the i-th layer of a Convolutional Neural Network. Depending on the complexity of the images, the number of such layers may be increased to capture low-level details even further, but at the cost of more computational power.

After going through the above process, we have successfully enabled the model to understand the features.
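
The convolution step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the kernel K is the one given in the text, but the pixel values in `image` are assumed for the example, and the helper names `pad_zeros` and `convolve2d` are hypothetical. It shows that a 3x3 kernel with stride 1 over a 5x5 image yields a 3x3 feature, and that a one-pixel border of zeros (a 7x7 padded image) brings the output back to 5x5.

```python
def pad_zeros(image, pad):
    """Return a copy of a 2-D list with `pad` rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))
    return padded


def convolve2d(image, kernel, stride=1, pad=0):
    """Slide a square kernel over the image; multiply element-wise and sum.

    As is usual in deep learning, the kernel is not flipped
    (strictly speaking this is cross-correlation).
    """
    if pad:
        image = pad_zeros(image, pad)
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for a in range(k):
                for b in range(k):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


# Assumed 5x5 single-channel input (illustrative values, not from the article).
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
# The 3x3 kernel K given in the text.
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

valid = convolve2d(image, kernel)         # no padding: 3x3 output
same = convolve2d(image, kernel, pad=1)   # one ring of zeros: 5x5 output

print(valid)
print(len(same), "x", len(same[0]))
```

With these assumed pixel values, `valid` comes out as [[4, 3, 4], [2, 4, 3], [2, 3, 4]], and the centre 3x3 block of `same` is identical to it, since padding only adds new positions around the border.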

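Max Pooling and Average Pooling, as described above, can be sketched the same way. This is a rough illustration under stated assumptions: the function name `pool2d`, the 4x4 feature values, and the choice of a 2x2 window with stride 2 are all made up for the example, and the final 5x5 case assumes a stride of 1, which the text does not state.

```python
def pool2d(feature, size, stride, mode="max"):
    """Reduce a 2-D feature map by taking the max or average of each window."""
    out_h = (len(feature) - size) // stride + 1
    out_w = (len(feature[0]) - size) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect the values covered by the window at this position.
            window = [
                feature[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            ]
            if mode == "max":
                row.append(max(window))
            elif mode == "average":
                row.append(sum(window) / len(window))
            else:
                raise ValueError("mode must be 'max' or 'average'")
        output.append(row)
    return output


# Assumed 4x4 convolved feature (illustrative values only).
feature = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 4, 3, 8],
]

max_pooled = pool2d(feature, size=2, stride=2, mode="max")
avg_pooled = pool2d(feature, size=2, stride=2, mode="average")
print(max_pooled)   # [[6, 4], [7, 9]]
print(avg_pooled)   # [[3.75, 2.25], [3.5, 5.0]]

# A 3x3 window over a 5x5 feature, assuming stride 1, gives a 3x3 result.
grid = [[r * 5 + c for c in range(5)] for r in range(5)]
pooled_5x5 = pool2d(grid, size=3, stride=1)
print(len(pooled_5x5), "x", len(pooled_5x5[0]))
```

Max pooling keeps only the strongest activation in each window, while average pooling smooths over all of them; both shrink the feature map by the same amount for a given window size and stride.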