Visual Interpretability for Convolutional Neural Networks

The Jupyter notebook for each visualization is written in Keras and is available in my GitHub repository: himanshurawlani/convnet-interpretability-keras.

Let's see what a VGG16 model looks like:

[Figure: VGG16 convolutional neural network]

The output of each convolutional block is passed through an activation function (ReLU in this case).

Visualizing intermediate activations

In this technique, given an input image, we simply plot what each filter has extracted (the output features) after the convolution operation in each layer. For example, in VGG16 the input layer dimension is 224x224x3 and the output dimension after the first convolution operation is 224x224x64 (see block1_conv1). Here, 64 is the number of filters used to extract input features in the first convolution operation, so we plot these sixty-four 224x224 outputs.

[Figure: Visualizing the output of the convolution operation after each layer of the VGG16 network]

Interpretations:

1. The initial layers (block1 and block2) retain most of the input image features. It looks like the convolution filters are activated at every part of the input image. This gives us an intuition that these initial filters might be primitive edge detectors (since we can consider a complex figure to be made up of small edges, with different orientations, put together).

2. As we go deeper (block3 and block4), the features extracted by the filters become visually less interpretable. An intuition for this is that the convnet is now abstracting away the visual information of the input image and converting it into the required output classification domain.

3. In block5 (especially block5_conv3) we see a lot of blank convolution outputs. This means that the patterns encoded by those filters were not found in the input image. Most probably, these patterns are complex shapes that are not present in this input image.

To elaborate on points 2 and 3, we can
compare these insights with how our own visual perception works. When we look at an object (say, a bicycle) we don't sit and observe every detail of the object (like the handle grip, mudguard, wheel spokes, etc.). All we see is an object with two wheels joined by a metallic rod. Hence, if we were asked to draw a bicycle, it would be a simple sketch that just conveys the idea of two wheels and a metallic rod. This information is enough for us to decide that the given object is a bicycle. Something similar is happening in deep neural networks as well.
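The activation-visualization technique described above can be sketched in Keras roughly as follows. This is a minimal illustration, not the exact notebook code: the helper name `activation_grid`, the parameter `images_per_row`, and the example image path are my own assumptions; the layer name `block1_conv1` and the 224x224x64 output shape come from the VGG16 architecture discussed in the post.

```python
# Hypothetical sketch of visualizing intermediate activations for VGG16.
# `activation_grid` and `images_per_row` are illustrative names, not from the post.
import numpy as np


def activation_grid(feature_maps, images_per_row=8):
    """Tile a (H, W, n_filters) activation tensor into one 2-D grid image."""
    h, w, n = feature_maps.shape
    n_rows = int(np.ceil(n / images_per_row))
    grid = np.zeros((n_rows * h, images_per_row * w))
    for i in range(n):
        row, col = divmod(i, images_per_row)
        grid[row * h:(row + 1) * h, col * w:(col + 1) * w] = feature_maps[..., i]
    return grid


def visualize_block1_conv1(img_path):
    # Heavy imports are kept local so the pure helper above has no TF dependency.
    from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
    from tensorflow.keras.preprocessing import image
    from tensorflow.keras.models import Model
    import matplotlib.pyplot as plt

    base = VGG16(weights="imagenet")                 # expects 224x224x3 input
    layer = base.get_layer("block1_conv1")           # outputs 224x224x64
    activation_model = Model(inputs=base.input, outputs=layer.output)

    img = image.load_img(img_path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    maps = activation_model.predict(x)[0]            # shape (224, 224, 64)

    # Plot all sixty-four 224x224 feature maps as one tiled image.
    plt.matshow(activation_grid(maps), cmap="viridis")
    plt.axis("off")
    plt.savefig("block1_conv1_activations.png", bbox_inches="tight")
```

To visualize a deeper layer such as block5_conv3, you would point `get_layer` at that layer instead; blank tiles in the resulting grid correspond to filters whose patterns were not found in the input image.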
