Understanding and Coding a ResNet in Keras

Priya Dwivedi, Jan 4

ResNet, short for Residual Network, is a classic neural network used as a backbone for many computer vision tasks.

This model won the ImageNet challenge in 2015.

The fundamental breakthrough with ResNet was that it allowed us to successfully train extremely deep neural networks with 150+ layers.

Prior to ResNet, training very deep neural networks was difficult due to the problem of vanishing gradients.

AlexNet, the winner of ImageNet 2012 and the model that arguably kick-started the focus on deep learning, had only 8 layers; the VGG network had 19 layers, Inception (GoogLeNet) had 22, and ResNet-152 had 152 layers.

In this blog we will code ResNet-50, a smaller version of ResNet-152 that is frequently used as a starting point for transfer learning.

Revolution of Depth

However, increasing network depth does not work by simply stacking layers together.

Deep networks are hard to train because of the notorious vanishing gradient problem — as the gradient is back-propagated to earlier layers, repeated multiplication may make the gradient extremely small.

As a result, as the network goes deeper, its performance gets saturated or even starts degrading rapidly.

I learnt about coding ResNets from the DeepLearning.AI course by Andrew Ng.

I highly recommend this course.

On my Github repo, I have shared two notebooks: one that codes ResNet from scratch as explained in DeepLearning.AI, and the other that uses the pretrained model in Keras.

I hope you pull the code and try it for yourself.

Skip Connection — The Strength of ResNet

ResNet first introduced the concept of the skip connection.

The diagram below illustrates skip connection.

The figure on the left stacks convolution layers one after the other. On the right, we still stack convolution layers as before, but we now also add the original input to the output of the convolution block.

This is called a skip connection.

Skip Connection (image from DeepLearning.AI)

It can be written as two lines of code:

X_shortcut = X  # Store the initial value of X in a variable
# Perform convolution + batch norm operations on X
X = Add()([X, X_shortcut])  # SKIP Connection

The coding is quite simple, but there is one important consideration: since X and X_shortcut above are two matrices, you can add them only if they have the same shape.

So if the convolution + batch norm operations are done in a way that the output shape is the same, then we can simply add them as shown below.

When x and x_shortcut are the same shape

Otherwise, x_shortcut goes through a convolution layer chosen such that its output has the same dimensions as the output from the convolution block, as shown below:

X_shortcut goes through a convolution block

In the notebook on Github, the two functions identity_block and convolution_block implement these two cases.
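As a rough illustration of that second case, the shortcut path can be projected with a 1x1 convolution so that its shape matches the main path before the addition. This is only a sketch assuming the TensorFlow-bundled Keras, channels-last tensors, and placeholder filter counts, not the exact code from the notebook:

from tensorflow.keras.layers import Conv2D, BatchNormalization, Add

def projection_shortcut_add(X, X_shortcut, filters=256, stride=2):
    # Main path: convolution + batch norm (filter count and stride are placeholders)
    X = Conv2D(filters, (3, 3), strides=stride, padding='same')(X)
    X = BatchNormalization(axis=3)(X)
    # Shortcut path: 1x1 convolution with the same stride and filter count,
    # so its output shape matches the main path and the two can be added
    X_shortcut = Conv2D(filters, (1, 1), strides=stride, padding='valid')(X_shortcut)
    X_shortcut = BatchNormalization(axis=3)(X_shortcut)
    return Add()([X, X_shortcut])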

These functions use Keras to implement Convolution and Batch Norm layers with ReLU activation.

Skip connection is technically the one line X = Add()([X, X_shortcut]).

One important thing to note here is that the skip connection is applied before the ReLU activation, as shown in the diagram above.

Research has found that this gives the best results.
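Putting these pieces together, here is a minimal sketch of what an identity block can look like in Keras, with the Add placed before the final ReLU. It assumes the TensorFlow-bundled Keras and uses placeholder filter sizes, so treat it as illustrative rather than the exact notebook code:

from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, Add

def identity_block(X, filters, kernel_size=3):
    # filters is a placeholder triple, e.g. (64, 64, 256); the last value must
    # match the channel count of the input for the addition to work
    f1, f2, f3 = filters
    X_shortcut = X  # save the input for the skip connection

    X = Conv2D(f1, (1, 1))(X)
    X = BatchNormalization(axis=3)(X)
    X = Activation('relu')(X)

    X = Conv2D(f2, (kernel_size, kernel_size), padding='same')(X)
    X = BatchNormalization(axis=3)(X)
    X = Activation('relu')(X)

    X = Conv2D(f3, (1, 1))(X)
    X = BatchNormalization(axis=3)(X)

    X = Add()([X, X_shortcut])  # skip connection, applied before the final ReLU
    X = Activation('relu')(X)
    return X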

Why do Skip Connections work?

This is an interesting question.

I think there are two reasons why skip connections work here:

1. They mitigate the problem of vanishing gradients by providing an alternate shortcut path for the gradient to flow through.
2. They allow the model to learn an identity function, which ensures that a higher layer will perform at least as well as a lower layer, and not worse.

In fact, since ResNet, skip connections have been used in many more model architectures, such as the Fully Convolutional Network (FCN) and U-Net.

They are used to flow information from earlier layers in the model to later layers.

In these architectures they are used to pass information from the downsampling layers to the upsampling layers.

Testing the ResNet model we built

The identity and convolution blocks coded in the notebook are then combined to create a ResNet-50 model with the architecture shown below:

ResNet-50 model architecture

The ResNet-50 model consists of 5 stages, each with a convolution and identity block.

Each convolution block has 3 convolution layers and each identity block also has 3 convolution layers.
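To give a feel for how one stage is assembled, a sketch along these lines can be used, assuming the identity_block sketched above and a convolution_block with a similar, hypothetical signature; the filter counts correspond to one of the middle stages of ResNet-50 and are only illustrative:

# One ResNet-50 stage: a convolution block followed by several identity blocks.
# convolution_block changes the spatial size and projects the shortcut path.
X = convolution_block(X, filters=(128, 128, 512), kernel_size=3)
X = identity_block(X, filters=(128, 128, 512), kernel_size=3)
X = identity_block(X, filters=(128, 128, 512), kernel_size=3)
X = identity_block(X, filters=(128, 128, 512), kernel_size=3)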

The ResNet-50 has over 23 million trainable parameters.

I have tested this model on the signs data set, which is also included in my Github repo.

This data set has hand images corresponding to 6 classes.

We have 1080 train images and 120 test images.
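For reference, preparing a dataset like this typically looks something like the sketch below; the variable names, image shapes, and the normalization step are assumptions for illustration, not the exact notebook code:

from tensorflow.keras.utils import to_categorical

# Assumed shapes: X_train (1080, 64, 64, 3), X_test (120, 64, 64, 3),
# y_train / y_test hold integer labels 0-5 for the 6 classes
X_train = X_train / 255.0  # scale pixel values to [0, 1]
X_test = X_test / 255.0
y_train = to_categorical(y_train, num_classes=6)  # one-hot encode the labels
y_test = to_categorical(y_test, num_classes=6)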

Signs Data Set (sample images)

Our ResNet-50 gets to 86% test accuracy in 25 epochs of training.

Not bad!

Building ResNet in Keras using the pretrained library

I loved coding the ResNet model myself since it gave me a better understanding of a network that I frequently use in many transfer learning tasks related to image classification, object localization, segmentation, etc.

However, for more regular use it is faster to use the pretrained ResNet-50 in Keras.

Keras has many of these backbone models with their ImageNet weights available in its library.

Keras pretrained models

I have uploaded a notebook on my Github that uses Keras to load the pretrained ResNet-50.

You can load the model with one line of code:

base_model = applications.resnet50.ResNet50(weights=None, include_top=False, input_shape=(img_height, img_width, 3))

Here weights=None since I want to initialize the model with random weights, as I did for the ResNet-50 I coded.

Otherwise I can also load the pretrained ImageNet weights.

I set include_top=False to not include the final pooling and fully connected layer in the original model.
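If you do want the pretrained ImageNet weights instead, as mentioned above, a common transfer-learning variant looks like the sketch below; freezing the backbone layers is my own illustrative addition, not necessarily what the original notebook does:

# Load ResNet-50 with ImageNet weights instead of random initialization
base_model = applications.resnet50.ResNet50(weights='imagenet', include_top=False,
                                            input_shape=(img_height, img_width, 3))

# Optionally freeze the backbone so only the new classification head is trained
for layer in base_model.layers:
    layer.trainable = False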

I added Global Average Pooling and a dense output layer to the ResNet-50 model.

x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dropout(0.7)(x)
predictions = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)

As shown above, Keras provides a very convenient interface to load pretrained models, but it is important to code the ResNet yourself as well at least once so you understand the concept and can perhaps apply this learning to another new architecture you are creating.

The Keras ResNet got to an accuracy of 75% after training for 100 epochs with the Adam optimizer and a learning rate of 0.0001.
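For completeness, that training setup corresponds roughly to the sketch below; the data variables (X_train, y_train, X_test, y_test) and the batch size are assumptions for illustration:

from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=0.0001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=100, batch_size=32,
          validation_data=(X_test, y_test))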

The accuracy is a bit lower than that of our own coded model, and I guess this has to do with weight initialization.

Keras also provides an easy interface for data augmentation, so if you get a chance, try augmenting this data set and see if that results in better performance.
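A minimal sketch of such augmentation with Keras' ImageDataGenerator is shown below; the specific transformations and their ranges are illustrative choices rather than tuned settings:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Randomly rotate, shift, and zoom the training images on the fly
datagen = ImageDataGenerator(rotation_range=15,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             zoom_range=0.1)

model.fit(datagen.flow(X_train, y_train, batch_size=32),
          epochs=100, validation_data=(X_test, y_test))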

Conclusion

ResNet is a powerful backbone model that is used very frequently in many computer vision tasks.

ResNet uses skip connections to add the output from an earlier layer to a later layer. This helps it mitigate the vanishing gradient problem.

You can use Keras to load its pretrained ResNet-50 or use the code I have shared to code ResNet yourself.

Other writings: http://deeplearninganalytics.org/blog

PS: I have my own deep learning consultancy and love to work on interesting problems. I have helped many startups deploy innovative AI-based solutions. Check us out at http://deeplearninganalytics.org/.

If you have a project that we can collaborate on, then please contact me through my website or at priya.toronto3@gmail.com.

References

DeepLearning.AI
Keras
ResNet paper
