Hitchhiker’s Guide to Residual Networks (ResNet) in Keras

Learn the foundations of residual networks and build a ResNet in Keras

By Marco Peixeiro

Very deep neural networks are hard to train, as they are more prone to vanishing or exploding gradients.

To solve this problem, the activation from one layer can be fed directly to a deeper layer of the network; this is termed a skip connection.

This forms the basis of residual networks or ResNets.

This post will introduce the basics of residual networks before implementing one in Keras.

With ResNets, we can build very deep neural networks.

Residual block

A building block of a ResNet is called a residual block or identity block.

A residual block simply takes the activation of one layer and feeds it forward to a deeper layer in the network.

Example of a residual block

As you can see in the image above, the activation from a previous layer is added to the activation of a deeper layer in the network.

This simple tweak allows training much deeper neural networks.
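To make the idea concrete, here is a minimal sketch of a skip connection in the Keras functional API (a toy example with illustrative layer sizes, not the full ResNet block we build later): the shortcut is simply added back to the output of the stacked layers with an Add layer.

    from keras.layers import Input, Conv2D, Activation, Add

    # Toy example: save an activation, run it through two conv layers,
    # then add it back just before the final non-linearity.
    X_input = Input(shape=(64, 64, 3))
    X_shortcut = Conv2D(16, (3, 3), padding='same', activation='relu')(X_input)

    X = Conv2D(16, (3, 3), padding='same', activation='relu')(X_shortcut)
    X = Conv2D(16, (3, 3), padding='same')(X)

    X = Add()([X, X_shortcut])   # the skip connection
    X = Activation('relu')(X)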

In theory, the training error should monotonically decrease as more layers are added to a neural network.

In practice, however, a traditional neural network will reach a point where the training error starts increasing as more layers are added.

ResNets do not suffer from this problem.

The training error will keep decreasing as more layers are added to the network.

In fact, ResNets have made it possible to train networks with more than 100 layers, even reaching 1000 layers.

Building a ResNet for image classification

Now, let’s build a ResNet with 50 layers for image classification using Keras.

Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano.

It was developed with a focus on enabling fast experimentation.

In this case, we will use TensorFlow as the backend.

Of course, feel free to grab the entire notebook and make all the necessary imports before starting.

Step 1: Define the identity block

First, we define the identity block, which will make our neural network a residual network, as it represents the skip connection.
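The exact code lives in the linked notebook. As an illustrative sketch (not a verbatim copy), an identity block in the standard ResNet-50 bottleneck style (1x1, then f x f, then 1x1 convolutions, each followed by batch normalization) could be written in the Keras functional API as follows; the layer-naming convention and the glorot_uniform initializer follow the deeplearning.ai assignment this post is based on:

    from keras.layers import Conv2D, BatchNormalization, Activation, Add
    from keras.initializers import glorot_uniform

    def identity_block(X, f, filters, stage, block):
        """Identity block: the shortcut path has no convolution, so the input
        and output of the block must have the same dimensions."""
        conv_name_base = 'res' + str(stage) + block + '_branch'
        bn_name_base = 'bn' + str(stage) + block + '_branch'
        F1, F2, F3 = filters

        X_shortcut = X  # save the input for the skip connection

        # First component of the main path: 1x1 convolution
        X = Conv2D(F1, (1, 1), strides=(1, 1), padding='valid', name=conv_name_base + '2a',
                   kernel_initializer=glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis=3, name=bn_name_base + '2a')(X)
        X = Activation('relu')(X)

        # Second component: f x f convolution with 'same' padding
        X = Conv2D(F2, (f, f), strides=(1, 1), padding='same', name=conv_name_base + '2b',
                   kernel_initializer=glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis=3, name=bn_name_base + '2b')(X)
        X = Activation('relu')(X)

        # Third component: 1x1 convolution, no activation yet
        X = Conv2D(F3, (1, 1), strides=(1, 1), padding='valid', name=conv_name_base + '2c',
                   kernel_initializer=glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis=3, name=bn_name_base + '2c')(X)

        # Add the shortcut back to the main path, then apply the final ReLU
        X = Add()([X, X_shortcut])
        X = Activation('relu')(X)

        return X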

Step 2: Convolution block

Then, we build a convolution block.
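Again, the notebook has the exact code; a sketch under the same assumptions (same imports and naming convention as the identity block above) shows the key difference: the shortcut path gets its own Conv2D and BatchNormalization, so the block can change the spatial dimensions and the number of channels.

    from keras.layers import Conv2D, BatchNormalization, Activation, Add
    from keras.initializers import glorot_uniform

    def convolutional_block(X, f, filters, stage, block, s=2):
        """Convolution block: same main path as the identity block, but the
        shortcut path is projected with its own Conv2D + BatchNormalization
        so the output dimensions can differ from the input dimensions."""
        conv_name_base = 'res' + str(stage) + block + '_branch'
        bn_name_base = 'bn' + str(stage) + block + '_branch'
        F1, F2, F3 = filters

        X_shortcut = X

        # Main path
        X = Conv2D(F1, (1, 1), strides=(s, s), padding='valid', name=conv_name_base + '2a',
                   kernel_initializer=glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis=3, name=bn_name_base + '2a')(X)
        X = Activation('relu')(X)

        X = Conv2D(F2, (f, f), strides=(1, 1), padding='same', name=conv_name_base + '2b',
                   kernel_initializer=glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis=3, name=bn_name_base + '2b')(X)
        X = Activation('relu')(X)

        X = Conv2D(F3, (1, 1), strides=(1, 1), padding='valid', name=conv_name_base + '2c',
                   kernel_initializer=glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis=3, name=bn_name_base + '2c')(X)

        # Shortcut path: projection so the shapes match before the addition
        X_shortcut = Conv2D(F3, (1, 1), strides=(s, s), padding='valid', name=conv_name_base + '1',
                            kernel_initializer=glorot_uniform(seed=0))(X_shortcut)
        X_shortcut = BatchNormalization(axis=3, name=bn_name_base + '1')(X_shortcut)

        X = Add()([X, X_shortcut])
        X = Activation('relu')(X)

        return X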

Notice how the convolution block combines both the main path and the shortcut.

Step 3: Build the model

Now, we combine both blocks to build a 50-layer residual network.
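The full definition is in the notebook. A sketch of the standard ResNet-50 stage layout, reusing the identity_block and convolutional_block defined above (the pooling sizes and layer names are illustrative, following the deeplearning.ai assignment), looks like this:

    from keras.layers import (Input, ZeroPadding2D, Conv2D, BatchNormalization, Activation,
                              MaxPooling2D, AveragePooling2D, Flatten, Dense)
    from keras.models import Model
    from keras.initializers import glorot_uniform

    def ResNet50(input_shape=(64, 64, 3), classes=6):
        """50-layer ResNet: an initial conv stage, four stages of one
        convolutional block followed by identity blocks, then average pooling
        and a softmax classifier. Uses identity_block and convolutional_block
        defined above."""
        X_input = Input(input_shape)

        # Stage 1
        X = ZeroPadding2D((3, 3))(X_input)
        X = Conv2D(64, (7, 7), strides=(2, 2), name='conv1',
                   kernel_initializer=glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis=3, name='bn_conv1')(X)
        X = Activation('relu')(X)
        X = MaxPooling2D((3, 3), strides=(2, 2))(X)

        # Stage 2
        X = convolutional_block(X, f=3, filters=[64, 64, 256], stage=2, block='a', s=1)
        X = identity_block(X, 3, [64, 64, 256], stage=2, block='b')
        X = identity_block(X, 3, [64, 64, 256], stage=2, block='c')

        # Stage 3
        X = convolutional_block(X, f=3, filters=[128, 128, 512], stage=3, block='a', s=2)
        X = identity_block(X, 3, [128, 128, 512], stage=3, block='b')
        X = identity_block(X, 3, [128, 128, 512], stage=3, block='c')
        X = identity_block(X, 3, [128, 128, 512], stage=3, block='d')

        # Stage 4
        X = convolutional_block(X, f=3, filters=[256, 256, 1024], stage=4, block='a', s=2)
        X = identity_block(X, 3, [256, 256, 1024], stage=4, block='b')
        X = identity_block(X, 3, [256, 256, 1024], stage=4, block='c')
        X = identity_block(X, 3, [256, 256, 1024], stage=4, block='d')
        X = identity_block(X, 3, [256, 256, 1024], stage=4, block='e')
        X = identity_block(X, 3, [256, 256, 1024], stage=4, block='f')

        # Stage 5
        X = convolutional_block(X, f=3, filters=[512, 512, 2048], stage=5, block='a', s=2)
        X = identity_block(X, 3, [512, 512, 2048], stage=5, block='b')
        X = identity_block(X, 3, [512, 512, 2048], stage=5, block='c')

        # Average pooling, flatten and the softmax output layer
        X = AveragePooling2D((2, 2), name='avg_pool')(X)
        X = Flatten()(X)
        X = Dense(classes, activation='softmax', name='fc' + str(classes),
                  kernel_initializer=glorot_uniform(seed=0))(X)

        return Model(inputs=X_input, outputs=X, name='ResNet50')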

Step 4: Training

Before training, note that ResNet50 is a function that returns the model, so we need to assign its output to a variable. Then, Keras requires us to compile the model:

    model = ResNet50(input_shape=(64, 64, 3), classes=6)
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

Once that is done, we can normalize our images and one-hot encode the labels.
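The preprocessing code is in the notebook as well. As a sketch, assuming the raw data is already loaded into the hypothetical arrays X_train_orig, Y_train_orig, X_test_orig and Y_test_orig (64x64 RGB images and integer labels from 0 to 5), it could look like this:

    from keras.utils import to_categorical

    # Hypothetical array names: X_*_orig hold 64x64 RGB images,
    # Y_*_orig hold integer labels between 0 and 5.
    X_train = X_train_orig / 255.
    X_test = X_test_orig / 255.

    # One-hot encode the six class labels
    Y_train = to_categorical(Y_train_orig, num_classes=6)
    Y_test = to_categorical(Y_test_orig, num_classes=6)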

Afterwards, we can fit the model:

    model.fit(X_train, Y_train, epochs=2, batch_size=32)

And see how it performed:

    preds = model.evaluate(X_test, Y_test)
    print("Loss = " + str(preds[0]))
    print("Test Accuracy = " + str(preds[1]))

Now, you will see that we only get 16% accuracy.

This is because we only trained for 2 epochs.

You can train the model for longer on your own machine, but be aware that it will take a very long time, since the network is very large.

Step 5: Print the model summary

Keras makes it very easy to get a summary of the model we just built.

Simply run this code:

    model.summary()

and you get a detailed summary of each layer in your network.

You can also generate a picture of the network’s architecture and save it in your working directory:

    plot_model(model, to_file='ResNet.png')
    SVG(model_to_dot(model).create(prog='dot', format='svg'))

Great! You just learned the basics of a residual network and built one using Keras! Again, feel free to train the algorithm longer (~20 epochs), and you should see that the network performs very well.

However, if you train only on CPU, this might take more than 1h.

In a future post, I will show how to perform neural style transfer in TensorFlow, which is a very fun way to apply convolutional neural networks!

Keep learning!

Reference: deeplearning.ai
