A Layman’s Guide to Moving from Keras to PyTorch

With the whole session.run commands and TensorFlow sessions, I was sort of confused.

It was not Pythonic at all.

PyTorch helps here, since it feels like the Python way of doing things.

You have things under your control and you are not losing anything on the performance front.

In the words of Andrej Karpathy:

"I’ve been using PyTorch a few months now and I’ve never felt better. I have more energy. My skin is clearer. My eye sight has improved."

Andrej Karpathy (@karpathy), May 26, 2017

So without further ado, let me translate Keras to PyTorch for you.

The Classy way to write your network?

Ok, let us create an example network in Keras first, which we will then try to port to PyTorch.

Here is a piece of advice too: when you move from Keras to PyTorch, take any network you already have and try porting it.

It will help you understand PyTorch much better.

Here I am trying to write one of the networks that gave pretty good results in the Quora Insincere questions classification challenge for me.

This model has all the bells and whistles that a text classification deep learning network could contain, with its GRU, LSTM and embedding layers and also a meta input layer, and thus serves as a good example.

Also, if you want to read up more on how the BiLSTM/GRU and Attention models work, do visit my post here.

So a model in PyTorch is defined as a class (therefore a little more classy) which inherits from nn.Module.

Every such class contains an __init__ method and a forward method for the forward pass.

In the __init__ part, the user defines all the layers the network is going to have, but doesn't yet define how those layers are connected to each other.

In the forward pass block, the user defines how data flows from one layer to another inside the network.
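As a minimal sketch of that split between __init__ and forward (the class name TwoLayerNet and all layer sizes are made up for illustration and are not from the original post):

```python
import torch
import torch.nn as nn


class TwoLayerNet(nn.Module):
    """Hypothetical toy network: layers are declared in __init__, wired in forward."""

    def __init__(self, in_features=10, hidden=32, n_classes=2):
        super().__init__()
        # Only declare the layers here; nothing is connected yet.
        self.fc1 = nn.Linear(in_features, hidden)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden, n_classes)

    def forward(self, x):
        # Here we decide how data flows from one layer to the next.
        h = self.relu(self.fc1(x))
        return self.fc2(h)


net = TwoLayerNet()
out = net(torch.randn(5, 10))  # calling the module runs forward()
print(out.size())
```

Calling net(...) rather than net.forward(...) is the usual convention, since it also runs any hooks registered on the module.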

Why is this Classy?

Obviously classy because of Classes. Duh! But jokes apart, I found it beneficial for a couple of reasons:

1) It gives you a lot of control over how your network is built.

2) You understand a lot about the network when you are building it, since you have to specify input and output dimensions. So there are fewer chances of error. (Although this one is really up to the skill level.)

3) Networks are easy to debug. Any time you find a problem with the network, just use something like print("avg_pool", avg_pool.size()) in the forward pass to check the sizes of the layers, and you will debug the network easily.

4) You can return multiple outputs from the forward pass. This is pretty helpful in the Encoder-Decoder architecture, where you can return both the encoder and decoder output, or in the case of an autoencoder, where you can return the output of the model and the hidden layer embedding for the data.

5) PyTorch tensors work in a very similar manner to NumPy arrays. For example, I could have used the PyTorch max-pooling function to write the maxpool layer, but max_pool, _ = torch.max(h_gru, 1) will also work.

6) You can set up different layers with different initialization schemes, something I found harder to do in Keras. For example, in the network below I have changed the initialization scheme of my LSTM layer. The LSTM layer has different initializations for biases, input layer weights, and hidden layer weights.

7) Wait until you see the training loop in PyTorch. You will be amazed at the sort of control it provides.
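Points 3 to 6 can be illustrated together in one small sketch. Everything here is an assumption made for illustration: the TinyEncoder name, the sizes, and the particular initializers (Xavier for input weights, orthogonal for hidden weights, zeros for biases) are one common choice, not necessarily what the original network used.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyEncoder(nn.Module):
    """Hypothetical module showing shape printing, torch.max pooling,
    multiple return values and per-parameter LSTM initialization."""

    def __init__(self, input_size=8, hidden_size=16):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_size, 1)
        self._init_lstm()

    def _init_lstm(self):
        # Different schemes for input weights, hidden weights and biases.
        for name, param in self.lstm.named_parameters():
            if "weight_ih" in name:
                nn.init.xavier_uniform_(param)
            elif "weight_hh" in name:
                nn.init.orthogonal_(param)
            elif "bias" in name:
                nn.init.zeros_(param)

    def forward(self, x, debug=False):
        h_lstm, _ = self.lstm(x)            # (batch, seq_len, 2 * hidden)
        max_pool, _ = torch.max(h_lstm, 1)  # max over time: (batch, 2 * hidden)
        if debug:
            print("h_lstm", h_lstm.size())
            print("max_pool", max_pool.size())
        logits = self.out(max_pool)
        # Return the prediction and the pooled embedding together.
        return logits, max_pool


torch.manual_seed(0)
model = TinyEncoder()
x = torch.randn(4, 10, 8)
logits, embedding = model(x, debug=True)

# The same pooling written with PyTorch's pooling function, for comparison.
with torch.no_grad():
    h_lstm, _ = model.lstm(x)
    ref = F.adaptive_max_pool1d(h_lstm.transpose(1, 2), 1).squeeze(-1)
same_pooling = torch.allclose(ref, embedding)
print("pooling matches:", same_pooling)
```

The debug flag is just a convenience for the sketch; a plain print call dropped into forward works equally well while debugging.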

Now the same model in PyTorch will look something like this.

Do go through the code comments to understand more on how to port.
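As a rough illustration of what such a port can look like, here is a minimal sketch of a text classifier with an embedding layer, a BiLSTM, a BiGRU, pooling, and a meta input. It is not the author's exact model: the class name TextClassifier, the argument names and every layer size are assumptions chosen to keep the example small.

```python
import torch
import torch.nn as nn


class TextClassifier(nn.Module):
    """Hypothetical port of a Keras-style text model; sizes are illustrative."""

    def __init__(self, vocab_size=1000, embed_dim=50, hidden=32, n_meta=2):
        super().__init__()
        # Keras Embedding -> nn.Embedding
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.dropout = nn.Dropout(0.1)
        # Keras Bidirectional(LSTM(..., return_sequences=True)) -> nn.LSTM
        self.lstm = nn.LSTM(embed_dim, hidden,
                            bidirectional=True, batch_first=True)
        # The GRU consumes the BiLSTM output, which has 2 * hidden features.
        self.gru = nn.GRU(2 * hidden, hidden,
                          bidirectional=True, batch_first=True)
        # avg pool + max pool (2 * hidden each) plus the meta features.
        self.linear = nn.Linear(4 * hidden + n_meta, 16)
        self.relu = nn.ReLU()
        self.out = nn.Linear(16, 1)

    def forward(self, tokens, meta):
        h_embed = self.dropout(self.embedding(tokens))  # (B, T, embed_dim)
        h_lstm, _ = self.lstm(h_embed)                   # (B, T, 2 * hidden)
        h_gru, _ = self.gru(h_lstm)                      # (B, T, 2 * hidden)
        avg_pool = torch.mean(h_gru, 1)                  # (B, 2 * hidden)
        max_pool, _ = torch.max(h_gru, 1)                # (B, 2 * hidden)
        # Keras concatenate -> torch.cat along the feature dimension.
        conc = torch.cat((avg_pool, max_pool, meta), 1)
        conc = self.relu(self.linear(conc))
        return self.out(conc)                            # raw logits, (B, 1)


clf = TextClassifier()
clf.eval()  # disable dropout for a deterministic forward pass
tokens = torch.randint(0, 1000, (4, 20))  # batch of 4 sequences, 20 token ids
meta = torch.randn(4, 2)                  # 2 hand-made features per example
with torch.no_grad():
    scores = clf(tokens, meta)
print(scores.size())
```

The model returns raw logits, so a loss such as nn.BCEWithLogitsLoss would apply the sigmoid internally during training.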

Hope you are still there with me.

One thing I would like to emphasize here is that you need to code something up in PyTorch to really understand how it works.

And once you do that, you will be glad that you put in the effort.

On to the next section.

Tailored or Readymade: The Best Fit with a highly customizable Training Loop

In the above section, I wrote that you will be amazed once you see the training loop.

That was an exaggeration.

On the first try, you will be a little baffled/confused.

But as soon as you read through the loop more than once it will make a lot of intuitive sense.

Once again, read through the comments and the code to gain a better understanding.

This training loop does k-fold cross-validation on your training data and outputs out-of-fold train_preds, plus test_preds averaged over the runs on the test data.

I apologize if the flow looks like something straight out of a Kaggle competition, but if you understand this you will be able to create a training loop for your own workflow.

And that is the beauty of Pytorch.

So a brief summary of this loop is as follows:

1) Create stratified splits using the train data.

2) Loop through the splits:

   a) Convert your train and CV data to tensors and load them onto the GPU using a command of the form X_train_fold = torch.…astype(int)], dtype=torch.…cuda().
   b) Load the model onto the GPU using the model.cuda() command.
   c) Define the loss function, scheduler, and optimizer.
   d) Create train_loader and valid_loader to iterate through batches.
   e) Start running epochs. In each epoch:
      - Set the model mode to train using model.train().
      - Go through the batches in train_loader and run the forward pass.
      - Run a scheduler step to change the learning rate.
      - Compute the loss.
      - Set the existing gradients in the optimizer to zero.
      - Backpropagate the losses through the network.
      - Clip the gradients.
      - Take an optimizer step to change the weights in the whole network.
      - Set the model mode to eval using model.eval().
      - Get predictions for the validation data using valid_loader and store them in the variable valid_preds_fold.
      - Calculate the loss and print it.
   f) After all the epochs are done, predict the test data and store the predictions. These predictions will be averaged at the end of the split loop to get the final test_preds.
   g) Get out-of-fold (OOF) predictions for the train set using train_preds[valid_idx] = valid_preds_fold.

3) These OOF predictions can then be used to calculate the local CV score for your model.
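The steps above can be sketched on toy data as follows. This is a simplified illustration, not the author's loop: the data, the tiny model and the stratified_folds helper are hypothetical (the helper stands in for something like scikit-learn's StratifiedKFold), the built-in torch.optim.lr_scheduler.CyclicLR replaces the custom scheduler, .to(device) replaces .cuda() so the sketch also runs on a CPU, and the scheduler step is taken after the optimizer step, which is the order recent PyTorch versions expect.

```python
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def stratified_folds(y, n_splits, seed=0):
    """Hypothetical stand-in for a stratified k-fold splitter."""
    rng = np.random.RandomState(seed)
    fold_of = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        idx = rng.permutation(np.where(y == cls)[0])
        fold_of[idx] = np.arange(len(idx)) % n_splits  # spread each class evenly
    return [(np.where(fold_of != k)[0], np.where(fold_of == k)[0])
            for k in range(n_splits)]


# Toy data (made up): 10 features, label depends on the first feature.
torch.manual_seed(0)
rng = np.random.RandomState(0)
x_train = rng.randn(200, 10).astype(np.float32)
y_train = (x_train[:, 0] > 0).astype(np.float32)
x_test = rng.randn(50, 10).astype(np.float32)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
n_splits, n_epochs, batch_size = 4, 3, 32
train_preds = np.zeros(len(x_train))  # out-of-fold predictions
test_preds = np.zeros(len(x_test))    # averaged over folds
x_test_t = torch.tensor(x_test).to(device)

for fold, (train_idx, valid_idx) in enumerate(stratified_folds(y_train, n_splits)):
    # Convert the fold to tensors and move them to the device.
    X_train_fold = torch.tensor(x_train[train_idx]).to(device)
    y_train_fold = torch.tensor(y_train[train_idx]).unsqueeze(1).to(device)
    X_val_fold = torch.tensor(x_train[valid_idx]).to(device)
    y_val_fold = torch.tensor(y_train[valid_idx]).unsqueeze(1).to(device)

    model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1)).to(device)
    loss_fn = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.CyclicLR(
        optimizer, base_lr=1e-3, max_lr=3e-3, step_size_up=10,
        cycle_momentum=False)  # Adam has no momentum parameter to cycle

    train_loader = DataLoader(TensorDataset(X_train_fold, y_train_fold),
                              batch_size=batch_size, shuffle=True)
    valid_loader = DataLoader(TensorDataset(X_val_fold, y_val_fold),
                              batch_size=batch_size, shuffle=False)

    for epoch in range(n_epochs):
        model.train()
        for x_batch, y_batch in train_loader:
            y_pred = model(x_batch)              # forward pass
            loss = loss_fn(y_pred, y_batch)      # compute loss
            optimizer.zero_grad()                # clear old gradients
            loss.backward()                      # backpropagate
            nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip gradients
            optimizer.step()                     # update weights
            scheduler.step()                     # update learning rate

        model.eval()
        valid_preds_fold = np.zeros(len(valid_idx))
        val_loss = 0.0
        with torch.no_grad():
            for i, (x_batch, y_batch) in enumerate(valid_loader):
                y_pred = model(x_batch)
                val_loss += loss_fn(y_pred, y_batch).item() / len(valid_loader)
                valid_preds_fold[i * batch_size:(i + 1) * batch_size] = \
                    torch.sigmoid(y_pred).cpu().numpy()[:, 0]
        print(f"fold {fold} epoch {epoch} val_loss {val_loss:.4f}")

    # Predict the test set with this fold's model and average over folds.
    with torch.no_grad():
        test_preds += torch.sigmoid(model(x_test_t)).cpu().numpy()[:, 0] / n_splits
    train_preds[valid_idx] = valid_preds_fold  # out-of-fold predictions
```

Because every training row lands in exactly one validation fold, train_preds ends up fully populated and can be scored against y_train to get a local CV estimate.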

But Why? Why so much code?

Okay. I get it. That was probably a handful.

What you could have done with a simple .fit in Keras takes a lot of code to accomplish in PyTorch.

But understand that you get a lot of power too.

Some use cases for you to understand:

1) While in Keras you have prespecified schedulers like ReduceLROnPlateau (and it is a task to write them yourself), in PyTorch you can experiment like crazy. If you know how to write Python, you are going to get along just fine.

2) Want to change the structure of your model between the epochs? Yeah, you can do it.

3) Want to change the input size for convolutional networks on the fly? You can do that too.

And much more. It is only your imagination that will stop you.
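To make the first two use cases concrete, here is a small sketch, under made-up assumptions, of a hand-written learning-rate schedule (via torch.optim.lr_scheduler.LambdaLR) combined with a change to the model between epochs (unfreezing an embedding layer). The warmup_then_decay function, the toy model and the random data are all hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Embedding(100, 8), nn.Flatten(), nn.Linear(8 * 5, 1))
model[0].weight.requires_grad = False  # start with the embedding frozen

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def warmup_then_decay(epoch, warmup=2, decay=0.5):
    """Hypothetical schedule: linear warmup, then halve the rate each epoch.
    Returns a multiplier applied to the optimizer's initial learning rate."""
    if epoch < warmup:
        return (epoch + 1) / warmup
    return decay ** (epoch - warmup + 1)


scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_then_decay)

lrs = []
for epoch in range(4):
    if epoch == 2:
        # Change the model mid-training: let the embedding start learning.
        model[0].weight.requires_grad = True
    lrs.append(optimizer.param_groups[0]["lr"])

    x = torch.randint(0, 100, (16, 5))  # fake batch of token ids
    y = torch.randn(16, 1)              # fake regression targets
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                    # once per epoch in this sketch

print(lrs)
```

Since the schedule is just a Python function of the epoch index, swapping in any other shape is a matter of editing a few lines.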

Wanna Run it Yourself?

So another small confession here.

The code above will not run as is, since there are some code artifacts which I have not shown here.

I did this in favor of making the post more readable.

For example, the code above uses the seed_everything, MyDataset and CyclicLR (from Jeremy Howard's course) functions and classes, which are not really included with PyTorch.

But fret not my friend.

I have tried to write a Kaggle Kernel with the whole running code.

You can see the code here and include it in your projects.

If you liked this post, please don’t forget to upvote the Kernel too.

I will be obliged.
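For readers who want a rough idea of what such helpers could look like, here is a guess at seed_everything and MyDataset. These are hypothetical reconstructions, not the versions from the Kernel: in particular, having MyDataset return the row index alongside each sample is an assumption. For the scheduler, recent PyTorch releases do ship a torch.optim.lr_scheduler.CyclicLR class that can serve a similar purpose.

```python
import os
import random

import numpy as np
import torch
from torch.utils.data import Dataset, TensorDataset


def seed_everything(seed=1234):
    """Hypothetical helper: seed every random number generator in play."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
    torch.backends.cudnn.deterministic = True


class MyDataset(Dataset):
    """Hypothetical wrapper that also returns the index of each sample,
    which can help when writing predictions back into a larger array."""

    def __init__(self, dataset):
        self.dataset = dataset

    def __getitem__(self, index):
        data, target = self.dataset[index]
        return data, target, index

    def __len__(self):
        return len(self.dataset)


seed_everything(0)
first = torch.rand(3)
seed_everything(0)
second = torch.rand(3)  # same seed, so the same random numbers

wrapped = MyDataset(TensorDataset(torch.arange(4).float(), torch.zeros(4)))
data, target, index = wrapped[2]
```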

Endnotes and References

This post is the result of the efforts of a lot of excellent Kagglers, and I will try to reference them in this section.

If I leave out someone, do understand that it was not my intention to do so.

1) Discussion on 3rd Place winner model in Toxic comment
2) 3rd Place model in Keras by Larry Freeman
3) Pytorch starter Capsule model
4) How to: Preprocessing when using embeddings
5) Improve your Score with some Text Preprocessing
6) Pytorch baseline
7) Pytorch starter

Originally published at mlwhiz.com on January 6, 2019.
