Pytorch Cheat Sheet for Beginners and Udacity Deep Learning Nanodegree

Uniqtech, May 31

Getting started with Pytorch using a cohesive, top-down cheatsheet.

This cheatsheet should be easier to digest than the official documentation, and should serve as a transitional tool that gets students and beginners reading the documentation soon.

This article is being improved continuously.

It is frequently updated and will remain under construction until it is significantly improved.

Feedback is appreciated at hi@uniqtech.co, and mistakes and typos will be promptly corrected.

Big news: we got published on Medium Machine Learning and Data Science homepage.

Please clap and comment to show your support.

This cheatsheet below is primarily narrative.

A PDF/JPEG version of a detailed cheatsheet will be released this week, in this article!

Figure: Pytorch cheatsheet for beginners, by Uniqtech.

Pytorch Defined in Its Own Words

Pytorch is “An open source deep learning platform that provides a seamless path from research prototyping to production deployment.”

Key Features

Key features: Hybrid Front-End, Distributed Training, Python-First, Tools & Libraries.

These features are elegantly illustrated with side-by-side code examples on the features page of the Pytorch documentation.

Also note the Python 3 shorthand “@” for matrix multiplication (the dot product for 1-D tensors).

Hybrid Front-End allows switching between eager mode and graph mode.

Tensorflow used to be graph mode only, which was considered fast and efficient but very hard to modify, prototype and research.

This gap is closing since Tensorflow now also offers eager mode.

Distributed Training: supports GPU, CPU and easy switching between the two.

(Tensorflow additionally supports the TPU, Google's own Tensor Processing Unit.)

Python-First: built for Python developers. Easily create neural networks and run deep learning in Pytorch.

Tools & Libraries include robust computer vision libraries (convolutional neural networks and pretrained models), NLP and more.

Pytorch also includes great features such as torch.tensor instantiation and computation; model definition, validation and scoring; autograd, which calculates gradients automatically and does all the backpropagation for you; transfer-learning-ready pretrained models and preloaded datasets (read our super short, effective article on transfer learning); and, let’s not forget, GPU support using CUDA.
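As a minimal sketch of the tensor, “@” and autograd features mentioned above, the snippet below builds two small tensors, multiplies them, and lets autograd compute a gradient. The values and the squared-sum objective are made up purely for illustration.

```python
import torch

# Create tensors directly from Python lists.
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
w = torch.tensor([[0.5], [-1.0]], requires_grad=True)  # ask autograd to track w

# `@` is Python's matrix-multiplication operator; same as torch.matmul(x, w).
y = x @ w                # shape (2, 1)
loss = (y ** 2).sum()    # illustrative scalar objective (an assumption)

# autograd walks the recorded operations backwards and fills in w.grad.
loss.backward()

print(y)
print(w.grad)
```

Only tensors created with requires_grad=True (or derived from them) receive gradients, which is why x has no .grad here.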

When Should You Use Pytorch

Pytorch 1.0 added production and cloud partner support for AWS, Google Cloud Platform, and Microsoft Azure.

You can now use Pytorch for any deep learning tasks including computer vision and NLP, even in production.

Because it is so easy to use and so pythonic, Senior Data Scientist Stefan Otte said, “if you want to have fun, use pytorch”.

Pytorch is also backed by Facebook AI Research, so if you want to work for Facebook data and ML, you should know Pytorch.

If you are great with Python and want to be an open source contributor, Pytorch is also the way to go.

Transfer Learning

Transfer learning reuses a model trained on one dataset to make predictions on a dataset it wasn’t trained on. It can significantly reduce training time and improve accuracy. It can also help when the available training data is limited. Pytorch has a page dedicated to pre-trained models and their performance across industry-standard benchmark datasets.

Read more in our transfer learning with Pytorch article.

    # pretrained models are at torchvision > models
    model = torchvision.models.resnet152(pretrained=False)

Read about all the available models in the Pytorch documentation. Note that the top-1 and top-5 errors, i.e. the performance of the models, are also listed there.

Data Science, Academic Research | Presentation Using Jupyter Notebook

Since Python has a huge developer community, it is easier to find Python talent to transition into data science and academic research using Pytorch. This eliminates the need to learn another language. Many data analysts and scientists are already familiar with Jupyter Notebook, in which Pytorch works perfectly. Read more in our article on deploying a Pytorch model to Amazon Web Services SageMaker.

Pytorch is a Deep Learning Framework

Pytorch is a deep learning framework just like Tensorflow, which means that for traditional machine learning models you should use another tool for now. Scikit-learn is also pythonic and has an easy-to-use API.

Check it out.

Did you know many Kaggle users, including masters, still use sklearn train_test_split() to split data and a scaler to pre-process it, and sklearn Gradient Boosting Tree or Support Vector Machine to benchmark performance? The top-notch, high-performance XGBoost is notably missing.

Tensorflow.js and Tensorflow Lite give Tensorflow wings in the browser and on mobile devices.

Apple just announced CreateML for Swift June 2019.

Mobile support is not native yet in Pytorch.

Do not despair. Scroll down to read about ONNX, an exchange format that is supported by almost all of the popular frameworks. Pytorch also has a tutorial on moving a model to mobile, though this road is still a bit of a detour compared to Tensorflow.

Pytorch Model in a Nutshell

Using Sequential is one easy way to quickly define a model. A named ordered dictionary holds all the layers that are encapsulated in nn.Sequential, which is then stored in the model.classifier variable. This is a quick way to define the bare bones of a model, but not necessarily the most Pythonic. It helps us illustrate that a Pytorch model consists of fully connected Linear layers, with shapes specified as (in_features, out_features) pairs, ReLU activation layers, Dropout with 20% probability, and an output Softmax or LogSoftmax function.

Don’t worry about that now.

All you need to know is that Softmax is usually the last layer of a deep learning model for multi-class classification tasks. The famous ImageNet classification benchmark has 1000 classes, so the Softmax output of a model trained on it has 1000 components.

A stack of fully connected layers with ReLU activations and some dropout in between, followed by a final fully connected linear layer that feeds into a Softmax activation, is very typical of a vanilla deep learning neural network.

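    from collections import OrderedDict
    from torch import nn

    # keys must be unique: repeating 'relu' or 'dropout' would overwrite earlier entries
    classifier = nn.Sequential(OrderedDict([
        ('fc1', nn.Linear(2048, 1024)),
        ('relu1', nn.ReLU()),
        ('dropout1', nn.Dropout(0.2)),
        ('fc2', nn.Linear(1024, 512)),
        ('relu2', nn.ReLU()),
        ('dropout2', nn.Dropout(0.2)),
        ('fc3', nn.Linear(512, 256)),
        ('relu3', nn.ReLU()),
        ('dropout3', nn.Dropout(0.2)),
        ('fc4', nn.Linear(256, 102)),
        ('output', nn.LogSoftmax(dim=1))
    ]))

    # attach as the classifier head; the attribute name depends on the architecture
    # (torchvision ResNets, for example, call their final layer `fc`)
    model.classifier = classifier

In Pytorch it is easy to view the structure of your model: just use print(model_name).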

More on that later.
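For comparison, the same stack of layers can also be written as an nn.Module subclass, which is the more common style once a model needs custom logic. The sketch below reuses the layer sizes from the Sequential example; the class name Classifier and its constructor arguments are hypothetical.

```python
import torch
from torch import nn
import torch.nn.functional as F


class Classifier(nn.Module):
    """Hypothetical subclass mirroring the Sequential classifier above."""

    def __init__(self, in_features=2048, num_classes=102, p_drop=0.2):
        super().__init__()
        self.fc1 = nn.Linear(in_features, 1024)
        self.fc2 = nn.Linear(1024, 512)
        self.fc3 = nn.Linear(512, 256)
        self.fc4 = nn.Linear(256, num_classes)
        self.dropout = nn.Dropout(p_drop)

    def forward(self, x):
        # Linear -> ReLU -> Dropout, three times, then the output layer.
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.dropout(F.relu(self.fc3(x)))
        return F.log_softmax(self.fc4(x), dim=1)


net = Classifier()
net.eval()  # disable dropout for a deterministic check
with torch.no_grad():
    log_probs = net(torch.randn(4, 2048))  # a batch of 4 random feature vectors

print(net)              # prints the layers, just like print(model_name)
print(log_probs.shape)
```

Exponentiating the LogSoftmax output gives probabilities that sum to 1 for each row of the batch.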

Pytorch Training Loop

The training loop is perhaps the most characteristic part of Pytorch as a deep learning framework. In scikit-learn you can make it go away with fit, and in Tensorflow's Keras API with model.fit. In Pytorch this part is much more involved.
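To make "more involved" concrete, here is a minimal sketch of a training loop followed by an evaluation pass. The toy data, the tiny model, and the hyperparameters are all assumptions chosen so that the example runs quickly on a CPU; a real project would plug in its own dataset and network.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Toy data (an assumption for illustration): 64 samples, 10 features, 3 classes.
features = torch.randn(64, 10)
labels = torch.randint(0, 3, (64,))
loader = DataLoader(TensorDataset(features, labels), batch_size=16, shuffle=True)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Dropout(0.2),
                      nn.Linear(32, 3), nn.LogSoftmax(dim=1))
criterion = nn.NLLLoss()  # pairs with a LogSoftmax output
optimizer = optim.SGD(model.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(5):
    model.train()                     # put dropout etc. in training mode
    running_loss = 0.0
    for inputs, targets in loader:
        optimizer.zero_grad()         # clear gradients from the previous step
        log_probs = model(inputs)     # forward pass
        loss = criterion(log_probs, targets)
        loss.backward()               # backpropagation via autograd
        optimizer.step()              # update the weights
        running_loss += loss.item()
    epoch_losses.append(running_loss / len(loader))

model.eval()                          # put dropout etc. in evaluation mode
with torch.no_grad():                 # no gradient tracking while evaluating
    predictions = model(features).argmax(dim=1)
    accuracy = (predictions == labels).float().mean().item()

print(epoch_losses)
print(accuracy)
```

The five steps inside the inner loop (zero_grad, forward, loss, backward, step) are the part that fit hides in other libraries.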


model.train() tells your model that you are training the model. So effectively layers like dropout, batchnorm etc. which behave differently in the train and test procedures know what is going on and hence can behave accordingly. More details: It sets the mode to train (see source code). You can call either model.eval() or model.train(mode=False) to tell that you are testing. It is somewhat intuitive to expect the train function to train the model, but it does not do that. It just sets the mode.

model.eval() will notify all your layers that you are in eval mode; that way, batchnorm or dropout layers will work in eval mode instead of training mode.

torch.no_grad() impacts the autograd engine and deactivates it. It will reduce memory usage and speed up computations but you won’t be able to backprop (which you don’t want in an eval script).

-- albanD

Developer Tools

Train on more than one GPU

Training Loops: Train and Forward

Free GPU in the Cloud

Thanks to new cloud technology like Google Colab, you can train on a free GPU in the cloud. Once you set up the notebook, you can continue to train and monitor from your mobile devices!

Cloud choices include Google Colab, Kaggle, and AWS.

Local choices include your own laptop and your gaming computer. We repurposed an MSI NVIDIA GTX 1060 previously used for Assassin’s Creed Origins 😀 If you want to know more, let us know. Being able to train on a GPU locally has been a major advantage. We were able to iterate through parameter tuning combinations fast without interruption.

Google Colab has a 12-hour timeout as well as a 12GB quota limit.

If you are an advanced user, be sure to avoid constantly downloading the dataset, instead store it in your Google Drive.

After deleting your model in Google Drive, be sure to empty trash to actually delete it.

Regardless of where the model is trained, if the training loss has dropped close to zero but the validation loss does not decrease (there’s no test dataset), watch out: your model may be overfitting and memorizing the training data. Halt and tune your parameters. Even if you achieve 99% accuracy, your model may not generalize, so it may not be usable elsewhere.

This paragraph of information is especially relevant for Udacity students doing scholarship challenges and nanodegrees.

Be very suspicious of 99% accuracy, but do a brief dance to celebrate first.

Installation Pytorch

Using Anaconda to install Pytorch is a great start across all systems, including Windows.

We were able to install Pytorch with Anaconda on a gaming computer and start to use its CUDA GPU feature right away.

Read our Anaconda Cheatsheet here.

    conda install numpy jupyter notebook

Hello World in Pytorch is as easy as launching a Google Colab notebook (yes, right on Google’s turf) and running import torch; check out this shared view-only notebook. Modern hosted data science notebooks like Kaggle Kernels and Google Colab all come with Pytorch pre-installed. Look Ma: deep learning with no server!

Prefer Jupyter Notebook based tutorials instead? Get started with the Udacity Intro to Pytorch repo, found at the bottom of this article.

Loading Data Using Train and Test Loaders

    train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, num_workers=num_workers)
    test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, num_workers=num_workers)

Forwarding

The Pytorch forward pass is where y = wx + b is actually calculated; before that, we are just writing placeholders.
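A small sketch of both ideas: a DataLoader splitting a dataset into batches, and a forward pass through a Linear layer computing y = wx + b. The stand-in dataset below is hypothetical (random numbers), and num_workers=0 simply means data is loaded in the main process.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset (hypothetical): 10 samples with 4 features each.
train_data = TensorDataset(torch.randn(10, 4), torch.randint(0, 2, (10,)))
batch_size = 4
num_workers = 0  # load batches in the main process

train_loader = DataLoader(train_data, batch_size=batch_size, num_workers=num_workers)

# 10 samples with batch_size 4 yields batches of 4, 4 and 2.
batch_shapes = [tuple(inputs.shape) for inputs, _ in train_loader]
print(batch_shapes)

# The forward pass of nn.Linear computes y = x @ W^T + b.
layer = nn.Linear(4, 2)
inputs, _ = next(iter(train_loader))
outputs = layer(inputs)
manual = inputs @ layer.weight.t() + layer.bias
print(torch.allclose(outputs, manual, atol=1e-6))
```

Until layer(inputs) is called, the layer only holds its parameters; the actual computation happens in the forward pass.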

Useful Pytorch Libraries and Modules and Installations

    import torch
    import torchvision
    from torchvision import datasets, transforms, models
    import torch.nn.functional as F
    from collections import OrderedDict
    from torch import nn
    from torch import optim

Transformation

transforms.ToTensor() converts a PIL image or NumPy array to a Pytorch tensor. Advantages include easy use with CUDA for GPU training.

Pytorch Convolutional Neural Networks (CNN)

This section is under construction… check back soon.

    import torch
    import numpy as np
    from torchvision import datasets
    import torchvision.transforms as transforms

A max pooling layer discards detailed spatial information contained in the original image.
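To see what "discards spatial information" means, the sketch below applies a 2x2 max pooling layer to a tiny 4x4 single-channel "image". The input values are arbitrary (0 through 15) so that the result is easy to check by hand.

```python
import torch
from torch import nn

# One image, one channel, 4x4 pixels holding the values 0..15.
image = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

pool = nn.MaxPool2d(kernel_size=2, stride=2)
pooled = pool(image)

print(image[0, 0])
print(pooled[0, 0])  # each 2x2 window is reduced to its largest value
```

The output is 2x2: sixteen values go in, four come out, and the positions of the maxima within each window are lost.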

Inspect Pytorch Models

First init the model.

    vgg16 = models.vgg16(pretrained=True)
    print(vgg16)

In Pytorch, use print(model_name) to print out the model and its architecture. You can easily see what the model is all about.

    VGG(
      (features): Sequential(
        (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace)
        (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        ...
        (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        ...
        (29): ReLU(inplace)
        (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (classifier): Sequential(
        (0): Linear(in_features=25088, out_features=4096, bias=True)
        (1): ReLU(inplace)
        (2): Dropout(p=0.5)
        (3): Linear(in_features=4096, out_features=4096, bias=True)
        (4): ReLU(inplace)
        (5): Dropout(p=0.5)
        (6): Linear(in_features=4096, out_features=1000, bias=True)
      )
    )

Note that each layer is named (numbered) and can be queried by this index.

We used ellipsis to omit model details so that the VGG model can fit on the screen.

Pro tip: inspecting the model architecture is a must.

Transfer learning, in a nutshell, is about modifying the last few classifier layers.

Read more in our transfer learning article.
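The sketch below shows the two ideas from this section together: looking up a layer by its printed index, and swapping the last classifier layer for transfer learning. To stay small and avoid downloading weights it uses TinyVGG, a hypothetical stand-in that only copies the features/classifier layout of VGG; with a real pretrained model the same indexing and assignment would apply to its own layer numbers.

```python
import torch
from torch import nn


class TinyVGG(nn.Module):
    """Hypothetical stand-in with the same features/classifier layout as VGG."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Linear(8 * 4 * 4, 16),
            nn.ReLU(),
            nn.Linear(16, 1000),
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))


model = TinyVGG()

# Layers inside a Sequential can be looked up by their printed index.
first_conv = model.features[0]

# Freeze every existing parameter so the existing weights stay fixed.
for param in model.parameters():
    param.requires_grad = False

# Swap the last classifier layer for a new head; new layers are trainable by default.
num_classes = 102  # same output size as the classifier example earlier
model.classifier[2] = nn.Linear(model.classifier[2].in_features, num_classes)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
output = model(torch.randn(2, 3, 8, 8))

print(first_conv)
print(trainable)
print(output.shape)
```

After the swap, only the new head's weight and bias would be updated by an optimizer, which is the core of transfer learning.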

Pro tip: for Tensorflow, use Keras model.summary() to review the entire model architecture.

It even outputs number of parameters and dimensions.

Advanced Features

Using CUDA

Although we decided to put CUDA in the advanced section, the reality is that CUDA is so easy to use. Just use it… today! With Anaconda, Pytorch, and CUDA, we were able to turn a gaming computer with an NVIDIA graphics card into a home deep learning powerhouse. No configuration needed! It just works.

The framework just works on a Windows machine! (It is an MSI NVIDIA GTX 1060 previously used for Assassin’s Creed Origins 😀 If you want to know more, let us know.)

    gpu_is_avail = torch.cuda.is_available()
    if not gpu_is_avail:
        print('CUDA is NOT available.')
    else:
        print('CUDA is available.')

One common error when using CUDA with Pytorch is not moving both the model and the data to CUDA, and, when needed, moving both of them back to the CPU. Generally your model and data should always live in the same space.
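One common way to keep the model and the data in the same place is to pick a device once and move everything to it. The following is a minimal sketch with a made-up Linear model and random batch; it falls back to the CPU when no GPU is present.

```python
import torch
from torch import nn

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(8, 2).to(device)     # move the model's parameters
batch = torch.randn(4, 8).to(device)   # move the data to the same device

output = model(batch)                  # both live on `device`, so this works

# Move results back to the CPU before handing them to NumPy, plotting, etc.
result = output.detach().cpu()
print(device, result.shape)
```

Note that .to(device) on a tensor returns a new tensor, so the result must be assigned, while on a module it moves the parameters in place.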

Deploying Pytorch in Production

There are two methods of turning existing Pytorch models into production-ready deployments: tracing and scripting. Tracing records the operations run on an example input, so it does not capture data-dependent control flow in the code. Scripting supports Pytorch code with control flow, but supports only a limited subset of Python.
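The difference shows up as soon as a model branches on its input. The sketch below uses torch.jit.trace and torch.jit.script on Gate, a hypothetical module written only for this illustration; the tracer warning about the Python-level branch is silenced on purpose.

```python
import warnings

import torch
from torch import nn


class Gate(nn.Module):
    """Hypothetical module whose output depends on a data-dependent branch."""

    def forward(self, x):
        if x.sum() > 0:
            return x * 2
        return -x


module = Gate()
positive = torch.ones(3)
negative = -torch.ones(3)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # tracing warns about the Python-level branch
    # Tracing records only the operations executed for `positive` (the x * 2 path).
    traced = torch.jit.trace(module, positive)

# Scripting compiles the source, so both branches are kept.
scripted = torch.jit.script(module)

print(traced(negative))    # still follows the recorded x * 2 path
print(scripted(negative))  # takes the -x branch, like the original module
```

For models without data-dependent branches the two approaches give the same result, and tracing is often the simpler choice.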

Choosing the best Softmax result: in multi-class classification, the Softmax activation function is often used. Pytorch has a dedicated function to extract the top results, i.e. the most likely classes, from the Softmax output. torch.topk(input, k, dim) returns the top k values along with their indices.

    torch.topk(input, k, dim=None, largest=True, sorted=True, out=None) -> (Tensor, LongTensor)

Returns the k largest elements of the given input tensor along a given dimension.

If dim is not given, the last dimension of the input is chosen.

If largest is False then the k smallest elements are returned.
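A short sketch of topk on a model-style output. The probabilities are made up; they are passed through a log first because a LogSoftmax output layer, like the one in the classifier example, returns log-probabilities that need exp() to become probabilities again.

```python
import torch

# Made-up LogSoftmax output for a batch of 2 samples and 3 classes.
log_probs = torch.log(torch.tensor([[0.1, 0.6, 0.3],
                                    [0.7, 0.2, 0.1]]))

probs = log_probs.exp()                   # back to probabilities
top_p, top_class = probs.topk(2, dim=1)   # two most likely classes per sample

print(top_p)      # the probabilities, largest first
print(top_class)  # the matching class indices
```

With k=1 the first column alone gives the predicted class for each sample.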

Consume Pytorch Models on Other Platforms

    import torch.onnx
    import torchvision

    dummy_input = torch.randn(1, 3, 224, 224)
    model = torchvision.models.alexnet(pretrained=True)
    torch.onnx.export(model, dummy_input, "alexnet.onnx")

“Export models in the standard ONNX (Open Neural Network Exchange) format for direct access to ONNX-compatible platforms, runtimes, visualizers, and more.” (Pytorch 1.0 Documentation)

More on Pytorch Transfer Learning

Using an existing model is equivalent to freezing some of its layers and parameters and not training them.

Turn off autograd for those parameters by setting requires_grad to False.

    for param in model.parameters():
        param.requires_grad = False

Hyperparameter Tuning

In addition to using the right optimizer and adjusting the learning rate accordingly, you can use a learning rate scheduler to adjust your learning rate dynamically.

    # define scheduler
    scheduler = lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

Read more about scheduler here.
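To see what StepLR with step_size=3 and gamma=0.1 does, the sketch below records the learning rate over seven epochs. The model, the optimizer choice, and the starting rate of 0.1 are assumptions, and the actual training work inside each epoch is left out.

```python
from torch import nn, optim
from torch.optim import lr_scheduler

model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Multiply the learning rate by gamma every step_size epochs.
scheduler = lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

learning_rates = []
for epoch in range(7):
    learning_rates.append(optimizer.param_groups[0]["lr"])
    # ... one epoch of training would go here ...
    optimizer.step()   # the optimizer steps before the scheduler
    scheduler.step()

print(learning_rates)
```

The rate stays at 0.1 for three epochs, drops to roughly 0.01 for the next three, and then to roughly 0.001.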

Model and Checkpoint Saving

Save and Load Model Checkpoint

Pro tip: Did you know you can save and load models locally and in Google Drive? This way you don’t have to start from scratch every time. For example, if you have already trained for 5 epochs, you can save the weights and later train another 5 epochs. Now you have done 10 epochs in total! Very convenient.

The free GPU resources time out and get erased very often.

Remember incremental training is possible.

You can also save a checkpoint and load it locally.

You may see both the .pt and .pth extensions.
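As a self-contained sketch of the save-and-resume idea, the helpers below write a checkpoint dictionary and load it back into a fresh model. The function names save_checkpoint and load_checkpoint, their signatures, the tiny Linear model, and the class names are all hypothetical; a custom loader for a real project would rebuild its own architecture.

```python
import os
import tempfile

import torch
from torch import nn


def save_checkpoint(model, path, class_to_idx):
    """Hypothetical helper: store the weights plus the class mapping."""
    torch.save({"state_dict": model.state_dict(),
                "class_to_idx": class_to_idx}, path)


def load_checkpoint(path, model):
    """Hypothetical helper: restore weights into an already constructed model."""
    checkpoint = torch.load(path, map_location="cpu")  # load onto the CPU
    model.load_state_dict(checkpoint["state_dict"])
    model.class_to_idx = checkpoint["class_to_idx"]
    return model


model = nn.Linear(4, 2)
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
save_checkpoint(model, path, {"daisy": 0, "rose": 1})

# Later (or on another machine): build the same architecture, then load.
restored = load_checkpoint(path, nn.Linear(4, 2))
print(torch.equal(restored.weight, model.weight))
```

Saving the state_dict rather than the whole model object keeps the file independent of how the model class is defined, and map_location="cpu" lets a checkpoint saved on a GPU load on a machine without one.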

    # write and then use your custom load_checkpoint function
    model = load_checkpoint('checkpoint_resnet50.pth')
    print(model)

    # use pytorch torch.load and load_state_dict(state_dict)
    checkpt = torch.load(...)  # path to your saved state_dict file
    model.load_state_dict(checkpt)

    # save locally, map the new class_to_idx, move to cpu
    # note down model architecture
    checkpoint['class_to_idx']
    model.class_to_idx = image_datasets['train'].class_to_idx
    torch.save({'arch': 'resnet18',
                'state_dict': model.state_dict(),
                'class_to_idx': model.class_to_idx},
               'classifier.pth')

Further Reading

Pytorch Data Science Nanodegree Deep Learning Intro to Pytorch notebooks and tutorials by Udacity.

Transfer Learning with Pytorch: our super short, effective article.

About Us

We write beginner-friendly cheatsheets like this all the time.

Follow our profile and our most popular Data Science Bootcamp publication.

Check out our one page article on Transfer Learning in Pytorch, Pytorch on Amazon SageMaker, and Anaconda Cheatsheet for Data Science.

We are primarily on Medium, a community we love and find strong affinity in.

Medium treats its writers well, and has a phenomenal reader community.

If you would like to find out about our upcoming Data Science Bootcamp course releasing Fall 2019, or about scholarships for high quality articles, or if you want to write for us or contribute feedback, please email us at hi@uniqtech.co.

Thank you Medium community!

Key author and contributor: Sun @ __add your name here__!
