Will dropout regularization prevent your model from overfitting?

We will dive into the implementation of dropout and test whether it prevents overfitting.

Figure: 900 greyscale images of clothing from Fashion-MNIST.

This project is inspired by the Facebook Udacity PyTorch Challenge.

First, we will create a neural network without regularization. Our hypothesis is that over time the model will perform worse on the validation set: the more we train on the training set, the better the model gets at classifying characteristics specific to the training data, so it generalizes poorly at inference time.

Let's import the Fashion-MNIST dataset

Let's download the dataset using torchvision. Typically we set aside 20% of the dataset as the validation set.

We can use torch.mean to calculate the accuracy, but we first need to convert equals (the element-wise comparison between predicted and true classes) to a FloatTensor.

    accuracy = torch.mean(equals.type(torch.FloatTensor))
    # Print the accuracy
    print(f'Accuracy: {accuracy.item()*100}%')

Train our model

Because the model outputs log-softmax values, we use the negative log likelihood loss (nn.NLLLoss), which expects log-probabilities as its input.

    import torch
    from torch import nn, optim

    # Instantiate the model
    model = FashionNeuralNetwork()
    # Use negative log likelihood as our loss function
    loss_function = nn.NLLLoss()
    # Use the Adam optimizer, which combines momentum with adaptive learning rates
    optimizer = optim.Adam(model.parameters(), lr=0.003)
    # Train the model for 30 epochs
    epochs = 30
    # Initialize two empty lists to hold the train and test losses
    train_losses, test_losses = [], []

    # Start the training
    for i in range(epochs):
        running_loss = 0
        # Loop through the whole training set: forward pass, then backpropagation
        for images, labels in trainloader:
            optimizer.zero_grad()
            log_ps = model(images)
            loss = loss_function(log_ps, labels)
            loss.backward()  # Backpropagate
            optimizer.step()
            running_loss += loss.item()

        # Initialize test loss and accuracy to 0
        test_loss = 0
        accuracy = 0
        # Turn off the gradients
        with torch.no_grad():
            # Loop through the whole validation set
            for images, labels in testloader:
                log_ps = model(images)
                ps = torch.exp(log_ps)
                # .item() stores plain floats, so the losses can be plotted later
                test_loss += loss_function(log_ps, labels).item()
                top_p, top_class = ps.topk(1, dim=1)
                equals = top_class == labels.view(*top_class.shape)
                accuracy += torch.mean(equals.type(torch.FloatTensor)).item()

        # Append the average losses to the lists for plotting
        train_losses.append(running_loss/len(trainloader))
        test_losses.append(test_loss/len(testloader))

Print out the results: the lowest validation loss is reached at epoch 5, with an accuracy of 87%.

This supports our hypothesis: over time the model fits the training data better, but it does not get better at generalizing to images outside the training dataset. This is really bad. The model learns only the specifics of our training dataset and becomes so specialized that it might only recognize images from the training set.

Now let's train this model!

    import torch
    from torch import nn, optim

    # Instantiate the model
    model = FashionNeuralNetworkDropout()
    # Use negative log likelihood as our loss function
    loss_function = nn.NLLLoss()
    # Use the Adam optimizer, which combines momentum with adaptive learning rates
    optimizer = optim.Adam(model.parameters(), lr=0.003)
    # Train the model for 30 epochs
    epochs = 30
    # Initialize two empty lists to hold the train and test losses
    train_losses, test_losses = [], []

    # Start the training
    for i in range(epochs):
        running_loss = 0
        # Loop through the whole training set: forward pass, then backpropagation
        for images, labels in trainloader:
            optimizer.zero_grad()
            log_ps = model(images)
            loss = loss_function(log_ps, labels)
            loss.backward()  # Backpropagate
            optimizer.step()
            running_loss += loss.item()

        # Initialize test loss and accuracy to 0
        test_loss = 0
        accuracy = 0
        # Turn off the gradients
        with torch.no_grad():
            # Turn on evaluation mode (disables dropout)
            model.eval()
            # Loop through the whole validation set
            for images, labels in testloader:
                log_ps = model(images)
                ps = torch.exp(log_ps)
                # .item() stores plain floats, so the losses can be plotted later
                test_loss += loss_function(log_ps, labels).item()
                top_p, top_class = ps.topk(1, dim=1)
                equals = top_class == labels.view(*top_class.shape)
                accuracy += torch.mean(equals.type(torch.FloatTensor)).item()

        # Turn on training mode again (re-enables dropout)
        model.train()
        # Append the average losses to the lists for plotting
        train_losses.append(running_loss/len(trainloader))
        test_losses.append(test_loss/len(testloader))

Print the result: accuracy increases over time and the model is not overfitting.

The target here is a validation loss about as low as our training loss, which means the model is about as accurate on unseen images as on the training images.

Let's plot the graph and see the difference:

    # Plot the graph here (the two % lines are Jupyter notebook magics)
    %matplotlib inline
    %config InlineBackend.figure_format = 'retina'
    import matplotlib.pyplot as plt

    plt.plot(train_losses, label='Training Loss')
    plt.plot(test_losses, label='Validation Loss')
    plt.legend(frameon=True)

Overfitting is gone!

Inference

Now that our model generalizes better, let's feed it an image from outside the training dataset and visualize its classification.

    # helper is assumed to be the helper.py module that accompanies the course notebooks
    import helper

    # Make sure the model is in evaluation mode
    model.eval()
    # Get the next batch of images and labels
    images, labels = next(iter(testloader))
    img = images[0]
    # Convert the 2D image to a 1D vector
    img = img.view(1, 784)
    # Calculate the log-softmax output for img
    with torch.no_grad():
        output = model(img)
    # Convert log-probabilities to class probabilities
    ps = torch.exp(output)
    # Plot the image and probabilities
    helper.view_classify(img.view(1, 28, 28), ps, version='Fashion')

Awesome!

Conclusion

This is great! The training loss and the validation loss now stay close to each other. It is reasonable to expect that training the model for more cycles and fine-tuning our hyperparameters would lower the validation loss further. The graph above shows that the model generalizes better over time and reaches better accuracy after 6–8 epochs, which indicates that adding dropout to the model reduces overfitting.

Thank you so much for your time, and please check out this repository for the full code! This is my Portfolio and LinkedIn profile :).
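A note on why the second training loop switches between model.eval() and model.train(): dropout behaves differently in the two modes. The sketch below illustrates the idea in plain Python rather than PyTorch. It uses "inverted" dropout, the scheme PyTorch's nn.Dropout follows: during training each unit is zeroed with probability p and the survivors are scaled by 1/(1-p), while in evaluation mode the input passes through unchanged. The function name `dropout`, its `training` flag, the value p=0.2, and the all-ones input are assumptions made for illustration only; they are not taken from the original notebook.

```python
import random


def dropout(activations, p=0.2, training=True, rng=random):
    """Apply inverted dropout to a list of activations (illustrative helper)."""
    if not training or p == 0.0:
        # Evaluation mode: every unit is kept and nothing is rescaled.
        return list(activations)
    keep_prob = 1.0 - p
    # Training mode: zero each unit with probability p and scale the
    # survivors by 1 / keep_prob so the expected activation is unchanged.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


# Seeded generator so repeated runs drop the same units.
rng = random.Random(0)
hidden = [1.0] * 10_000  # stand-in for one layer's activations

train_out = dropout(hidden, p=0.2, training=True, rng=rng)
eval_out = dropout(hidden, p=0.2, training=False)

dropped_fraction = train_out.count(0.0) / len(train_out)
train_mean = sum(train_out) / len(train_out)

print(f"dropped fraction in training mode: {dropped_fraction:.3f}")
print(f"mean activation in training mode:  {train_mean:.3f}")
print(f"evaluation output unchanged:       {eval_out == hidden}")
```

Because units are dropped at random only in training mode, validation and inference should run with dropout switched off. Otherwise the measured validation loss would include noise from the dropped units and would not reflect what the full network has learned.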
