Introduction to Regularization to Reduce Overfitting of Deep Learning Neural Networks

Training a deep neural network that can generalize well to new data is a challenging problem. A model with too little capacity cannot learn the problem, whereas a model with too much capacity can learn it too well and overfit the training dataset. Both cases result in a model that does not generalize well.

A modern approach to reducing generalization error is to use a larger model than may be required, combined with regularization during training that keeps the weights of the model small. These techniques not only reduce overfitting, but they can also lead to faster optimization of the model and better overall performance.

In this post, you will discover the problem of overfitting when training neural networks and how it can be addressed with regularization methods.

Let's get started.

A Gentle Introduction to Regularization to Reduce Overfitting and Improve Generalization Error. Photo by jaimilee.beale, some rights reserved.

The objective of a neural network is to have a final model that performs well both on the data that we used to train it (e.g. the training dataset) and on the new data on which the model will be used to make predictions.

The central challenge in machine learning is that we must perform well on new, previously unseen inputs — not just those on which our model was trained. The ability to perform well on previously unobserved inputs is called generalization.

— Page 110, Deep Learning, 2016.

We require that the model learn from known examples and generalize from those known examples to new examples in the future. We use methods like a train/test split or k-fold cross-validation only to estimate the ability of the model to generalize to new data.

Learning and also generalizing to new cases is hard. Too little learning and the model will perform poorly on the training dataset and on new data; the model will underfit the problem. Too much learning and the model will perform well on the training dataset and poorly on new data; the model will overfit the problem. In both cases, the model has not generalized.

A model fit can be considered in the context of the bias-variance trade-off. An underfit model has high bias and low variance: regardless of the specific samples in the training data, it cannot learn the problem. An overfit model has low bias and high variance: the model learns the training data too well, and its performance varies widely with new unseen examples, or even with statistical noise added to examples in the training dataset.

In order to generalize well, a system needs to be sufficiently powerful to approximate the target function. If it is too simple to fit even the training data then generalization to new data is also likely to be poor.
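To make the idea of estimating generalization concrete, below is a minimal sketch of both a hold-out train/test split and k-fold cross-validation using scikit-learn. The dataset, model, and split sizes here are illustrative choices, not from the post.

```python
# Estimate generalization with a hold-out split and k-fold cross-validation.
# The synthetic dataset and logistic regression model are illustrative only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

# synthetic binary classification problem
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# hold-out estimate: fit on the training split, score on the unseen test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
model = LogisticRegression()
model.fit(X_train, y_train)
print('hold-out accuracy: %.3f' % model.score(X_test, y_test))

# k-fold estimate: average accuracy across k train/validation partitions
scores = cross_val_score(LogisticRegression(), X, y, cv=10)
print('10-fold accuracy: %.3f' % scores.mean())
```

Both numbers are only estimates of performance on genuinely new data; k-fold is typically the less noisy of the two because every example is used for validation exactly once.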
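Underfitting and overfitting can often be seen directly in learning curves of loss on the train and validation datasets. Below is a minimal sketch, assuming the Keras API and matplotlib, that fits a deliberately over-capacity MLP on a small noisy dataset; the dataset and layer sizes are illustrative assumptions, not the post's own example.

```python
# Diagnose overfitting from learning curves: an overfit model shows training
# loss continuing to fall while validation loss turns upward.
from sklearn.datasets import make_moons
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input
import matplotlib.pyplot as plt

# small noisy dataset, split into train and validation halves
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)
X_train, X_val, y_train, y_val = X[:50], X[50:], y[:50], y[50:]

# a large model relative to the data, so it can memorize the training set
model = Sequential([
    Input(shape=(2,)),
    Dense(500, activation='relu'),
    Dense(1, activation='sigmoid'),
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                    epochs=1000, verbose=0)

# plot loss on the train and validation sets over training epochs
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='validation')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()
```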
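As for keeping the weights of the model small, one common regularization method is a weight penalty added to the loss during training. The sketch below, assuming the Keras API, attaches an L2 penalty to a layer's weights via kernel_regularizer; the coefficient 0.01 is an illustrative value, not a recommendation from the post.

```python
# Weight regularization in Keras: an L2 penalty on the layer weights is
# added to the loss, pressuring the optimizer to keep the weights small.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.regularizers import l2

model = Sequential([
    Input(shape=(2,)),
    # kernel_regularizer adds 0.01 * sum(w^2) for this layer to the loss
    Dense(500, activation='relu', kernel_regularizer=l2(0.01)),
    Dense(1, activation='sigmoid'),
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```

The same larger model from the learning-curve example can then be trained as before; with the penalty in place, the gap between train and validation loss should be smaller, at the cost of tuning one more hyperparameter.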
