What’s the fuss about Regularization?

Thus Lasso is useful for feature selection: it eliminates unnecessary features by shrinking their weights exactly to zero, yielding a more parsimonious model. Let us now look at how the RSS (MSE) varies on the training data for these two regularizations.

Fig 9 - MSE (train) for Lasso & Ridge as a function of regularization

Notice how, for the same lambdas, the MSE on the training data is higher for Lasso than for Ridge. This is because Lasso regularizes the model more strongly than Ridge. It is also worth looking at how the MSE varies on validation data under 3-fold cross-validation.

Fig 10 - Cross-validation score for Lasso & Ridge as a function of regularization

Notice how the cross-validation score drops faster for Lasso than for Ridge, indicating that Lasso fits the validation data better. Finally, we need to remember one important practical aspect of regularization: the data has to be standardized first.

Like Ridge, Lasso shrinks the coefficient estimates toward zero, but with Lasso the L1 penalty has the added effect of forcing some coefficients exactly to zero when lambda is sufficiently large. Lasso therefore performs variable selection, and Lasso models are sometimes called sparse models. However, when the response is a function of all the predictors, Ridge is the more efficient choice, since it does not eliminate any features.

Regularization is not limited to regression: in decision trees it takes the form of pruning, and in neural networks it appears as dropout. A discussion of those techniques is beyond the scope of this article.

I hope the above article helps you gain an understanding of regularization. If you want to see a running example, please check it out on Google Colab: colab.research.google.com. The GitHub code can be pulled from here.
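For readers who want a quick inline version of the training-MSE comparison (Fig 9) and the standardization step, here is a minimal sketch. It is not the article's actual notebook: it assumes scikit-learn and a synthetic dataset in which only a few features are truly informative, so the exact numbers will differ from the figures above.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption, not the article's dataset):
# 20 features, only 5 of which actually drive the response.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)  # standardize before regularizing

for lam in [0.01, 0.1, 1.0, 10.0]:
    lasso = Lasso(alpha=lam).fit(X, y)
    ridge = Ridge(alpha=lam).fit(X, y)
    print(f"lambda={lam:>5}  "
          f"Lasso train MSE={mean_squared_error(y, lasso.predict(X)):9.2f}  "
          f"(zeroed coefs: {int(np.sum(lasso.coef_ == 0.0)):2d})  "
          f"Ridge train MSE={mean_squared_error(y, ridge.predict(X)):9.2f}")

As lambda grows, the Lasso training MSE rises faster than Ridge's, and the count of exactly-zero Lasso coefficients shows the variable selection discussed above.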
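And a similar sketch for the 3-fold cross-validation comparison (Fig 10), again using the same assumed synthetic setup rather than the article's data:

from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)  # simplification: see note below

for lam in [0.01, 0.1, 1.0, 10.0]:
    # cross_val_score maximizes its score, so sklearn exposes MSE negated
    lasso_mse = -cross_val_score(Lasso(alpha=lam), X, y, cv=3,
                                 scoring="neg_mean_squared_error").mean()
    ridge_mse = -cross_val_score(Ridge(alpha=lam), X, y, cv=3,
                                 scoring="neg_mean_squared_error").mean()
    print(f"lambda={lam:>5}  Lasso CV MSE={lasso_mse:9.2f}  "
          f"Ridge CV MSE={ridge_mse:9.2f}")

One caveat: standardizing the full dataset before cross-validation leaks a little information across folds; for a strict evaluation you would wrap the scaler and model in a scikit-learn Pipeline so that scaling is fit only on each training fold.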
