How to Get Better Deep Learning Results (7-Day Mini-Course)

Note that we will be looking at some examples from two of the three areas as part of this mini-course.

Post your answer in the comments below.

I would love to see what you discover.

In the next lesson, you will discover how to control the speed of learning with the batch size.

In this lesson, you will discover the importance of the batch size when training neural networks.

Neural networks are trained using gradient descent where the estimate of the error used to update the weights is calculated based on a subset of the training dataset.

The number of examples from the training dataset used in the estimate of the error gradient is called the batch size and is an important hyperparameter that influences the dynamics of the learning algorithm.

The choice of batch size controls how quickly the algorithm learns. For example, batch gradient descent uses a batch size equal to the number of examples in the training dataset, stochastic gradient descent uses a batch size of one, and minibatch gradient descent uses a batch size somewhere in between.

Keras allows you to configure the batch size via the batch_size argument to the fit() function.

The example below demonstrates a Multilayer Perceptron with batch gradient descent on a binary classification problem.
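Here is a minimal sketch of such an example. The two-moons dataset from scikit-learn, the layer sizes, and the number of epochs are illustrative assumptions rather than the exact configuration used in the original lesson.

```python
# Minimal sketch: MLP trained with batch gradient descent on a binary classification problem.
from sklearn.datasets import make_moons
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from matplotlib import pyplot

# generate a simple binary classification dataset and split it
X, y = make_moons(n_samples=1000, noise=0.2, random_state=1)
trainX, testX, trainy, testy = X[:500], X[500:], y[:500], y[500:]

# define a small MLP
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

# batch gradient descent: the batch size equals the size of the training set;
# use batch_size=1 for stochastic gradient descent, or e.g. 32 for minibatch
history = model.fit(trainX, trainy, validation_data=(testX, testy),
                    epochs=200, batch_size=len(trainX), verbose=0)

# plot learning curves
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()
```

Changing the batch_size argument between len(trainX), 1, and a small value such as 32 lets you compare the three flavors of gradient descent on the same model.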

For this lesson, you must run the code example with each type of gradient descent (batch, minibatch, and stochastic) and describe the effect that it has on the learning curves during training.

Post your answer in the comments below.

I would love to see what you discover.

In the next lesson, you will discover how to fine-tune a model during training with a learning rate schedule.

In this lesson, you will discover how to configure an adaptive learning rate schedule to fine-tune the model during the training run.

The amount of change to the model during each step of the gradient descent search, or the step size, is called the “learning rate” and is perhaps the most important hyperparameter to tune for your neural network in order to achieve good performance on your problem.

Configuring a fixed learning rate is very challenging and requires careful experimentation.

An alternative to using a fixed learning rate is to instead vary the learning rate over the training process.

Keras provides the ReduceLROnPlateau learning rate schedule that will adjust the learning rate when a plateau in model performance is detected, e.g. no change for a given number of training epochs.

This callback is designed to reduce the learning rate after the model stops improving, with the hope of fine-tuning the model weights during training.

The example below demonstrates a Multilayer Perceptron with a learning rate schedule on a binary classification problem, where the learning rate will be reduced by an order of magnitude if no change is detected in validation loss over 5 training epochs.
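Here is a minimal sketch of such an example, reusing the same illustrative two-moons setup as before. The callback settings follow the lesson text (reduce the learning rate by an order of magnitude when the validation loss shows no improvement for 5 epochs), while the rest of the configuration is an assumption.

```python
# Minimal sketch: MLP with a ReduceLROnPlateau learning rate schedule.
from sklearn.datasets import make_moons
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import ReduceLROnPlateau

# illustrative binary classification dataset
X, y = make_moons(n_samples=1000, noise=0.2, random_state=1)
trainX, testX, trainy, testy = X[:500], X[500:], y[:500], y[500:]

model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

# drop the learning rate by an order of magnitude (factor=0.1) when the
# validation loss has not improved for 5 consecutive epochs
rlrp = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=5)

history = model.fit(trainX, trainy, validation_data=(testX, testy),
                    epochs=300, verbose=0, callbacks=[rlrp])
```

To compare with and without the schedule, run the same script with the callbacks argument removed.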

For this lesson, you must run the code example with and without the learning rate schedule and describe the effect that the learning rate schedule has on the learning curves during training.

Post your answer in the comments below.

I would love to see what you discover.

In the next lesson, you will discover how you can accelerate the training process with batch normalization.

In this lesson, you will discover how to accelerate the training process of your deep learning neural network using batch normalization.

Batch normalization, or batchnorm for short, is proposed as a technique to help coordinate the update of multiple layers in the model.

The authors of the paper introducing batch normalization refer to the change in the distribution of inputs during training as “internal covariate shift”.

Batch normalization was designed to counter the internal covariate shift by scaling the output of the previous layer, specifically by standardizing the activations of each input variable per mini-batch, such as the activations of a node from the previous layer.

Keras supports Batch Normalization via a separate BatchNormalization layer that can be added between the hidden layers of your model.

The example below demonstrates a Multilayer Perceptron model with batch normalization on a binary classification problem.
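Here is a minimal sketch of such an example, again using an illustrative two-moons dataset and layer sizes rather than the exact configuration from the original lesson.

```python
# Minimal sketch: MLP with a BatchNormalization layer after the hidden layer.
from sklearn.datasets import make_moons
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization

# illustrative binary classification dataset
X, y = make_moons(n_samples=1000, noise=0.2, random_state=1)
trainX, testX, trainy, testy = X[:500], X[500:], y[:500], y[500:]

model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu'))
# standardize the activations of the previous layer per mini-batch
model.add(BatchNormalization())
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

history = model.fit(trainX, trainy, validation_data=(testX, testy),
                    epochs=200, verbose=0)
```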

For this lesson, you must run the code example with and without batch normalization and describe the effect that batch normalization has on the learning curves during training.

Post your answer in the comments below.

I would love to see what you discover.

In the next lesson, you will discover how to reduce overfitting using weight regularization.

In this lesson, you will discover how to reduce overfitting of your deep learning neural network using weight regularization.

A model with large weights is more complex than a model with smaller weights.

Large weights are a sign of a network that may be overly specialized to the training data.

The learning algorithm can be updated to encourage the network toward using small weights.

One way to do this is to change the calculation of loss used in the optimization of the network to also consider the size of the weights.

This is called weight regularization or weight decay.

Keras supports weight regularization via the kernel_regularizer argument on a layer, which can be configured to use the L1 or L2 vector norm.

The example below demonstrates a Multilayer Perceptron model with weight decay on a binary classification problem.
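Here is a minimal sketch of such an example; the two-moons dataset, the layer sizes, and the L2 regularization strength of 0.001 are illustrative assumptions.

```python
# Minimal sketch: MLP with L2 weight decay via the kernel_regularizer argument.
from sklearn.datasets import make_moons
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l2

# illustrative binary classification dataset
X, y = make_moons(n_samples=1000, noise=0.2, random_state=1)
trainX, testX, trainy, testy = X[:500], X[500:], y[:500], y[500:]

model = Sequential()
# penalize large weights in the hidden layer using the L2 vector norm (weight decay)
model.add(Dense(50, input_dim=2, activation='relu', kernel_regularizer=l2(0.001)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

history = model.fit(trainX, trainy, validation_data=(testX, testy),
                    epochs=200, verbose=0)
```

Swapping l2 for l1 (or l1_l2) in the kernel_regularizer argument switches the penalty to the L1 norm (or a combination of both).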

For this lesson, you must run the code example with and without weight regularization and describe the effect that it has on the learning curves during training.

Post your answer in the comments below.

I would love to see what you discover.

In the next lesson, you will discover how to reduce overfitting by adding noise to your model.

In this lesson, you will discover that adding noise to a neural network during training can improve the robustness of the network, resulting in better generalization and faster learning.

Training a neural network with a small dataset can cause the network to memorize all training examples, in turn leading to poor performance on a holdout dataset.

One approach to making the input space smoother and easier to learn is to add noise to inputs during training.

The addition of noise during the training of a neural network model has a regularization effect and, in turn, improves the robustness of the model.

Noise can be added to your model in Keras via the GaussianNoise layer.

Noise can be added to a model at the input layer or between hidden layers.

The example below demonstrates a Multilayer Perceptron model with added noise between the hidden layers on a binary classification problem.
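Here is a minimal sketch of such an example; the two-moons dataset, the layer sizes, and the noise standard deviation of 0.1 are illustrative assumptions.

```python
# Minimal sketch: MLP with a GaussianNoise layer between two hidden layers.
from sklearn.datasets import make_moons
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, GaussianNoise

# illustrative binary classification dataset
X, y = make_moons(n_samples=1000, noise=0.2, random_state=1)
trainX, testX, trainy, testy = X[:500], X[500:], y[:500], y[500:]

model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu'))
# add zero-mean Gaussian noise to the hidden activations (active during training only)
model.add(GaussianNoise(0.1))
model.add(Dense(50, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

history = model.fit(trainX, trainy, validation_data=(testX, testy),
                    epochs=200, verbose=0)
```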

For this lesson, you must run the code example with and without the addition of noise and describe the effect that it has on the learning curves during training.

Post your answer in the comments below.

I would love to see what you discover.

In the next lesson, you will discover how to reduce overfitting using early stopping.

In this lesson, you will discover that stopping the training of a neural network early before it has overfit the training dataset can reduce overfitting and improve the generalization of deep neural networks.

A major challenge in training neural networks is how long to train them.

Too little training will mean that the model will underfit both the training and test sets.

Too much training will mean that the model will overfit the training dataset and have poor performance on the test set.

A compromise is to train on the training dataset but to stop training at the point when performance on a validation dataset starts to degrade.

This simple, effective, and widely used approach to training neural networks is called early stopping.

Keras supports early stopping via the EarlyStopping callback that allows you to specify the metric to monitor during training.

The example below demonstrates a Multilayer Perceptron with early stopping on a binary classification problem that will stop when the validation loss has not improved for 200 training epochs.
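Here is a minimal sketch of such an example; the patience of 200 epochs follows the lesson text, while the two-moons dataset, layer sizes, and total number of epochs are illustrative assumptions.

```python
# Minimal sketch: MLP with early stopping on validation loss.
from sklearn.datasets import make_moons
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping

# illustrative binary classification dataset
X, y = make_moons(n_samples=1000, noise=0.2, random_state=1)
trainX, testX, trainy, testy = X[:500], X[500:], y[:500], y[500:]

model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

# stop training when the validation loss has not improved for 200 consecutive epochs
es = EarlyStopping(monitor='val_loss', patience=200, verbose=1)

history = model.fit(trainX, trainy, validation_data=(testX, testy),
                    epochs=1000, verbose=0, callbacks=[es])
```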

For this lesson, you must run the code example with and without early stopping and describe the effect it has on the learning curves during training.

Post your answer in the comments below.

I would love to see what you discover.

This was your final lesson.

You made it.

Well done! Take a moment and look back at how far you have come.

You discovered how the batch size controls the speed of learning, how a learning rate schedule can fine-tune a model during training, how batch normalization can accelerate training, and how weight regularization, added noise, and early stopping can reduce overfitting.

This is just the beginning of your journey with deep learning performance improvement.

Keep practicing and developing your skills.

Take the next step and check out my book on getting better performance with deep learning.

How did you do with the mini-course? Did you enjoy this crash course? Do you have any questions? Were there any sticking points? Let me know.

Leave a comment below.

…with just a few lines of Python code.

Discover how in my new Ebook: Better Deep Learning.

It provides self-study tutorials on topics like: weight decay, batch normalization, dropout, model stacking and much more…

Skip the Academics. Just Results.

Click to learn more.

