A Guide for Building Convolutional Neural Networks

Before that, you’re just experimenting and prototyping, so there’s no need to make your training time longer by having more data.

Lessons Learned

- Preprocess only when needed, based on your task and using proven research as a guide
- Augmentation almost always increases accuracy; just make sure the augmented data reflects what you realistically expect to see in your application

Regularisation

Regularisation can be used whenever you feel that you are overfitting to your training data and performing poorly on the test data. You can tell you are overfitting when the difference between your training and testing accuracies is quite large, with your training accuracy being much better than test. There are several options to choose from: dropout, spatial dropout, cutout, L1, L2, adding Gaussian noise… and many more in the sea of research papers!

Practically, dropout is the easiest to use since you usually only have to put it in a couple of places and tune a single parameter. You can start by placing it just prior to the last couple of dense layers in your network (see the first sketch at the end of this section). If you feel like you’re still overfitting, you can add more earlier in the network or play around with the dropout probability. This should close the gap between your training and testing accuracies.

If regular dropout fails, you can play around with the others. With techniques like L1 and L2, you have more tuning options and so might be able to calibrate them to do a better regularisation job than dropout (see the second sketch below). In the vast majority of cases you won’t need to combine more than one regularisation technique, i.e. try to use only one throughout your network.

Lessons Learned

- Use dropout by default for practicality and ease of use
- If dropout fails, explore some of the others that can be customised, like L1 / L2
- If all techniques fail, you may have a mismatch between your training and testing data

Training

When you finally want to train your network, there are several optimization algorithms to choose from. Many people say that SGD gets you the best results with regards to accuracy, which in my experience is true. However, tuning the learning rate schedule and parameters can be challenging and tedious. On the other hand, using an adaptive learning rate such as Adam, Adagrad, or Adadelta is quick and easy, but you might not reach the optimal accuracy of SGD. The best thing to do here is to follow the same “style” as the activation functions: go with the easy ones first to see if your design works well, then tune and optimize using something more complex (see the last sketch below).
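To make the dropout advice concrete, here is a minimal sketch assuming PyTorch and a toy SmallCNN architecture (both are my assumptions; the post doesn’t prescribe a framework or a model). The only regularisation knob is a single dropout probability, placed just before the last couple of dense layers:

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Toy CNN with dropout placed just before the last dense layers."""

    def __init__(self, num_classes=10, drop_p=0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p=drop_p),          # dropout before the first dense layer
            nn.Linear(64 * 8 * 8, 256), nn.ReLU(),
            nn.Dropout(p=drop_p),          # and again before the final dense layer
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallCNN()
out = model(torch.randn(1, 3, 32, 32))   # assumes 32x32 RGB inputs, e.g. CIFAR-10
```

If the train/test gap persists, you would raise drop_p or add nn.Dropout (or nn.Dropout2d for spatial dropout) earlier, inside the convolutional stack.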
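For the L1 / L2 alternative, a sketch under the same assumptions (PyTorch, the hypothetical SmallCNN above): L2 is exposed as the optimizer’s weight_decay argument, while an L1 penalty is easy to add to the loss by hand, giving you explicit strengths to calibrate.

```python
import torch
import torch.nn as nn

model = SmallCNN()  # hypothetical model from the previous sketch
criterion = nn.CrossEntropyLoss()

# L2 regularisation: weight_decay is the L2 coefficient (a tuning knob).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

def loss_with_l1(outputs, targets, l1_lambda=1e-5):
    # L1 penalty summed over all parameters; l1_lambda is the strength to tune.
    l1_penalty = sum(p.abs().sum() for p in model.parameters())
    return criterion(outputs, targets) + l1_lambda * l1_penalty
```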
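And for the training advice, one way to follow the “easy first, tune later” approach, again assuming PyTorch (the learning rates and schedule here are illustrative values, not the author’s settings):

```python
import torch

model = SmallCNN()  # hypothetical model from the earlier sketch

# Prototyping: Adam is quick to get working with minimal tuning.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Final runs: SGD with momentum plus a tuned learning-rate schedule
# often recovers the last bit of accuracy, at the cost of more tuning.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

# Inside the training loop, call scheduler.step() once per epoch
# to decay the learning rate by gamma every step_size epochs.
```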
