Deep Learning in practice

Instead, there are a few practical tips that will help you assess whether you've been lucky, the problem is simple, or you made some glaring mistake. Some of these tips only make sense for classifiers, while others apply to all kinds of Deep Learning models. In the following we will assume that you already followed best practices to avoid overfitting, i.e., you used either k-fold Cross-Validation (CV) or a training/validation/test set split to estimate your model's generalization error, and you found a suspiciously low value.

Check for data leaking

First of all, check that all data transforms (or estimators, in TensorFlow lingo) are fit on the training set and then applied to the test set, for each fold of k-fold CV. For example, if dropping "useless" features is part of your model, you must not choose the "useful" features on the full training set and then reuse the same features for each fold. On the contrary, you must repeat the feature selection for each fold, which means that in each fold you may end up using different features to make your predictions. If you use a training/validation/test set split instead of k-fold CV, the equivalent approach is to use only the validation set to choose the useful features, without ever peeking at the test set.
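To make the per-fold discipline above concrete, here is a minimal pure-Python sketch (the data and the standardization step are made up for illustration): every preprocessing statistic is recomputed inside each fold, from the training part only, and then applied unchanged to the held-out part.

```python
import random
import statistics

random.seed(0)
data = [random.gauss(0, 1) for _ in range(100)]  # toy 1-D "feature"

k = 5
fold_size = len(data) // k
for fold in range(k):
    # Split first: held-out fold vs. the rest.
    test = data[fold * fold_size:(fold + 1) * fold_size]
    train = data[:fold * fold_size] + data[(fold + 1) * fold_size:]

    # Fit the transform on the training fold only...
    mu = statistics.mean(train)
    sigma = statistics.stdev(train)

    # ...and apply the *same* parameters to the held-out fold.
    # Recomputing mu/sigma on `test` (or on all of `data`) would leak.
    train_scaled = [(x - mu) / sigma for x in train]
    test_scaled = [(x - mu) / sigma for x in test]
```

The leaky version would compute `mu` and `sigma` once on `data` before the loop; the difference can be invisible on toy problems but inflates the estimated performance on real ones.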
If you're coding in Python, a simple way to make sure that exactly the same operations are repeated for each fold is to use the `Pipeline` class (in `scikit-learn`) or the `Estimator` class (in TensorFlow), and to make sure that all operations on data are performed inside the pipeline/estimator.

Are you using the right metric?

If building a classifier, verify that you're using the right metric for classification. For example, accuracy makes sense as a classification metric *only* if the various classes in the population are reasonably balanced. On a medical data set where the incidence of the disease in the population is 0.1%, you can get 99.9% accuracy on a representative data set just by assigning each point to the majority class. In this case, 99.9% accuracy is nothing to be surprised about, because accuracy is not the right metric to use here.

Is the problem really simple?

Again, in the case of classification, check that the problem is not overly simple. For example, you could have perfectly separable data. In other words, there exists a manifold (or the intersection of multiple manifolds) in input space which separates the various classes perfectly (no noise). In the case of a two-class linearly separable problem, there is a hyperplane in the input space which perfectly separates the two classes. This is easily diagnosed, since all lower-dimensional projections of a hyperplane are hyperplanes: for example, in the 2D scatterplots for all pairs of features, the two classes can be perfectly separated by a line. Of course, since Deep Learning is often applied to problems with K (the number of features) in the order of thousands or tens of thousands (e.g., image pixels), it may not even be possible to visualize a small fraction of all the K(K−1)/2 scatterplots.
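When K is too large to eyeball scatterplots, one cheap diagnostic for the two-class case is the perceptron algorithm: it reaches zero training errors if (and only if) the data are linearly separable. A minimal sketch, with toy data invented for illustration:

```python
def perceptron(points, labels, epochs=100):
    """Run the classic perceptron; report whether it converged.

    points: list of feature tuples; labels: +1 or -1.
    Convergence (zero errors in a full pass) implies linear separability.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(points, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                # Misclassified: nudge the hyperplane toward this point.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                errors += 1
        if errors == 0:
            return w, b, True
    return w, b, False  # not separable, or needs more epochs


# Two clusters on either side of the line x1 = 0: perfectly separable.
pts = [(-2.0, 1.0), (-1.5, -0.5), (2.0, 0.5), (1.0, -1.0)]
lbl = [-1, -1, 1, 1]
w, b, separable = perceptron(pts, lbl)
print(separable)  # True for this toy data
```

Note the caveat: failure to converge within the epoch budget does not prove inseparability, so this is a one-sided check; a quick convergence on your real data, however, is a strong hint that the problem is too easy for your suspiciously good score to be surprising.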
