Impact of Dataset Size on Deep Learning Model Skill and Performance Estimates

We will define a model that performs well on this dataset as one that has effectively learned the two circles problem. We can then experiment by fitting models on different sized training datasets and evaluating their performance on the test set.

Too few examples will result in low test accuracy, perhaps because the chosen model overfits the training set or because the training set is not sufficiently representative of the problem. Too many examples will result in good, but perhaps slightly lower than ideal, test accuracy, perhaps because the chosen model does not have the capacity to learn the nuance of such a large training dataset, or because the dataset over-represents the problem. A line plot of training dataset size against model test accuracy should show an increasing trend to a point of diminishing returns, and perhaps even a final slight drop in performance.

We can use the create_dataset() function defined in the previous section to create the train and test datasets, setting a default of 100,000 examples for the size of the test set argument while allowing the size of the training set to be specified and to vary with each call. Importantly, we want to use the same test set for each different sized training dataset.

We can directly use the same evaluate_model() function from the previous section to fit and evaluate an MLP model on a given train and test set.

We can create a new function to perform the repeated evaluation of a given model to account for the stochastic learning algorithm. The evaluate_size() function below takes the size of the training set as an argument, as well as the number of repeats, which defaults to five to keep running time down. The create_dataset() function is called to create the train and test sets, then the evaluate_model() function is called repeatedly to fit and evaluate a model, and a list of scores across the repeats is returned.

This function can then be called repeatedly for different training set sizes. I would guess that somewhere between 1,000 and 10,000 examples of the problem would be sufficient to learn it, where sufficient means only small fractional differences in test accuracy. Therefore, we will investigate training set sizes of 100, 1,000, 5,000, and 10,000 examples. The mean test accuracy will be reported for each training set size to give an idea of progress.

At the end of the run, a line plot will be created to show the relationship between training set size and model test accuracy. We would expect to see a rapid rise from poor accuracy to a point of diminishing returns. Box and whisker plots of the score distributions for each training set size are also created. We would expect to see the spread of test accuracy shrink as the size of the training set is increased.

A sketch of the complete example is listed below, built up piece by piece.
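First, a minimal sketch of what the create_dataset() function might look like, assuming the two circles data comes from scikit-learn's make_circles(); the noise level of 0.1 and the fixed random seed are assumptions, not details given in the text:

```python
from sklearn.datasets import make_circles

# create train and test datasets for the two circles problem
def create_dataset(n_train, n_test=100000, noise=0.1):
    # generate one pool of samples (noise level and seed are assumptions)
    n_samples = n_train + n_test
    X, y = make_circles(n_samples=n_samples, noise=noise, random_state=1)
    # hold out the first n_test examples as the test set so the test split
    # stays as consistent as possible across calls with different n_train
    trainX, testX = X[n_test:, :], X[:n_test, :]
    trainy, testy = y[n_test:], y[:n_test]
    return trainX, trainy, testX, testy
```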
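Next, a sketch of the evaluate_model() function carried over from the previous section, assuming a small Keras MLP; the single hidden layer of 25 nodes and the 100 training epochs are assumptions about the chosen configuration:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# fit an MLP on the train set and return its accuracy on the test set
def evaluate_model(trainX, trainy, testX, testy):
    # define a small MLP for the two-class circles problem
    model = Sequential()
    model.add(Dense(25, input_dim=2, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # fit the model on the training set
    model.fit(trainX, trainy, epochs=100, verbose=0)
    # evaluate classification accuracy on the held-out test set
    _, test_acc = model.evaluate(testX, testy, verbose=0)
    return test_acc
```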
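The evaluate_size() function then follows directly from the description above: create the datasets once for a given training set size, fit and evaluate a fresh model for each repeat, and return the list of scores.

```python
# repeatedly fit and evaluate a model for one training set size, to
# average over the variance of the stochastic learning algorithm
def evaluate_size(n_train, n_repeats=5):
    # create the train and test sets once for this training set size
    trainX, trainy, testX, testy = create_dataset(n_train)
    # fit and evaluate a fresh model n_repeats times
    scores = [evaluate_model(trainX, trainy, testX, testy) for _ in range(n_repeats)]
    return scores
```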
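Finally, a driver sketch that ties the pieces together, evaluating each training set size, reporting the mean test accuracy as it goes, and producing the line plot and box and whisker plots; the use of NumPy and Matplotlib here is an assumption.

```python
from numpy import mean
from matplotlib import pyplot

# evaluate each candidate training set size
sizes = [100, 1000, 5000, 10000]
score_sets = list()
for n_train in sizes:
    scores = evaluate_size(n_train)
    score_sets.append(scores)
    # report the mean test accuracy for this size as progress
    print('>train=%d, mean test accuracy=%.3f' % (n_train, mean(scores)))
# line plot of training set size vs mean test set accuracy
pyplot.plot(sizes, [mean(s) for s in score_sets], marker='o')
pyplot.xlabel('Training Set Size')
pyplot.ylabel('Test Set Accuracy')
pyplot.show()
# box and whisker plots of the score distribution for each size
pyplot.boxplot(score_sets, labels=sizes)
pyplot.show()
```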
Running the example may take a few minutes on modern hardware. The mean model performance is reported for each training set size, showing a steady improvement in test accuracy as the training set is increased, as we expect. We also see a small drop in average model performance from 5,000 to 10,000 examples, very likely highlighting that the variance in the data sample has exceeded the capacity of the chosen model configuration (number of layers and nodes).

A line plot of test accuracy versus training set size is created. We can see a sharp increase in test accuracy from 100 to 1,000 examples, after which performance appears to level off.

Figure: Line Plot of Training Set Size vs Test Set Accuracy for an MLP Model on the Circles Problem

A box and whisker plot is created showing the distribution of test accuracy scores for each sized training dataset. As expected, we can see that the spread of test accuracy scores shrinks dramatically as the training set size is increased, although it remains small on the plot given the chosen scale.

Figure: Box and Whisker Plots of Test Set Accuracy of MLPs Trained With Different Sized Training Sets on the Circles Problem

The results suggest that the chosen MLP configuration can learn the problem reasonably well with 1,000 examples, with quite modest improvements seen at 5,000 and 10,000 examples. Perhaps there is a sweet spot around 2,500 examples that yields an 84% test set accuracy with fewer than 5,000 examples.

The performance of neural networks can continue to improve as more and more data is provided, but the capacity of the model must be adjusted to support the increase in data. Eventually, there will be a point of diminishing returns where more data will not provide further insight into how to best model the mapping problem.
