Is it worth it?
Frustratingly, we did some testing of this idea last year and failed to get the approach working.
It turns out we had a programming error.
This time around we tried again, but more carefully.
Be prepared for a negative result.
Now here we go…
Growing NN Metrics
Faster does not mean better.
For example, we know that Adam trains faster than SGD, but SGD works better on test data than Adam (SOURCE).
Similarly, in this work we examine how growing a neural network affects performance on standard test data.
We used standard metrics (accuracy, precision, recall, f1-score and loss curve) to provide a comprehensive evaluation and comparison of the DNNs.
Growing NN Dataset
We used a Colab notebook to run our models (our motivation for using this platform was to accelerate model training with the GPU hardware-acceleration option).
We imported the covertype dataset from the UCI Machine Learning Repository into a dataframe using the pandas library.
The dataset comprises 581,012 data points of unscaled quantitative and binary data across 54 features related to forest land characteristics.
The classification task is to predict the forest cover type for a given observation (a 30 x 30 meter cell) drawn from four wilderness areas in the Roosevelt National Forest of northern Colorado.
Forest Cover Type designation is provided as an integer in the range 1–7.
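As a rough sketch of this ingestion step (the notebook code itself is not reproduced here), loading the covertype data with pandas might look like the following; the UCI download URL and the column names are our assumptions.

```python
import pandas as pd

# Assumed UCI download location for the covertype dataset (gzipped CSV, no header row).
COVTYPE_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/covtype/covtype.data.gz"

# 54 feature columns plus the Cover_Type label (an integer in the range 1-7).
df = pd.read_csv(COVTYPE_URL, header=None)
df.columns = [f"feature_{i}" for i in range(54)] + ["Cover_Type"]

print(df.shape)  # expected: (581012, 55)
```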
Growing NN Procedure
After data ingestion, we split the dataset into “state” information (features) and classes.
We scaled all the features into the range 0–1 using MinMaxScaler (imported from sklearn.preprocessing) and one-hot encoded the classes.
A quick check of the data dimensionality shows that x (features) has the shape of (581012, 54) and y (classes) has the shape of (581012, 8).
We divided the data into an 80/20 train/test split, meaning that the model trains on 80% of the data and tests/validates its predictions on the remaining 20%.
K-fold cross-validation is used to split the data randomly in three different ways; this ensures that the train/test split occurs in multiple combinations across the whole dataset, reducing problems related to uneven sampling and bias.
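A minimal sketch of this preprocessing, assuming the dataframe from the previous snippet (the random seeds and variable names are ours):

```python
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split, KFold
from tensorflow.keras.utils import to_categorical

# "State" information (features) and the integer class labels (1-7).
x = df.drop(columns=["Cover_Type"]).values.astype("float32")
y_raw = df["Cover_Type"].values

# Scale every feature into the 0-1 range.
x = MinMaxScaler().fit_transform(x)

# One-hot encode; labels 1-7 yield 8 columns (column 0 is simply unused).
y = to_categorical(y_raw)

print(x.shape, y.shape)  # (581012, 54) (581012, 8)

# 80/20 train/test split.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# 3-way K-fold cross-validation over the full dataset.
kfold = KFold(n_splits=3, shuffle=True, random_state=42)
for train_idx, test_idx in kfold.split(x):
    pass  # train and evaluate one model per fold here
```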
Growing NN Model Preparation
The classification task was implemented using three different models which differed only in layer architecture (the number of layers, their width, and which layers were trainable).
We used “categorical_crossentropy” and “adam” respectively as the model loss and optimizer.
Model performance was measured using the validation dataset (data the model was not trained on); the specific metrics measured were loss, accuracy, precision, recall, and F1-score (see here and here for a detailed explanation of what these metrics mean and how to calculate them).
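As a hedged sketch of how this setup could look (helper names and the macro averaging are our assumptions, not taken from the notebook), each model can be compiled with the stated loss and optimizer, and the comparison metrics computed on held-out predictions with scikit-learn:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def compile_model(model):
    # Loss and optimizer as described above.
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

def evaluate_model(model, x_val, y_val):
    """Compute the comparison metrics on validation data the model was not trained on."""
    y_true = y_val.argmax(axis=1)
    y_pred = model.predict(x_val).argmax(axis=1)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # Macro averaging across the cover-type classes is our assumption.
        "precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "recall": recall_score(y_true, y_pred, average="macro", zero_division=0),
        "f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
    }
```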
Growing NN Base Model 1
The initial model comprises three layers.
The first layer takes input with the shape of the x features (54,); data is then passed to a fully connected second layer of width 30.
A dropout layer (0.5) is added, followed by another layer of width 25. This is followed by another dropout layer (0.5), which is connected to a final output layer of width 8.
We use the ‘relu’ activation function for the input and hidden layers, and a softmax function is applied to the final output layer.
Training is set for 10 epochs with a batch size of 128.
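A minimal Keras sketch of this base architecture, using the hyperparameters described above (everything not stated in the text, such as the filename, is an assumption):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Base model: 54 inputs -> 30 -> 25 -> 8-way softmax, with dropout between layers.
model_1 = Sequential([
    Dense(30, activation="relu", input_shape=(54,)),
    Dropout(0.5),
    Dense(25, activation="relu"),
    Dropout(0.5),
    Dense(8, activation="softmax"),
])

model_1.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model_1.fit(x_train, y_train, epochs=10, batch_size=128, validation_data=(x_test, y_test))

# Save in HDF5 format so Model 2 can reload and freeze it.
model_1.save("model_1.h5")
```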
Growing NN Model 2 Incorporating Frozen Base Model 1
The second model comprises the first model with an additional two layers added on the input side of the network.
The trained first model is loaded from an HDF5 (.h5) file.
We “freeze” the loaded first model by setting its trainable attribute to False.
The additional layers have widths of 100 and 54, respectively, and both use the relu activation function. (The width of the first new layer is a somewhat arbitrary but reasonable increase in model width; the width of the second is chosen so that it interfaces correctly with the input dimension of the frozen model.)
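A sketch of growing the network this way, assuming the base model was saved as model_1.h5 in the previous snippet and the same training settings are reused:

```python
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense

# Reload the trained base model and freeze all of its weights.
frozen_base = load_model("model_1.h5")
frozen_base.trainable = False

# New trainable layers on the input side; the second layer has width 54 so its
# output matches the input dimension expected by the frozen base model.
model_2 = Sequential([
    Dense(100, activation="relu", input_shape=(54,)),
    Dense(54, activation="relu"),
    frozen_base,
])

model_2.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model_2.fit(x_train, y_train, epochs=10, batch_size=128, validation_data=(x_test, y_test))
```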
Growing NN Model 3
Finally, we generate a third model which is equivalent in size and design to the combined Model 1 and Model 2 DNN.
Model 3 consists of fully trainable layers.
Performance metrics are evaluated for all 3 DNNs to see if there is an advantage to reusing and freezing a pretrained model as part of a larger model.
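For comparison, a sketch of the fully trainable Model 3 with the same overall layer layout (again assuming the same training settings as above):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Same overall architecture as Model 2 (new input-side layers plus the base
# architecture), but with every layer trainable and trained in a single step.
model_3 = Sequential([
    Dense(100, activation="relu", input_shape=(54,)),
    Dense(54, activation="relu"),
    Dense(30, activation="relu"),
    Dropout(0.5),
    Dense(25, activation="relu"),
    Dropout(0.5),
    Dense(8, activation="softmax"),
])

model_3.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model_3.fit(x_train, y_train, epochs=10, batch_size=128, validation_data=(x_test, y_test))
```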
Growing NN Observations and Discussion
Results are shown in the series of figures below.
Figure 1: Bar chart of K-fold mean accuracy with standard deviation error bars
The results of these experiments are very interesting, if somewhat inconclusive.
Our initial null hypothesis is that using pre-trained models as frozen layers in a larger model does not provide any advantage as compared to training the larger model in a single step.
Figure 1 shows that all models achieved a similar level of mean accuracy.
Model 2 appears to have the highest mean accuracy; however, the size of the error bars indicates that the differences between the models are unlikely to be statistically significant.
Figures 2 and 3 show that the second model has the best performance for recall, accuracy, and F1-score.
However, these graph results are only shown for a single k-fold iteration and, as discussed earlier, may not be statistically significant (all the bar chart values are around the same height in Figure 1, so take them with a grain of salt).
In any case, the results shown above did not disprove the potential utility of using frozen pretrained models, although any advantage is likely to be small and may not warrant the effort required to implement this idea.
It looks like there might be something there, but if there is, it’s really small.
I’m going to call that a negative result.
More research is needed to see if growing DNNs is a good idea, but so far, no luck.
Conclusion
In this article you followed along to see some negative results and extreme frustration in the day-to-day life of an artificial intelligence researcher.
Face embedding and growing neural networks are definitely real, and definitely hard.
It’s science, and science is messy!
If you liked this article, then have a look at some of my most-read past articles, like “How to Price an AI Project” and “How to Hire an AI Consultant.” And hey, join our newsletter!
Until next time!
-Daniel
Lemay.ai
daniel@lemay.ai