Detecting academics’ major from facial images

It takes mini-batches of (128, 128, 3)-shaped tensors (128 × 128 pixels, RGB) as input and predicts probabilities for each of our four target classes. The architecture of the combined model looks like this:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
vggface_vgg16 (Model)        (None, 512)               14714688
_________________________________________________________________
top (Sequential)             (None, 4)                 66180
=================================================================
Total params: 14,780,868
Trainable params: 2,425,988
Non-trainable params: 12,354,880

top is the model described in step 1; vggface_vgg16 is a VGG16 model and looks like this:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_3 (InputLayer)         (None, 128, 128, 3)       0
conv1_1 (Conv2D)             (None, 128, 128, 64)      1792
conv1_2 (Conv2D)             (None, 128, 128, 64)      36928
pool1 (MaxPooling2D)         (None, 64, 64, 64)        0
conv2_1 (Conv2D)             (None, 64, 64, 128)       73856
conv2_2 (Conv2D)             (None, 64, 64, 128)       147584
pool2 (MaxPooling2D)         (None, 32, 32, 128)       0
conv3_1 (Conv2D)             (None, 32, 32, 256)       295168
conv3_2 (Conv2D)             (None, 32, 32, 256)       590080
conv3_3 (Conv2D)             (None, 32, 32, 256)       590080
pool3 (MaxPooling2D)         (None, 16, 16, 256)       0
conv4_1 (Conv2D)             (None, 16, 16, 512)       1180160
conv4_2 (Conv2D)             (None, 16, 16, 512)       2359808
conv4_3 (Conv2D)             (None, 16, 16, 512)       2359808
pool4 (MaxPooling2D)         (None, 8, 8, 512)         0
conv5_1 (Conv2D)             (None, 8, 8, 512)         2359808
conv5_2 (Conv2D)             (None, 8, 8, 512)         2359808
conv5_3 (Conv2D)             (None, 8, 8, 512)         2359808
pool5 (MaxPooling2D)         (None, 4, 4, 512)         0
global_max_pooling2d_3 (Glob (None, 512)               0
=================================================================
Total params: 14,714,688
Trainable params: 2,359,808
Non-trainable params: 12,354,880

I used Keras' ImageDataGenerator again for loading the data, augmenting it (3x) and resizing it. As recommended, stochastic gradient descent is used with a small learning rate (10^-4) to carefully adapt the weights. The model was trained for 100 epochs on batches of 32 images, again using categorical cross-entropy as the loss function.

Results (54.6 % accuracy)

The maximum validation accuracy of 0.64 was already reached after 38 epochs. Test accuracy turned out to be 0.546, which is quite a disappointing result, considering that even our simple, custom CNN model achieved a higher accuracy. Maybe the model's complexity is too high for the small amount of training data?

Inspecting the model

To get better insights into how the model performs, I briefly inspected it
with regard to several criteria. This is a short summary of my findings.

Code: inspection.ipynb

Class distribution

The first thing I looked at was the class distribution. How are the four study majors represented in our data, and what does the model predict? Apparently, the model neglects the class of German linguists a bit. That is also the class for which we have the least training data. Probably I should collect more.

Examples of false classifications

I wanted to get an idea of what the model does wrong and what it does right. Consequently, I took a look at the top five (with respect to confidence) (1) false negatives, (2) false positives and (3) true positives. Here is an excerpt for the class econ:

The top row shows examples of economists whom the model didn't recognize as such. The center row depicts examples of what the model "thinks" economists look like, but who are actually students / researchers with a different major. Finally, the bottom row shows examples of good matches, i.e. people for whom the model had a very high confidence in their actual class.

Again, if you are in one of these pictures and want to be removed, please contact me.

Confusion matrix

To see which professions the model confuses, I calculated the confusion matrix (the values are percentages of all test samples):

array([[12.76595745,  5.95744681,  0.        ,  6.38297872],
       [ 3.40425532, 12.76595745,  3.82978723,  8.08510638],
       [ 3.82978723,  5.53191489,  8.5106383 ,  3.40425532],
       [ 5.95744681,  5.10638298,  1.27659574, 13.19148936]])

Legend: 0 = cs, 1 = econ, 2 = german, 3 = mechanical. Brighter colors ~ higher value.

What we can read from the confusion matrix is that, for instance, the model tends to classify economists as mechanical engineers quite often.

Conclusion

First of all, this is not a scientific study, but rather a small hobby project of mine. Also, it does not have a lot of real-world relevance, since one would rarely want to classify students into four categories. Although the results are not spectacular, I am still quite happy with them; at least my model was able to do a lot better than random guessing.

More details
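The combined model can be reconstructed almost exactly from the parameter counts in the summaries earlier in the post: the 66,180 parameters of top are consistent with Dense(128) + Dense(4) on the 512-dimensional backbone output (512·128 + 128 + 128·4 + 4 = 66,180), and the 2,359,808 trainable backbone parameters correspond to exactly the last conv layer. The sketch below uses tf.keras.applications.VGG16 with weights=None as a stand-in for the VGG-Face weights (the original presumably loads those through a package such as keras_vggface), so treat it as an illustration, not my exact code:

```python
# Sketch of the combined model, reconstructed from the summaries above.
# Assumption: weights=None stands in for the VGG-Face weights so the
# sketch runs without the keras_vggface package.
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

base = tf.keras.applications.VGG16(
    include_top=False, weights=None,       # original uses VGG-Face weights
    input_shape=(128, 128, 3), pooling="max")

# Freeze everything except the last conv layer; this reproduces the
# 2,359,808 trainable / 12,354,880 non-trainable split from the summary.
for layer in base.layers:
    layer.trainable = layer.name == "block5_conv3"

# Top classifier: 512 -> 128 -> 4 matches the 66,180-parameter "top" model.
top = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(512,)),
    layers.Dense(4, activation="softmax"),
], name="top")

model = models.Sequential([base, top])
model.compile(optimizer=optimizers.SGD(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
```

Training would then just be model.fit(...) for 100 epochs on batches of 32 images.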
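The data-loading step can be sketched with Keras' ImageDataGenerator as well. The directory layout, the helper name, and the specific augmentations below are my illustrative assumptions; the post only states that the images were augmented (roughly 3x) and resized to 128 × 128:

```python
# Hypothetical sketch of loading + augmenting the face images with Keras.
# The folder layout ("one subfolder per major") and the augmentation
# parameters are assumptions, not the exact setup used in the post.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # scale pixel values to [0, 1]
    horizontal_flip=True,     # example augmentations standing in for
    rotation_range=10,        # whatever produced the ~3x augmentation
    width_shift_range=0.1,
    height_shift_range=0.1)

def make_batches(data_dir, batch_size=32):
    """Yield (image, one-hot label) batches resized to the model's input."""
    return datagen.flow_from_directory(
        data_dir,                  # e.g. one subfolder per study major
        target_size=(128, 128),    # resize to 128 x 128 pixels
        batch_size=batch_size,
        class_mode="categorical")  # one-hot labels for the 4 classes
```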
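Since every entry in the printed confusion matrix is a percentage of all test samples (the whole array sums to 100), it can be reproduced with scikit-learn by normalizing over the total count. y_true and y_pred below are hypothetical stand-ins for the real test labels and model predictions:

```python
# Minimal sketch: a confusion matrix expressed as percentages of ALL
# samples, matching the array printed above (rows = true class,
# columns = predicted class; 0 = cs, 1 = econ, 2 = german, 3 = mechanical).
import numpy as np
from sklearn.metrics import confusion_matrix

def percent_confusion(y_true, y_pred, n_classes=4):
    cm = confusion_matrix(y_true, y_pred, labels=list(range(n_classes)))
    return cm.astype(float) / cm.sum() * 100  # each cell: % of all samples
```

With this normalization, each row sums to that class's share of the test set, which makes under-represented classes (like the German linguists) easy to spot.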
