Logistic Regression For Facial Recognition

Maybe we can make our model better.

[Figure: class balance for the original data]

We can see here that our data is a little out of whack: about 79% of the pixels are classified as 1s (i.e. pixels containing skin) and only about 21% as 0s (i.e. pixels not containing skin). So, before we call it a day, let's run something called SMOTE (Synthetic Minority Oversampling Technique). SMOTE creates synthetic data points for our minority class (in our case, that means it will give us more 0-data points). Although creating synthetic data may seem like cheating, it's very common practice in data science, where datasets are rarely well balanced.

Running SMOTE on our original data and re-generating our confusion matrix and classification report, we get the following (a rough code sketch of this comparison appears at the end of this section).

[Figure: SMOTE confusion matrix]

[Figure: classification report for the SMOTE-ized data, showing both training and test readouts]

We can see that these numbers are significantly different from our original numbers. Recall that the first time around we had a precision of 96%, a recall of 92%, and an f1-score of 94%. After running our data through SMOTE, we have a precision of 85%, a recall of 92%, and an f1-score of 89%. So, our model actually got worse after we tried to compensate for the class imbalance.

As data scientists, it's up to us to decide when to run things like SMOTE on our data and when to leave the data as-is. In this case, we can see that leaving our data as-is was the way to go.

So, What Does This All Mean?

Besides being a good learning opportunity, we really did create a good model. And we know that doing other things to our data, such as trying to compensate for the class imbalance, is definitely not what we should do in this case. Our original model was our best performer, correctly predicting that an image contains skin 94% of the time (going by our f1-score).

While this might sound like a good classification rate, there are myriad ethical issues that come with creating algorithms used for facial recognition. What if our model were going to be used to identify wanted terrorists in a crowd? That is certainly different from recognizing a face for a Snapchat filter.

[Image from Christoph Auer-Welsbach's post "The Ethics of AI: Building technology that benefits people and society"]

This is where we as data scientists need as much information as possible when creating algorithms. If our model were to be used for Snapchat filters, we'd likely want to optimize for recall: it's arguably better to mistakenly flag things as faces when they're not than to recognize real faces only some of the time. On the other hand, if our model were to be used for spotting wanted criminals in a crowd, we'd likely still want to optimize for recall, but the consequences of mistakes would be vastly different.
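For anyone who wants to poke at the SMOTE comparison themselves, here is a minimal sketch of the experiment. The original skin-segmentation dataset isn't loaded here; instead, a synthetic stand-in with roughly the same 79/21 class balance is generated, and the split, solver settings, and variable names are illustrative assumptions rather than the exact setup used in this post.

```python
# Minimal sketch: logistic regression with and without SMOTE on an
# imbalanced binary dataset. The make_classification stand-in, the
# default train/test split, and max_iter are assumptions, not the
# post's exact setup.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Stand-in data with roughly the post's 79% / 21% class balance.
X, y = make_classification(n_samples=5000, n_features=3, n_informative=3,
                           n_redundant=0, weights=[0.21, 0.79], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# Baseline: fit on the imbalanced training data as-is.
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(confusion_matrix(y_test, baseline.predict(X_test)))
print(classification_report(y_test, baseline.predict(X_test)))

# SMOTE: synthesize extra minority-class (0 / non-skin) points in the
# training set only, then refit and score on the untouched test set.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
smote_model = LogisticRegression(max_iter=1000).fit(X_res, y_res)
print(confusion_matrix(y_test, smote_model.predict(X_test)))
print(classification_report(y_test, smote_model.predict(X_test)))
```

The exact numbers from this stand-in won't match the post; the point is to put the two classification reports side by side and compare precision, recall, and f1, just as we did above.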
