The Data Driven Partier: Movie Mustache

As always, the code seen here can be viewed in its entirety on my GitHub.

[Image: The placement of this mustache was optimized for comedic effect]

Due to the availability of facial recognition packages, this problem can be solved in less than a day with the right approach.

To address my problem statement as efficiently as possible, I broke it down into a few short steps:

1. Find a pre-trained neural network that finds faces.
2. Using the facial coordinates returned by the model, estimate where the mustache should go.
3. Obtain frequent mustache locations using movie data.
4. Visualize these locations sensibly.

[Image: The red dot marks where MTCNN suggests a mustache should go]

The model I used is called MTCNN and is built on Multi-task Cascaded Convolutional Neural Networks.

Special thanks to this post, which covers much of the syntax associated with the package and all but trivialized this endeavor.

MTCNN very efficiently locates facial landmarks and returns five of the more useful ones: left and right eyes, left and right corners of the mouth, and nose.

When I tested this model on a picture of myself, it didn't detect these facial landmarks perfectly, but it was good enough for the purposes of this project.

In the picture to the left, I used MTCNN to draw a bounding box on my face, along with blue dots at the positions the model output for my nose, right mouth corner, and left mouth corner.

I marked the centroid of those three points in red to estimate where a mustache should go and was very pleasantly surprised by the results.
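The loop later in this post calls a helper, get_mustache, whose definition lives in the full code on GitHub. A minimal sketch of the centroid idea might look like the following. It assumes each face follows the mtcnn package's detect_faces() output format, i.e. a dict whose "keypoints" entry maps "nose", "mouth_left" and "mouth_right" to (x, y) pixel coordinates. Treat it as an illustration, not the author's exact function; the landmark values in the example are made up.

```python
def get_mustache(face):
    """Return the (x, y) centroid of the nose and both mouth corners.

    Assumes `face` is one entry of the list returned by mtcnn's
    MTCNN.detect_faces(): a dict whose "keypoints" entry maps landmark
    names to (x, y) pixel coordinates.
    """
    keypoints = face["keypoints"]
    points = [keypoints["nose"], keypoints["mouth_left"], keypoints["mouth_right"]]
    # Average the three landmarks coordinate-wise to get their centroid
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    return (x, y)


# Example with made-up landmark coordinates in mtcnn's format
face = {
    "box": [100, 80, 120, 150],
    "confidence": 0.99,
    "keypoints": {
        "left_eye": (130, 130),
        "right_eye": (190, 128),
        "nose": (160, 165),
        "mouth_left": (138, 200),
        "mouth_right": (182, 202),
    },
}
print(get_mustache(face))  # (160.0, 189.0)
```

Averaging the nose with the two mouth corners places the point between the nose and the mouth, which is roughly where an upper-lip mustache sits.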

With a proof of concept working on a picture of my face, it was time to scale up the process.

Using the code below, I had MTCNN "watch" Indiana Jones and the Raiders of the Lost Ark, saved as individual frames in data/indiana_jones1/, to find where the mustaches should go.

```python
import os

import cv2
from mtcnn import MTCNN

# get_mustache takes the output from MTCNN and returns the
# coordinates of the nose and mouth-corners centroid
# (defined in the full code on GitHub).
frame_dir = 'data/indiana_jones1/'
detector = MTCNN()
mustaches = []
for file in os.listdir(frame_dir):
    image = cv2.imread(os.path.join(frame_dir, file))
    if image is None:  # skip anything OpenCV can't read as an image
        continue
    # OpenCV loads images as BGR, but MTCNN expects RGB
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    for face in detector.detect_faces(image):
        mustaches.append(get_mustache(face))
```

I now had a list of roughly 9,000 tuples of mustache locations from the first Indiana Jones film, and it was time to visualize the results.

Because a list of tuples isn't compatible with the visualization solution I had in mind, some slight cleaning was needed:

```python
# Frames were 2880x1800 pixels; image coordinates put (0, 0) at the top left
WIDTH, HEIGHT = 2880, 1800

x = []
y = []
for location in mustaches:
    x.append(location[0] / WIDTH)
    # Flip y so that (0, 0) is the bottom left, then scale to 0..1
    y.append((HEIGHT - location[1]) / HEIGHT)
```

For the y-values I used 1800 minus the y-coordinate, because (0, 0) represents the top left of an image, but I needed (0, 0) to represent the bottom left of my plots; my images were 2880×1800.

I then divided by the image width and height so that the coordinates became fractions between 0 and 1, usable on any TV or screen regardless of resolution.
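Because the cleaned coordinates are fractions of the frame with the origin at the bottom left, placing a mustache on a particular display only requires multiplying by that display's resolution and flipping y back to a top-left origin. A small sketch; the function name to_screen and the 1920×1080 example are hypothetical and not part of the original code:

```python
def to_screen(x_frac, y_frac, screen_w, screen_h):
    """Map fractional (bottom-left origin) coordinates to screen pixels.

    Screen pixel coordinates use a top-left origin, so y is flipped back.
    """
    px = x_frac * screen_w
    py = (1 - y_frac) * screen_h
    return (px, py)


# The centre of the frame lands at the centre of a 1920x1080 TV
print(to_screen(0.5, 0.5, 1920, 1080))  # (960.0, 540.0)
```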

[Image: A kde plot of frequent mustache coordinates]

The representations I used to maximize drinking in this game were a kernel density estimate (kde) plot and a 2D hexplot.

The kde plot is like a topographical map showing the density of occurrences of whatever is measured.

In this case, it shows the area with the highest probability of a mustache occurring.
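Reading the densest spot off a contour plot is only approximate. One way to compute it directly, sketched here with scipy.stats.gaussian_kde swapped in (the post doesn't say which KDE implementation produced its plot), is to evaluate the estimate on a grid over the unit square and take the maximum. The function name kde_peak, the grid size, and the synthetic data are illustrative assumptions; in practice x and y would be the fractional lists from the cleaning step.

```python
import numpy as np
from scipy.stats import gaussian_kde


def kde_peak(x, y, grid_size=200):
    """Return the (x, y) grid point where a 2D Gaussian KDE is highest."""
    kde = gaussian_kde(np.vstack([x, y]))
    # Evaluate the density on a regular grid covering the screen (0..1)
    xs = np.linspace(0, 1, grid_size)
    ys = np.linspace(0, 1, grid_size)
    xx, yy = np.meshgrid(xs, ys)
    density = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)
    row, col = np.unravel_index(np.argmax(density), density.shape)
    return xx[row, col], yy[row, col]


# Synthetic stand-in data clustered around (0.5, 0.6)
rng = np.random.default_rng(0)
x = rng.normal(0.5, 0.05, 1000)
y = rng.normal(0.6, 0.05, 1000)
print(kde_peak(x, y))
```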

Along the plot's axes are smoothed distributions of the mustaches' x and y coordinates, each shown separately.

These axes are scaled to the screen, so a value of (0.5, 0.5) would be a point at the center of the screen.

I like this visualization a lot for its simplicity and ease of interpretation, but it has its drawbacks.

First and foremost, it only shows one area of high density, so if we were to play the game with more than one mustache, this visualization would be clearly limited.

The other main flaw of this visualization is how vague it is.

The topographical-style layers make it easy to see the data converge on a point, but the darkest layer is disappointingly large.

With a different visualization, we should be able to see exactly where the mustache(s) should go.

[Image: Hexplot of frequent mustache locations]

With a hexplot, both of the drawbacks of the kde plot are addressed.

The hexplot shown here acts as a 2D histogram, showing exactly where the mustaches occur most frequently.
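The same counts can be pulled out numerically to choose several mustache spots at once, which addresses the multiple-mustache limitation of the kde plot. This sketch uses numpy.histogram2d, which bins into rectangles rather than hexagons (a deliberate simplification). The function name top_cells, the 20×20 grid, and the two-cluster synthetic data are assumptions for illustration.

```python
import numpy as np


def top_cells(x, y, k=3, bins=20):
    """Return (x_centre, y_centre, count) for the k most populated 2D bins."""
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins, range=[[0, 1], [0, 1]])
    # Bin centres along each axis (counts[i, j] is x-bin i, y-bin j)
    x_centres = (x_edges[:-1] + x_edges[1:]) / 2
    y_centres = (y_edges[:-1] + y_edges[1:]) / 2
    # Flat indices of the k largest counts, most frequent first
    flat = np.argsort(counts, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat, counts.shape)
    return [(x_centres[r], y_centres[c], int(counts[r, c])) for r, c in zip(rows, cols)]


# Synthetic data: two clusters of mustache locations of different sizes
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.325, 0.005, 500), rng.normal(0.725, 0.005, 300)])
y = np.concatenate([rng.normal(0.625, 0.005, 500), rng.normal(0.625, 0.005, 300)])
for cx, cy, n in top_cells(x, y, k=2):
    print(f"({cx:.3f}, {cy:.3f}): {n} mustaches")
```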

The histograms along this plot's x and y axes are unsmoothed versions of the distributions shown on the kde plot's axes, giving a more precise representation of our data in each dimension.

The hexplot also shows each location where a mustache will line up with a face, meaning that if we don’t want to maximize the drinks taken, we can still place our mustaches in sensible places where they will occasionally line up on faces.

The methods described here can help make a night of Movie Mustache more aggressive if the rules are to drink while the mustaches are lined up, but if the rules are only one drink per mustache fitting, we might need to consider a slightly different approach.

To take algorithm-based methods in this new direction, one would simply have to keep track of the previous mustache location and make sure that each subsequent coordinate pair was outside of a predetermined distance threshold before adding the coordinates to the list, removing repetitive movements from our dataset.
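The thresholding idea just described can be sketched in a few lines, though the post never implements it. This version compares each new location with the last location that was kept (one reading of "the previous mustache location") using Euclidean distance on the fractional coordinates. The function name drop_repeats and the 0.05 threshold are hypothetical choices.

```python
import math


def drop_repeats(locations, threshold=0.05):
    """Keep only locations farther than `threshold` from the last kept one."""
    kept = []
    for loc in locations:
        if not kept or math.dist(loc, kept[-1]) > threshold:
            kept.append(loc)
    return kept


# A face drifting slightly, then a cut to a different position
track = [(0.50, 0.60), (0.51, 0.60), (0.52, 0.61), (0.80, 0.40), (0.80, 0.41)]
print(drop_repeats(track))  # [(0.5, 0.6), (0.8, 0.4)]
```

Comparing against the last kept point rather than the last seen point means a face that drifts slowly is counted again once it has moved past the threshold, instead of being suppressed for the whole shot.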

I admittedly haven’t applied this logic, but I hypothesize that it would not make much of a difference in the final outcome.

I welcome any improvements to these methods, as well as suggestions for other fun applications of the data science toolkit you would like to see. Leave a comment below!
