Hot Dog or Not? A lesson on image classification using convolutional neural networks

That’s 43,200 numbers that it’s taking in.
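The exact image dimensions aren't restated here, but 43,200 is what you get from, say, a 120 × 120 RGB image (120 × 120 × 3 = 43,200). A minimal numpy sketch under that assumed size:

```python
import numpy as np

# Assuming a 120x120 RGB image: 120 * 120 * 3 = 43,200 values.
image = np.zeros((120, 120, 3), dtype=np.uint8)

flat = image.reshape(-1)  # flatten every pixel channel into one long vector
print(flat.size)          # 43200 -- the numbers the network takes in
```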

The hidden layer identifies edges, color clusters, and other visual features.

For example, if it sees a long cluster of bready colors between a long cluster of meaty colors, the output layer might decide that it’s a hot dog.

The hot dog example is binary, meaning there are only two outputs.

It’s either a hot dog or it’s not.
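In Keras terms, a binary classifier like this ends in a single sigmoid unit that outputs the probability of “hot dog.” A minimal sketch, not our exact architecture (the hidden-layer size is illustrative):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Binary head: one sigmoid unit -> probability of "hot dog".
binary_model = keras.Sequential([
    layers.Input(shape=(43200,)),           # the flattened image from above
    layers.Dense(128, activation="relu"),   # illustrative hidden layer
    layers.Dense(1, activation="sigmoid"),  # single output: hot dog or not
])
binary_model.compile(optimizer="adam",
                     loss="binary_crossentropy",
                     metrics=["accuracy"])
```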

This is what really disappointed the rest of the Silicon Valley crew, because they wanted the app to be able to detect *any* type of food.

This would require a multi-class neural network, where the output layer would need a separate class for every type of food.
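The multi-class version swaps that single sigmoid for a softmax with one output per class. A sketch, with a made-up class count standing in for “every type of food”:

```python
from tensorflow import keras
from tensorflow.keras import layers

NUM_FOODS = 1000  # hypothetical number of food classes

# Multi-class head: softmax gives one probability per food type.
multiclass_model = keras.Sequential([
    layers.Input(shape=(43200,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_FOODS, activation="softmax"),
])
multiclass_model.compile(optimizer="adam",
                         loss="sparse_categorical_crossentropy",
                         metrics=["accuracy"])
```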

Here is an example of a multi-class neural network, where the input layer takes in all the different features for all of these different images of faces and then categorizes them by who it thinks the image is of.

The output class could be individual people, but it could also classify, for example, age ranges of people, ethnicities of people, hair color of people, etc.

Of course, this means that you have to train your model on all of these people.

Or in our case, we trained our model on over 1,000 photos of hot dogs and over 3,000 photos of things that weren’t hot dogs.

Yeah, right now my computer has over 4,000 images of food on it.
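The post doesn’t show how those photos get loaded, but a common Keras pattern is to point a dataset loader at one folder per class. A hypothetical sketch (the folder names and sizes are made up):

```python
from tensorflow import keras

# Hypothetical layout: data/hot_dog/*.jpg and data/not_hot_dog/*.jpg
train_ds = keras.utils.image_dataset_from_directory(
    "data",
    labels="inferred",      # class label = subfolder name
    label_mode="binary",    # hot dog vs. not hot dog
    image_size=(120, 120),  # resize everything to one input size
    batch_size=32,
)
```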

Convolutional Neural Networks

In mathy terms, a convolution measures how much two functions overlap as one passes over the other.

So in terms of image processing, the convolutional layer of the neural network passes a filter (or window) over a single image and searches for specific features.
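To make “passing a filter over the image” concrete, here is a bare-bones numpy sketch of a single 3×3 filter sliding across a grayscale image (frameworks do this far more efficiently; this just shows the mechanics):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image`, recording how strongly they overlap."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)  # overlap at this position
    return out

# A classic vertical-edge filter: responds where bright meets dark.
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])
image = np.random.rand(6, 6)
print(convolve2d(image, edge_kernel).shape)  # (4, 4) feature map
```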

This process essentially turns the original photo into another image, broken down into equally-sized image tiles, and then we can feed each tiny image tile into the network again.

This step essentially reduces the dimensionality of the original image.

To reduce it again, we introduce a method called “Max Pooling,” which slides a window over the array and keeps only the most important feature in each window (aka the biggest number).
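A tiny numpy sketch of 2×2 max pooling, keeping only the biggest number in each window:

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep only the largest value in each non-overlapping 2x2 window."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]  # drop odd edge rows/cols
    windows = trimmed.reshape(trimmed.shape[0] // 2, 2,
                              trimmed.shape[1] // 2, 2)
    return windows.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 1],
               [4, 6, 5, 0],
               [7, 2, 9, 8],
               [1, 0, 3, 4]])
print(max_pool_2x2(fm))
# [[6 5]
#  [7 9]]
```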

So, we started with one giant image and kept breaking it down to result in a small(er) array that we can finally put back into our fully-connected neural network.
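Putting the pieces together in Keras, a small end-to-end model might look like this. The layer counts and sizes are illustrative, not our exact architecture:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(120, 120, 3)),        # assumed input size
    layers.Conv2D(32, 3, activation="relu"),  # filters search for features
    layers.MaxPooling2D(2),                   # keep only the biggest numbers
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),                         # back to a (much smaller) vector
    layers.Dense(64, activation="relu"),      # fully-connected layer
    layers.Dropout(0.5),                      # drop-out regularization
    layers.Dense(1, activation="sigmoid"),    # hot dog or not
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```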

That was a ton of information! For a more in-depth explanation of Convolutional Neural Networks, Max Pooling, and Downsampling, I highly recommend this Medium post (where I got the images of the kid).

It was so helpful in our research process!

Results

After messing around with multiple convolutional layers, drop-out regularization, testing out different hyperparameters and waiting around for many, many epochs, we reached accuracy scores around 70% to 74%.

This was just *okay*, considering we had a (purposeful!) class imbalance.
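A sketch of how one might train while watching for the overfitting shown below, continuing from the model and dataset sketches above (`val_ds`, the epoch count, and the class weights are placeholders):

```python
from tensorflow import keras

# Validation accuracy stalling while training accuracy climbs = overfitting.
history = model.fit(
    train_ds,
    validation_data=val_ds,          # hypothetical held-out split
    epochs=50,                       # placeholder epoch count
    class_weight={0: 1.0, 1: 3.0},   # illustrative: up-weight the rarer class
    callbacks=[keras.callbacks.EarlyStopping(monitor="val_loss",
                                             patience=5,
                                             restore_best_weights=True)],
)
```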

Example of how we tested epochs: you can see where we start to overfit the data!

Transfer Learning

In the end, we got the best scores applying a method called Transfer Learning.

This is the re-use of a pre-trained model, and we got ours through Keras.

In this case, these transfer learning models are pre-trained on millions of images for thousands of classes.

Vishal tested several pre-trained models (VGG16, InceptionV3, and MobileNetV2) and then fine-tuned them for our binary hot dog classification.
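A sketch of that recipe with one of those models, MobileNetV2, as Keras exposes it (the head, input size, and hyperparameters are illustrative):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Load MobileNetV2 pre-trained on ImageNet, minus its original classifier.
base = keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                      include_top=False,
                                      weights="imagenet")
base.trainable = False  # freeze the pre-trained features

# Bolt a small binary head onto the frozen base.
# (Inputs are assumed to be scaled with mobilenet_v2.preprocess_input.)
model = keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid"),  # hot dog or not, again
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```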

With Transfer Learning models, our accuracy scores climbed above 90%, and almost every image we tested it on was classified correctly.

Testing, Testing!

The model guessed with 100% confidence that this image is a hot dog. And it is!

Am I a hot dog? There is a 0.000006% chance that my dad and I on the beach are a hot dog.

Is this sub sandwich a hot dog? The model thinks there is a 28% chance this sub sandwich is a hot dog.

We tried over and over to “break” our model.
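Feeding a single test photo through the trained model looks roughly like this, continuing from the transfer-learning sketch (the file path is hypothetical):

```python
import numpy as np
from tensorflow import keras

# Load one test image and apply MobileNetV2's expected preprocessing.
img = keras.utils.load_img("test/eclair.jpg", target_size=(224, 224))
x = keras.applications.mobilenet_v2.preprocess_input(
    keras.utils.img_to_array(img)[np.newaxis, ...])  # batch of one

p = float(model.predict(x)[0][0])  # sigmoid output in [0, 1]
print(f"{p:.4%} chance this is a hot dog")
```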

During our talk, we took live image suggestions, and someone very smartly asked us to test an eclair! One might suggest that this is a *sweet* hot dog. Our model guessed with nearly 100% confidence that this eclair was a hot dog! It is a very hot dog-looking eclair.

But still not a hot dog.

This project was such an amazing learning experience for us.

The goal of this post was to share a bit of the knowledge we picked up along the way.

Next time Facebook auto-tags your photos, or your iPhone groups your photos by person or event, I hope you have a better idea about how it’s done!
