How to Engineer Your Way Out of Slow Models

After training, you get an accuracy of 99.99%, and you’re ready to ship the model to production. But then you realize the production constraints won’t allow you to run inference using this beast.

In other words, given an image, the image component outputs an embedding. The model is deterministic, so the same image always results in the same embedding.

We can load Inception into Retina, tell the model we intend to train that we want to use the Inception embedding, and that’s it. Not only did inference time improve, but so did the training process. This is possible only when we don’t want to train end to end, since gradients can’t backpropagate through EmbArk.

So whenever you use a model in production you should use EmbArk, right? Well, not always… There are three pretty strict assumptions here.

It doesn’t hurt us that the first time we see an image we won’t have its embedding. In our production system that’s OK, since CTR is evaluated multiple times for the same item during a short period of time.
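To make the caching idea concrete, here is a minimal Python sketch of an embedding store that relies on the two properties mentioned above: the model is deterministic, and a missing embedding on first sight is acceptable. This is not EmbArk's actual interface, which the text does not show. The names `EmbeddingCache`, `get`, `process_pending`, and `toy_embed` are all hypothetical, and `toy_embed` is a placeholder for a slow model such as Inception.

```python
import hashlib
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Set

Embedding = List[float]


class EmbeddingCache:
    """Hypothetical in-memory stand-in for an embedding store."""

    def __init__(self, embed_fn: Callable[[bytes], Embedding]) -> None:
        self._embed_fn = embed_fn               # slow, deterministic model
        self._store: Dict[str, Embedding] = {}  # content hash -> embedding
        self._pending: Deque[bytes] = deque()   # images awaiting computation
        self._queued: Set[str] = set()          # hashes already in the queue

    @staticmethod
    def _key(image: bytes) -> str:
        # Determinism means identical bytes map to an identical embedding,
        # so a content hash is a safe cache key.
        return hashlib.sha256(image).hexdigest()

    def get(self, image: bytes) -> Optional[Embedding]:
        """Return the cached embedding, or None on a first-time miss."""
        key = self._key(image)
        if key in self._store:
            return self._store[key]
        if key not in self._queued:
            # Defer the expensive computation instead of blocking the caller.
            self._queued.add(key)
            self._pending.append(image)
        return None

    def process_pending(self) -> int:
        """Compute queued embeddings offline; returns how many were computed."""
        done = 0
        while self._pending:
            image = self._pending.popleft()
            key = self._key(image)
            self._store[key] = self._embed_fn(image)
            self._queued.discard(key)
            done += 1
        return done


def toy_embed(image: bytes) -> Embedding:
    """Placeholder deterministic 'model' (an assumption, not Inception)."""
    digest = hashlib.sha256(image).digest()
    return [b / 255.0 for b in digest[:4]]
```

In this sketch the first `get` for an image returns `None` and queues it. After a background call to `process_pending`, later requests for the same image are served from the store without running the model. A downstream model has to tolerate the missing value on that first request, which is the trade-off described above.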
