How to deploy big models

But it was still horribly slow on the server (30 seconds just to load static pages), even though it was fast locally.

I found the main difference between the two setups was the server: Gunicorn in production versus the Flask development server locally.

For some reason Gunicorn failed to respond to health checks.

I tried manually implementing these checks in Flask, but to no avail.
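For reference, handling the checks manually in Flask looks something like this (a sketch, not my exact code); GAE flexible's legacy health checks probe GET /_ah/health and expect a 200:

from flask import Flask

app = Flask(__name__)  # stand-in for the real app that serves the model

# GAE flexible probes this path under legacy health checks; split health
# checks use /liveness_check and /readiness_check instead.
@app.route("/_ah/health")
def health():
    return "ok", 200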

Then I tried deploying with the Flask dev server, which I knew to be a no-no since it doesn’t provide things like request queues.

Each request was fast, but the Flask dev server isn't built to scale.

I ran the Docker container locally with Gunicorn and found it just as slow as on AppEngine.

I tried a bunch of magic config options, including verbose logging, but nothing helped or revealed the problem.

After evaluating many alternatives, I finally settled on Waitress.

It was pretty poorly documented, but I eventually found the magic invocation.
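The invocation is roughly this (a sketch: I'm assuming the Flask app object is called app, lives in main.py, and should listen on port 8080):

from flask import Flask
from waitress import serve

app = Flask(__name__)  # stand-in for the real app

@app.route("/")
def index():
    return "hello"

if __name__ == "__main__":
    # Waitress is a production WSGI server with its own listener threads
    # and request queue, so nothing needs to sit in front of it.
    serve(app, host="0.0.0.0", port=8080)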

It worked!

Deployment still takes an eternity and a half for some reason (~15–30 minutes), so to speed things up, I do my docker build locally and then docker push the already-built image.
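That workflow looks something like this (the project and image names are placeholders):

gcloud auth configure-docker  # one-time setup so docker can push to gcr.io
docker build -t gcr.io/YOUR_PROJECT/duet .
docker push gcr.io/YOUR_PROJECT/duet
gcloud app deploy --image-url=gcr.io/YOUR_PROJECT/duet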

Adding GPU support

AppEngine does not support GPUs, so I went with Google Kubernetes Engine (GKE).

It turns out that it’s not enough to add GPUs to your nodes.

You also need to install the drivers.

Creating the cluster and setting up the service should be something like this:

gcloud config set compute/zone us-west1-b
gcloud container clusters create duet-gpu --num-nodes=2 --accelerator type=nvidia-tesla-k80,count=1 --image-type=UBUNTU
gcloud container clusters get-credentials duet-gpu
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/ubuntu/daemonset-preloaded.yaml
kubectl apply -f cloud.yaml

Then find your public IP using kubectl get ingress.

It will take a minute to provision.
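I didn't show cloud.yaml above; here's a minimal sketch of the kind of config I mean, with the deployment, service, and ingress in one file (every name, the image path, and the port are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: duet-gpu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: duet-gpu
  template:
    metadata:
      labels:
        app: duet-gpu
    spec:
      containers:
      - name: duet-gpu
        image: gcr.io/YOUR_PROJECT/duet  # placeholder image path
        ports:
        - containerPort: 8080
        resources:
          limits:
            nvidia.com/gpu: 1  # requests one GPU, so the pod lands on a GPU node
---
apiVersion: v1
kind: Service
metadata:
  name: duet-gpu
spec:
  type: NodePort  # GKE's ingress controller needs NodePort (or NEG) backends
  selector:
    app: duet-gpu
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: duet-gpu
spec:
  defaultBackend:
    service:
      name: duet-gpu
      port:
        number: 80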

Managed services like Heroku and AppEngine try to isolate developers from DevOps, but the promise does not appear to extend to large models, and especially not PyTorch models.

In this post I showed how to deploy to the GAE flexible environment and to GKE using Flask and Waitress.

Now that I have a recipe that works, it will be easy to host more models in the future.

The nice part is we still get autoscaling, logging, etc. from GAE for free.

For GKE it’s just a config away.

Have you found an easier way? Leave a note!
