How to Pytorch in Production: Part 2. Architecture

We considered several options, including Redis, RabbitMQ and HTTP/gRPC, and ended up with Kafka.

The two main advantages are easy scaling with partitions and message buffering.

The main disadvantage is rebalancing.

If you have used it before, you know what that means; otherwise, please try Redis or RabbitMQ before you implement your own message bus.
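To make the partition and rebalancing trade-off a bit more concrete, here is a minimal sketch in plain Python. It is not Kafka's real partitioner or assignment protocol: the CRC32 hash, the round-robin assignment and the worker names are all assumptions made for illustration.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a message key to a partition (stand-in hash, not Kafka's own)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(partitions, consumers):
    """Spread partitions over consumers round-robin (simplified assignment)."""
    return {p: consumers[i % len(consumers)] for i, p in enumerate(partitions)}


def moved_partitions(before, after):
    """Partitions whose owner changed, i.e. the cost of a rebalance."""
    return [p for p in before if before[p] != after[p]]


if __name__ == "__main__":
    partitions = list(range(6))
    before = assign(partitions, ["worker-a", "worker-b"])
    after = assign(partitions, ["worker-a", "worker-b", "worker-c"])
    print("same key, same partition:", partition_for("user-42", 6))
    print("moved on rebalance:", moved_partitions(before, after))
```

Adding a third worker spreads the load, but most partitions change owner while that happens, which is the pain point mentioned above.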

Monitoring

Since we already had an Influx server available, sending data points to it and monitoring them in Grafana was a pretty reasonable choice.
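For illustration, a data point sent to Influx is just a line of text in the InfluxDB line protocol. The sketch below formats one; the measurement, tag and field names are hypothetical, and escaping is deliberately simplified (it assumes names and tag values contain no spaces, commas or equals signs).

```python
import time


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Format one data point as an InfluxDB line-protocol string.

    Simplified: no escaping of spaces, commas or equals signs in names.
    """
    def fmt(value):
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, int):
            return f"{value}i"  # integers carry an "i" suffix
        if isinstance(value, float):
            return repr(value)
        # Anything else is sent as a quoted string field.
        return '"' + str(value).replace('"', '\\"') + '"'

    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={fmt(v)}" for k, v in fields.items())
    ts = time.time_ns() if timestamp_ns is None else timestamp_ns
    return f"{measurement}{tag_part} {field_part} {ts}"


if __name__ == "__main__":
    # Hypothetical metric: latency of one classification request.
    print(to_line_protocol(
        "inference",
        {"service": "classifier"},
        {"latency_ms": 12.5, "batch_size": 8},
    ))
```

A line like this can then be written to the server over HTTP or through a client library, and Grafana reads it back from there.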

Here is what the live system looks like at the moment:

[Image: live system monitoring]

If we had to choose, we would probably go with either Sentry or the ELK stack with APM on top of it.

Save some $

AWS costs a lot, or at least more than the alternatives.

We managed to cut costs 10x by ordering a few GeForce GTX 1080 machines with 8 GB of GPU memory on Hetzner.

Works really well for Europe.

I will not say it is super reliable; however, if you run 2 clusters with a load balancer in front of them, everything works like a charm.

Anyway, even 5 clusters would be 2x cheaper and 5 times more effective than AWS, sorry Jeff.
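The two-clusters-plus-load-balancer idea boils down to "skip whatever is down". A minimal sketch of that logic follows; the class and cluster names are hypothetical, and a real setup would use an actual load balancer with health checks rather than application code.

```python
import itertools


class FailoverBalancer:
    """Round-robin over backends, skipping ones marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        """Return the next healthy backend, or raise if none is left."""
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backend available")


if __name__ == "__main__":
    lb = FailoverBalancer(["cluster-1", "cluster-2"])
    lb.mark_down("cluster-1")  # one cluster goes away
    print(lb.pick())           # traffic keeps flowing to the other one
```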

Being able to scale on request

If you have ever tried Helm in K8s or infrastructure as code, you know that scaling here is just a matter of a single line change.

It’s really easy to scale a pod once you are in K8s.

How hard it is to get there is another question, but we were lucky enough to have some expertise on our side.

Having fun

Go is fun: it is a statically typed, compiled language that has been declared a framework in itself.

Goroutines are more than famous: if you have any issues in production or life, just add a goroutine and it will get fixed.

While we definitely had fun, and Go is indeed a well-designed language, it still caused us some pain.

If you come from a 25-year-old language with tons of libraries, it might seem that Go does not have many, and it’s true.

Need a database? Write plain SQL.

Need an API server? Write your own middleware and request parser.

Need documentation? Luckily it’s there, though of course not fully complete if we talk about Swagger.

On the one hand, it saved us some of the time and pain caused by duck typing; on the other, it is still not that mature, and you need to give back to the community even for a 10-year-old language.

Not so bad and not as sweet as people tell you.

Take it with a grain of salt.

Anyway, it worked well and we are sticking with it for small services.

Persistence

The first milestone was Redis, because it is easy.

You put JSON into memory and it lives well until you are out of RAM.

Just drop more RAM on it.

However, if you do any processing, RAM runs out pretty fast.

For us it was around 1 GB for 15 minutes of work.

Increase that to 1 hour and you get 4 GB.
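The back-of-the-envelope math can be written down as a tiny helper. It assumes memory grows linearly with the amount of work kept in RAM, which is a simplification and not something we measured beyond the numbers above; the function name is made up.

```python
def ram_needed_gb(window_minutes, gb_per_15_min=1.0):
    """Linear estimate: memory grows in proportion to the retention window."""
    return gb_per_15_min * window_minutes / 15


if __name__ == "__main__":
    for minutes in (15, 60, 24 * 60):
        print(f"{minutes} min of work -> {ram_needed_gb(minutes)} GB")
```

Under the same assumption, a full day of work would need about 96 GB, which is where disk starts to look attractive.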

RAM is definitely a cheap resource; however, do you know what is cheaper? Disk space. In 2019, SSDs are cheap as hell and as fast as a rabbit.

So we decided to go with a relational database, denormalize data and store it in PostgreSQL.

It works pretty well, and we are not going to move away from it unless we get tired of data schema maintenance and want to use NoSQL, if we can do it the right way.
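To show what "denormalize and store it in a relational database" can look like, here is a small sketch. It uses the standard-library sqlite3 module as a stand-in for PostgreSQL so it runs anywhere, and the table and column names are hypothetical, not our real schema.

```python
import json
import sqlite3


def open_store():
    """Create an in-memory database with one wide, denormalized table."""
    conn = sqlite3.connect(":memory:")  # stand-in for a PostgreSQL connection
    conn.execute(
        """
        CREATE TABLE predictions (
            id INTEGER PRIMARY KEY,
            task TEXT NOT NULL,        -- repeated in every row, no tasks table
            model_name TEXT NOT NULL,  -- duplicated instead of joined
            payload TEXT NOT NULL,     -- JSON result stored as text
            created_at TEXT NOT NULL   -- ISO 8601 timestamp
        )
        """
    )
    return conn


def save(conn, task, model_name, result, created_at):
    conn.execute(
        "INSERT INTO predictions (task, model_name, payload, created_at) "
        "VALUES (?, ?, ?, ?)",
        (task, model_name, json.dumps(result), created_at),
    )


def latest(conn, task):
    """Read the newest result for a task with a single query and no joins."""
    row = conn.execute(
        "SELECT model_name, payload FROM predictions "
        "WHERE task = ? ORDER BY created_at DESC LIMIT 1",
        (task,),
    ).fetchone()
    return None if row is None else (row[0], json.loads(row[1]))
```

The price of skipping joins is duplicated values in every row, which is the schema maintenance cost mentioned above.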

We tried to use MongoDB as a NoSQL database to store historical data because, you know what, every BigData startup should use MongoDB. In the end, we spent more time optimizing it and writing code to deal with unstructured data.

Will give it one more shot next time.

Extendable

Let’s look at the picture again.

See the containers on the right? Every ML task lives in its own container and does only one thing.

Whether it is an index of Euclidean distances, a classification task or a regression, we put it in a separate box.

This allows us to scale bottleneck places at any time without a need to tweak the rest of the system.

If classification is slow, add more classifiers; if there are too many API requests, add one more API server.

Each box is responsible for only one thing and it does it really well.
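As an illustration of "one box, one job", here is a sketch of what a Euclidean distance index box could look like behind a single message handler. The message format and all names are assumptions made up for this example, and the brute-force search stands in for whatever a real index would do.

```python
import math


class EuclideanIndex:
    """Brute-force nearest-neighbour lookup; the only job of this box."""

    def __init__(self):
        self._items = {}

    def add(self, item_id, vector):
        self._items[item_id] = tuple(vector)

    def nearest(self, vector, k=1):
        ranked = sorted(
            self._items.items(),
            key=lambda kv: math.dist(kv[1], vector),
        )
        return [item_id for item_id, _ in ranked[:k]]


def handle(index, message):
    """Entry point a container would expose: one message in, one reply out."""
    if message["op"] == "add":
        index.add(message["id"], message["vector"])
        return {"ok": True}
    if message["op"] == "nearest":
        ids = index.nearest(message["vector"], message.get("k", 1))
        return {"ok": True, "ids": ids}
    return {"ok": False, "error": "unknown op"}
```

Because the rest of the system only sees `handle`, the internals can be swapped or the box replicated without touching anything else.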

If we decide to migrate from PyTorch to Keras or to TensorFlow — we can migrate a single container and see how it goes.

If it works out — roll it out on the whole system.

Does it sound like a plan?

SDK-friendly

As an engineer, I really hate systems that work but have either no documentation or very poor documentation.

All you can do is play with it and make a few guesses, and who knows what is waiting for you when you go live.

I have integrated probably every third-party system launched before 2017, and the number of poorly documented services is astonishing.

We want our users to have a pleasant experience and we work hard on it.

A good starting point was Swagger documentation, which we intend to use to autogenerate SDK clients.

This is basically what Amazon does and everything works well.

However, every AWS client looks like you are writing Java no matter what language you use.
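To give a feeling for why autogenerated clients are cheap to produce but rarely feel native, here is a sketch that walks a Swagger 2.0 style document and derives Python-flavoured method names from it. The spec fragment is hypothetical (it is not our API), and real generators do far more than this.

```python
import re

# Hypothetical Swagger 2.0 fragment, trimmed to the keys used below.
SPEC = {
    "swagger": "2.0",
    "paths": {
        "/tasks": {
            "get": {"operationId": "listTasks"},
            "post": {"operationId": "createTask"},
        },
        "/tasks/{id}": {
            "get": {"operationId": "getTask"},
        },
    },
}

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch"}


def list_operations(spec):
    """Collect (operationId, METHOD, path) for every operation in the spec."""
    ops = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method in HTTP_METHODS and "operationId" in op:
                ops.append((op["operationId"], method.upper(), path))
    return sorted(ops)


def to_snake(name):
    """Turn a camelCase operationId into a snake_case method name."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


if __name__ == "__main__":
    for operation_id, method, path in list_operations(SPEC):
        print(f"def {to_snake(operation_id)}(...):  # {method} {path}")
```

Renaming methods is the easy part; making arguments, errors and pagination feel idiomatic in each language is what a native SDK has to add on top.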

So the next milestone after we deliver a well-defined API would be native SDK clients and examples of how to use them.

Really, so many APIs lack even examples, not to mention documentation, which might be misleading.

In the end, we are halfway to releasing the new architecture and are experimenting much more with what can be added as a feature.

After we make sure it is feature-rich we are going to make it public, make it easy for developers and keep experimenting with scaling and performance tuning as it is fun.

Let me know if you want to hear which direction we took with the API to make it extendable with new features.

Originally published at tarasmatsyk.com on March 26, 2019.

