The Journey from Deep Learning Experimentation to Production-Ready Model Building

Since the 2012 ImageNet victory, people have believed that data, processing power, and data scientists were the three key ingredients for building AI solutions. The companies with the largest datasets, the most GPUs to train neural networks on, and the smartest data scientists were going to dominate forever. This, however, was only part of the truth.

While it's true that companies with the most data are able to build better predictive models, the improvements in model quality are not linearly proportional to dataset size. There was a correlation between better AI solutions and data, but not necessarily causation. Equally, while it's true that the need for processing power rises in proportion to the amount of data a model is trained on, virtually every company today has access to practically unlimited processing power. Just like airlines optimize the time their planes spend in the air, effective data science is about optimizing how cloud GPUs are used. And while there is a shortage of data scientists, shown by their rising salaries (30% higher than those of their software engineer counterparts), the need for novel algorithmic development is less imperative than the ability to assemble models from pre-researched best practices. We have moved from research to engineering when it comes to AI, and that requires a different set of skills.

Through this transformation in data, processing power, and competence, deep learning has in the past five years matured from the question "how can it be applied?" to the more down-to-earth question "how can we scale to production quickly?" Building production-scale solutions quickly requires a different set of tools than research or exploration does. Let's have a look at what that means in practice.

AI Tools and Frameworks to the rescue!

The combination of AI hype and the need for more skilled people has drawn practitioners from many different fields into data science.

Automatic Version Control

Doing version control manually is not an option, as it would not be your primary focus during model development, so you end up with random snapshots instead of full reproducibility. Unlike in software engineering, reproducibility should not be limited to your training code: it must also cover your training and test data, external hyperparameters, software library versions, and more (a minimal sketch of what such a snapshot might capture is included at the end of this post). Automatic version control for every part of each and every training run is the optimal solution.

Standardized Pipeline Management

Having the whole team work the same way, with some given degrees of freedom, is a necessity. The central point is that the pipeline is standardized within the company and within the team (a small sketch of a shared pipeline definition is also included at the end of this post).

The Rebirth of Deep Learning: AI Platforms

To bring clarity and structure to machine learning, the technology unicorns have been building their own overarching platforms that tie solutions to all of the above challenges together, usually in the form of an AI platform that includes libraries, machine orchestration, version control, pipeline management, and deployment. FBLearner Flow is Facebook's unified platform for orchestrating machines and workflows across the company. Valohai takes the same approach: it automatically snapshots every training run, so you can always take a model running in production (on Valohai's scalable Kubernetes cluster), click a button, and trace back to how it was trained, by whom, with which training data, with which version of the code, and more. Valohai is, however, not your only alternative: you could build much of it yourself.
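
To make the reproducibility point concrete, here is a minimal Python sketch of what an automatic per-run snapshot might capture: code version, data hashes, hyperparameters, and library versions. The snapshot_run and sha256_of helpers, the ./runs directory layout, and the data paths are hypothetical illustrations, not Valohai's or any other platform's API; the sketch assumes the training code lives in a git repository.

```python
# Minimal sketch of per-run snapshotting; a real AI platform automates this step.
import hashlib
import json
import platform
import subprocess
import sys
import time
from pathlib import Path


def sha256_of(path: str) -> str:
    """Hash a dataset file so the exact training data can be identified later."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot_run(train_data: str, test_data: str, hyperparams: dict) -> Path:
    """Write code version, data hashes, hyperparameters, and library versions
    to a JSON file in a fresh run directory before training starts."""
    run_dir = Path("runs") / time.strftime("%Y%m%d-%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=True)
    metadata = {
        "git_commit": subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip(),
        "train_data_sha256": sha256_of(train_data),
        "test_data_sha256": sha256_of(test_data),
        "hyperparameters": hyperparams,
        "python_version": platform.python_version(),
        "installed_packages": subprocess.check_output(
            [sys.executable, "-m", "pip", "freeze"], text=True
        ).splitlines(),
    }
    (run_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return run_dir


# Usage (paths and hyperparameters are hypothetical):
#   run_dir = snapshot_run("data/train.csv", "data/test.csv", {"lr": 1e-3, "epochs": 10})
#   ...train the model and save its weights into run_dir alongside metadata.json
```

Snapshotting before training starts, rather than after, is what lets you trace any production model back to exactly the data, code, and settings that produced it.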
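
As a companion sketch for the pipeline-management point, here is one way a team could standardize its workflow as a shared, declarative list of steps in Python. The Step dataclass, the step functions, and run_pipeline are hypothetical; real platforms express the same idea through their own configuration formats.

```python
# Minimal sketch of a shared, declarative pipeline definition.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Step:
    name: str
    run: Callable[[Dict], Dict]  # each step takes and returns a context dict


def preprocess(ctx: Dict) -> Dict:
    ctx["dataset"] = "cleaned-" + ctx["raw_dataset"]
    return ctx


def train(ctx: Dict) -> Dict:
    ctx["model"] = f"model trained on {ctx['dataset']}"
    return ctx


def evaluate(ctx: Dict) -> Dict:
    ctx["report"] = f"evaluation of {ctx['model']}"
    return ctx


# The pipeline itself is data: the same ordered list for everyone on the team.
PIPELINE = [
    Step("preprocess", preprocess),
    Step("train", train),
    Step("evaluate", evaluate),
]


def run_pipeline(ctx: Dict) -> Dict:
    for step in PIPELINE:
        print(f"running step: {step.name}")
        ctx = step.run(ctx)
    return ctx


if __name__ == "__main__":
    print(run_pipeline({"raw_dataset": "images-v1"}))
```

The point of the design is that individual steps can vary, but the overall structure is fixed, reviewable, and versioned along with the code, which is exactly the kind of standardization the team-wide workflow requires.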
