Bringing Machine Learning Research to Product Commercialization

Having seen both academic and industry research in deep learning, I noticed that there are quite some differences in daily work and in the methods being applied. For this reason, in this blog post I want to share some insights into the differences between academia and industry when applying deep learning to real-world problems, as we experienced them at Merantix over the last two years. Among other things, I will go into detail about differences regarding workflow and general expectations, as well as performance, model design and data requirements.

Since we started Merantix in 2016, we have incubated multiple growing artificial intelligence ventures in highly interesting, yet challenging industries. During this time, we had the opportunity to learn some very important lessons of our own, and hence now feel capable of sharing our most relevant insights with you.

A little disclaimer: when trying to bring deep learning from research into application, one can broadly distinguish between commercial and technical challenges.

In industry, you start from a fixed performance requirement, say detecting 90–95% of cancers in mammography screenings (there is a trade-off between sensitivity and specificity), or causing only a single severe accident or disengagement every 1 billion miles. Only then do you start to think about deploying a specific model and what kind of training data would be required to train that model to meet the performance requirements. In fact, there is a lot of flexibility with regard to model and data, and neither has to be state-of-the-art as long as it fulfills the requirements of the use case. However, there might be other constraints, such as explainability or fast inference, as I will explain in detail below.

All in all, it is crucial to distinguish between academia and industry, and especially to keep in mind that the two workflows run in opposite directions, as illustrated in the figure above.
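The sensitivity/specificity trade-off behind such a performance requirement can be made concrete with a small sketch. This is a hypothetical illustration with made-up malignancy scores, not real screening data: moving the decision threshold trades missed cancers (sensitivity) against false alarms (specificity).

```python
import numpy as np

# Hypothetical malignancy scores for 6 positive (cancer) and 6 negative cases.
y_true = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
scores = np.array([0.95, 0.90, 0.80, 0.70, 0.55, 0.40,
                   0.60, 0.45, 0.30, 0.20, 0.10, 0.05])

def sensitivity_specificity(threshold):
    pred = scores >= threshold
    tp = np.sum(pred & (y_true == 1))
    tn = np.sum(~pred & (y_true == 0))
    sens = tp / np.sum(y_true == 1)   # fraction of cancers detected
    spec = tn / np.sum(y_true == 0)   # fraction of healthy cases cleared
    return sens, spec

for t in (0.35, 0.50, 0.65):
    sens, spec = sensitivity_specificity(t)
    print(f"threshold={t:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

Lowering the threshold catches more cancers but flags more healthy patients; a product requirement like "90–95% detection" pins down where on this curve the system must operate.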
This, in turn, has a lot of implications for how to successfully bring research into application. Hence, I will address some of the insights we gathered at Merantix over the last two years in three chapters covering 1) performance, 2) model and 3) data.

Hit the binary success criteria

In machine-learning-driven product development and commercialization, it is important to understand that the success criteria are rather "binary", compared to the statistical metrics of academia, which define continuous success relative to the current state of the art in research. While in academia a performance of 70% on a specific machine learning task might be a remarkable success (as long as it is better than everyone else's), commercial applications require the highest degree of functionality and reliability. This means there will not be any company selling autonomous vehicles that crash occasionally, and no radiologist will ever buy software that fails to detect cancers from time to time; this holds true even if the algorithm outperforms humans on average.

For that reason, setting the right scope, and thereby limiting your performance claims to a certain environment or use case, is one of the most crucial steps in machine learning product development. However, even for cases where the performance is not good enough yet, there are two options for adding commercial value:

Predict uncertainty

A deep learning model deployed in a practical application will always return a prediction, no matter the input. This can be understood by looking at a distribution of data points rather than just the mean.
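One common way to turn a point prediction into such a distribution is to keep dropout active at prediction time and average several stochastic forward passes (the idea behind Monte Carlo dropout). The sketch below is a toy illustration with made-up weights, not a production recipe, using plain numpy instead of a deep learning framework:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network with fixed, made-up weights.
W1 = rng.normal(size=(4, 16))
W2 = rng.normal(size=(16, 1))

def stochastic_forward(x, drop_p=0.5):
    """One forward pass with dropout kept ACTIVE, as at training time."""
    h = np.maximum(x @ W1, 0.0)              # ReLU hidden layer
    mask = rng.random(h.shape) > drop_p      # fresh random dropout mask
    h = h * mask / (1.0 - drop_p)            # inverted-dropout scaling
    return (h @ W2).item()

x = rng.normal(size=(1, 4))                  # one input example
samples = np.array([stochastic_forward(x) for _ in range(200)])

# Instead of a single point prediction, we now have a distribution:
print(f"prediction mean = {samples.mean():.3f}, std = {samples.std():.3f}")
```

The spread of the sampled predictions serves as a rough uncertainty signal: when it is wide, a product can refuse to answer or defer to a human instead of silently returning a prediction.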
Image: Gal & Ghahramani (2015).

Know your target environment

In order to successfully deploy a machine learning system, one has to ensure that it will not only work on training sets but also in the real world. In machine learning, to measure the performance of a model, you typically first train it on a distinct training dataset and later use a separate test dataset for evaluation. For the latter, it is crucial that the data is as similar to the real world as possible, in the hope that the model will also work in practice after succeeding on the test set. To elaborate on this issue, let's consider a few industry examples per category:

In conclusion, the examples mentioned above imply that it is crucial to keep building new and more accurate test sets, because both the data and the target environment change, even after a software product is deployed and operational.

Don't overfit on your test set

The last insight on performance is once more related to the use of test sets. When a new set of unseen CIFAR-10 test images is collected, with a data distribution very similar to the original test set, previously top-performing deep learning models show a large drop in accuracy of 4–10%.

Image: Accuracy drops for newly collected test sets.

Coming back to bringing machine learning research into application, the results above imply the following: even with a very good test set that adequately represents the real world, in accordance with the previous chapter, one always has to consider the possibility of indirectly overfitting to that specific test set after having repeatedly evaluated model performance on the exact same test data.
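The mechanism behind this indirect overfitting can be reproduced with a toy experiment (a hypothetical setup, not the CIFAR-10 study itself): if you pick the best of many models by repeatedly scoring them on one fixed test set, the winner's score is inflated, and a freshly collected test set reveals the true performance.

```python
import numpy as np

rng = np.random.default_rng(42)
n_test = 200      # size of each test set
n_models = 500    # number of models "tuned" against the fixed test set

# Labels and predictions are all random coin flips, so every model's
# true accuracy is exactly 50% -- no model is genuinely better.
labels_old = rng.integers(0, 2, size=n_test)
labels_new = rng.integers(0, 2, size=n_test)   # a freshly collected test set

preds = rng.integers(0, 2, size=(n_models, n_test))
acc_old = (preds == labels_old).mean(axis=1)

best = np.argmax(acc_old)                      # model selection on the reused set
acc_new = (preds[best] == labels_new).mean()   # honest evaluation on fresh data

print(f"best accuracy on reused test set: {acc_old[best]:.3f}")
print(f"same model on a fresh test set:   {acc_new:.3f}")
```

The "best" model looks clearly better than chance on the reused test set purely by selection, while the fresh test set puts it back near 50%; the same effect, in milder form, is what the repeated CIFAR-10 evaluations exposed.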
