Stop training more models, start deploying them

By Maurits Kaptein, Tilburg University

The rumours that AI (and ML) will revolutionise healthcare have been around for a while [1].

And yes, we have seen some amazing uses of AI in healthcare [see, e.g., 2,3].

But, in my personal experience, the majority of the models trained in healthcare never make it to practice.

Let’s see why (or, scroll down and see how we solve it).

Note: The statement “the majority of the models trained in … never make it to practice” is probably true across disciplines.

Healthcare happens to be the one I am sure about.

   Over the last decade, I have developed AI methods allowing computers to learn “what works for whom”.

I have worked on bandit problems [e.g., 4] with applications in online marketing [5], thus learning “which product to show to whom”.

More recently I became interested in “which treatment works for which patient?”; partly using the same data science methods, but this time with a positive impact.

For example, together with Lingjie Shen, I spent the last two years collaborating with the Dutch Cancer Registry (NCR).

The great folks at NCR maintain detailed records containing the progression of cancer cases in the Netherlands.

The registry contains the background characteristics of patients, their treatment (e.g., chemo-therapy yes or no) and the outcome (2 or 5 year mortality).

It takes a lot of curation to get this data from the various hospitals, but the NCR, in the end, provides a clean and well-documented collection of cases.

We worked with an excerpt from the registry containing over 50,000 patients with colon cancer whose tumor was surgically removed.

After removal of the tumor there is debate on whether or not to administer chemo-therapy; while most randomized controlled trials (RCTs) show a positive effect, this seems to vary widely between patients and tumor types [6].

We set out to study whether we could learn which patients should receive adjuvant chemotherapy.

A simple approach would be to fit a flexible supervised learning model to predict the 5-year survival rates.

So we did, using Bayesian Additive Regression Trees [7].

However, this clearly does not suffice: the NCR data describes real-life cases as they are treated in the hospital, so the treatment assignment could well be severely confounded.

We might be mixing up causes for effects.

To counter this, we examined various methods of controlling for this confounding.
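To see why confounding matters, here is a minimal simulation sketch. It uses synthetic data, not NCR data, and every name and number in it is an illustrative assumption. Fit patients are both more likely to receive chemotherapy and more likely to survive, so a naive comparison of treated and untreated patients overstates the true effect (set to +0.10 here). Stratifying on the confounder is only one simple way to adjust, and not necessarily the method used in the project.

```python
import random

def simulate(n=200_000, seed=0):
    """Synthetic cohort: (fit, treated, survived) per patient."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        fit = rng.random() < 0.5                 # confounder
        treated = rng.random() < (0.8 if fit else 0.2)
        # True effect of treatment: +0.10 survival probability for everyone.
        p_survive = 0.4 + (0.3 if fit else 0.0) + (0.10 if treated else 0.0)
        rows.append((fit, treated, rng.random() < p_survive))
    return rows

def survival_rate(rows):
    return sum(survived for _, _, survived in rows) / len(rows)

def naive_effect(rows):
    """Difference in survival between treated and untreated patients."""
    treated = [r for r in rows if r[1]]
    control = [r for r in rows if not r[1]]
    return survival_rate(treated) - survival_rate(control)

def stratified_effect(rows):
    """Average the within-stratum effects, weighted by stratum size."""
    effect = 0.0
    for level in (True, False):
        stratum = [r for r in rows if r[0] == level]
        effect += len(stratum) / len(rows) * naive_effect(stratum)
    return effect

rows = simulate()
print(f"naive estimate:      {naive_effect(rows):.3f}")
print(f"stratified estimate: {stratified_effect(rows):.3f}")
```

In this setup the naive estimate lands near 0.28, while the stratified estimate is close to the true 0.10. With real registry data the confounders are rarely this clean, which is why validating against an RCT is valuable.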

We ended up comparing our estimates to estimates from a large RCT.

This allowed us to validate our “correction model”.

In the end, these steps allowed for “imputing the counterfactuals” [8].

We created a dataset containing the outcomes that we expected under treatment and under no-treatment for each and every patient.

This allowed for generating personalized treatment rules [9]: we could finally say which treatment works for whom.
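Once both potential outcomes are imputed for each patient, the treatment rule itself can be very simple. The sketch below assumes a hypothetical `ImputedOutcome` record and a hypothetical `min_benefit` threshold; the survival probabilities are made up for illustration and are not results from the NCR study.

```python
from dataclasses import dataclass

@dataclass
class ImputedOutcome:
    patient_id: str
    p_survive_chemo: float      # imputed 5-year survival with chemotherapy
    p_survive_no_chemo: float   # imputed 5-year survival without chemotherapy

def recommend(outcome: ImputedOutcome, min_benefit: float = 0.0) -> str:
    """Recommend chemotherapy only if the imputed benefit exceeds a threshold."""
    benefit = outcome.p_survive_chemo - outcome.p_survive_no_chemo
    return "chemo" if benefit > min_benefit else "no chemo"

# Illustrative, made-up imputations for three patients.
patients = [
    ImputedOutcome("A", 0.82, 0.70),
    ImputedOutcome("B", 0.64, 0.66),
    ImputedOutcome("C", 0.75, 0.74),
]

rule = {p.patient_id: recommend(p, min_benefit=0.02) for p in patients}
print(rule)
```

A threshold above zero is one way to express that a marginal expected gain may not outweigh the burden of treatment; where to set it is a clinical judgment, not a modelling one.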

The three paragraphs above took over a year and required coordination between our team, the NCR, and a number of participating oncologists.

Eventually, the project gave us an AI (or what-shall-we-call-it) model that, given a feature vector, determines the optimal treatment choice.

That awesome result, however, brings up the logical next question: how do we make sure this model is actually used by healthcare professionals?

To find out, I decided to talk to those who had experienced the problem before.

So, about a year ago, I started conversations with the great people at NCR: “How do you deploy the models that you, and the researchers that use your data, train?” Cutting a long answer about public APIs, regulation, privacy, and cultural and organizational hurdles short: they hardly do.

It proved to be extremely hard to deploy models.

And, it was not just NCR; I talked to Data Scientists at various health insurers.

Same problem.

They train models and validate them, but it’s hard to deploy them into healthcare practice.

I talked to Dutch scientific funding sources: yes, they fund the training of models for various healthcare applications routinely.

But, actually deploying these models is challenging, and most projects don’t make it past “the notebook stage”: a nice demonstration of the validity and usefulness of the model, but no impact in practice.

The bottom line here is that it was not just me, a researcher, who was failing to move trained models to production.

It’s omnipresent, and it is hurting our ability to truly improve the lives of patients.

   So, we have a problem on our hands: how do we move models to production? Now, the answer to this question is going to be complicated; it will take more than just a technological solution.

But, I happen to find the technological problem(s) interesting, so let’s focus on those for now.

In my view, a major hurdle in deploying models is caused by the vastly different requirements that we have for model training and model deployment.

Let’s look at a few:

Hence, while most data scientists understandably and happily chop away in their Jupyter notebooks (or various other tools), whatever happens on their local machine is all too often ill-suited to run in the hospital.

Effectively, we need a method to bridge the gap from training requirements to deployment requirements.

   Obviously I am not the only one identifying this problem; in recent years a number of potential solutions have been suggested.

Most of the solutions fall into one of the following three classes, each of which has its drawbacks.

Effectively, it would be great if we could somehow hit the sweet spot between approaches 1 and 2 above: can we allow data scientists to easily convert a fitted model into a small and efficient application that can be run anywhere?

Automatically rebuilding a model in such a way that it can be run in extremely small and efficient containers might sound impossible but, come to think of it, it isn’t.

We solved the issue using WebAssembly, the runtimes provided by our friends at Wasmer, and, admittedly, quite a lot of fiddling around with the C code underlying (e.g.) sklearn and various compiler directives.

All of this resulted in a fully automatic way of compiling (or transpiling, really) a stored model object to WebAssembly using a single line of code.
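As a rough illustration of the general idea (not Scailable's actual pipeline, which this post does not detail), the sketch below turns the parameters of a fitted linear model into a dependency-free C function. Source like this could in principle be compiled to WebAssembly with a compiler that targets it, such as Emscripten or clang. The function names and the example coefficients are assumptions made for this example.

```python
def linear_model_to_c(coefficients, intercept, name="predict"):
    """Emit C source for a fitted linear model.

    The generated function needs no Python and no ML library at run time:
    the learned parameters are baked into the code as constants.
    """
    if coefficients:
        terms = " + ".join(f"{c!r} * x[{i}]" for i, c in enumerate(coefficients))
    else:
        terms = "0.0"
    return (
        f"double {name}(const double *x) {{\n"
        f"    return {intercept!r} + {terms};\n"
        f"}}\n"
    )

def predict(coefficients, intercept, x):
    """Reference implementation in Python, for checking the generated code."""
    return intercept + sum(c * v for c, v in zip(coefficients, x))

# Illustrative parameters, as if read from a fitted model object.
coef, intercept = [0.5, -1.25], 2.0
source = linear_model_to_c(coef, intercept)
print(source)
print(predict(coef, intercept, [2.0, 4.0]))
```

The point of the design is that the deployed artifact contains only the arithmetic needed for inference, which is what keeps it small and portable. More complex models (trees, ensembles) need more elaborate code generation, but the principle is the same.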

The result is a blazingly fast, super small “executable” that can be run in pretty much any environment; we can run the task on a server and expose it through a simple API, but we can even run it in a browser or on the edge.

The WebAssembly (or .wasm) executables can be shipped to the hospital and run within its systems, directly on the EPD (electronic patient record) data, without the data ever leaving the hospital.

   To illustrate, let’s look at some code for a simple linear regression model.

The following code fits a simple regression model using sklearn, and subsequently uploads it to Scailable using the sclblpy package:

Within seconds, the model is transpiled.

To make things super easy, we even host the resulting .wasm executable on our servers.

This particular model upload is available under id 7d4f8549-a637-11ea-88a1-9600004e79cc, and can be tested here.

Alternatively, a simple cURL call to the hosted model immediately returns the resulting inference.

   We think that our approach to putting models into production can bridge the gap from training to deployment in healthcare and beyond.

It is easy, and the result is performant and portable.

We do, however, need people to try it, give feedback, and help us improve.

So, if you want to give it a try, get your own account at www.



Any comments are highly appreciated!

Let’s wrap this up.

I think we are developing useful models faster than we are using them.

I think model deployment is an issue in healthcare (and beyond).

I think it is, in part, caused by the vastly different requirements that we have when training models versus deploying them.

I think automatically transpiling models to WebAssembly solves this: it is easy, it is highly performant, and the resulting executable is portable.

We are looking forward to seeing your feedback.

It’s good to note my own involvement here: I am a professor of Data Science at the Jheronimus Academy of Data Science and one of the cofounders of Scailable.

Thus, no doubt, I have a vested interest in Scailable; I have an interest in making it grow such that we can finally bring AI to production and deliver on its promises.

The opinions expressed here are my own.

Bio: Prof. Maurits Kaptein is a professor of data science at Tilburg University, The Netherlands, and one of the co-founders of Scailable.

Maurits has worked on various topics, from multi-armed bandits to Bayesian Additive Regression Tree (BART) models to efficient methods for adaptive clinical trials.

After all that’s done, it’s time to go surfing.


Reposted with permission.
