By Kristina Young, Senior Data Scientist

In the past few years there has been a large increase in tools trying to solve the challenge of bringing machine learning models to production.
One thing that these tools seem to have in common is the incorporation of notebooks into production pipelines.
This article aims to explain why this drive towards the use of notebooks in production is an anti-pattern, giving some suggestions along the way.
Let’s start by defining what these are, for those readers who haven’t been exposed to notebooks, or call them by a different name.
Notebooks are web interfaces that allow a user to create documents containing code, visualisations and text.
They look as follows:

Contrary to what you might have gathered from the introduction, notebooks are not all bad.
They can be quite useful in certain scenarios, which will be described in the sub-sections below.
Data analysis
This is probably their most common use.
When greeted by a new dataset, one needs to dig into the data and do certain visualisations in order to make sense of it.
Notebooks are good for this because they allow us to:

Experimentation
When it comes to machine learning, a lot of experimentation takes place before a final approach to solve a problem is chosen.
Notebooks are good for playing around with the data and various models in order to gain an understanding of what works with the given data and what does not.
One-time tasks
Notebooks are also a good playground.
Sometimes one needs to perform an automated task once whilst perhaps not being familiar or comfortable with writing bash or using other similar tools.
Teaching or technical presentations
When teaching Python, or giving a technical presentation to your peers, you may want to show code and the result of that code immediately after.
Notebooks are great for this as they allow you to run code and show the result within the same document.
They can show visualisations, represent sections with titles and provide additional documentation that may be needed by the presenter.
Code assessments
If your company provides code challenges to candidates, notebooks can be a useful tool.
This also depends on what your company needs to assess.
Notebooks allow a candidate to combine documentation, explanations and their solution into a single page.
They are also easy to get running for the assessor, provided that the candidate has given the package requirements.
However, what they cannot provide is a wide enough assessment of the candidate’s understanding of software engineering principles, as we will understand better from the next section.
A lot of companies these days are trying to solve the challenge of bringing models to production.
Data scientists within these companies may come from very varied backgrounds, including statistics, pure mathematics, natural sciences and engineering.
One thing that they do have in common is that they are generally comfortable with using notebooks for analysis and experimentation, as the tool is designed for this purpose.
Because of this, large companies that provide infrastructure have been focusing on bridging the “productionisation gap” by providing “one click deployment” tools within the Notebook ecosystem, therefore encouraging the use of notebooks in production.
Unfortunately, as notebooks were never designed to serve this purpose to begin with, this can lead to unmaintainable production systems.
The thought of notebooks in production always makes me think of the practicality of a onesie comic – looks beautiful but is very impractical for certain scenarios.
Now that we know what Notebooks can do well, let’s dive into what they are really bad at in the following sections.
Continuous integration (CI)
Notebooks are not designed to be run automatically or handled via a CI pipeline, as they were built for exploration.
They tend to involve documentation and visualisations, which would add unnecessary work to any CI pipeline.
Though they can be extracted as a normal Python script and then run on the CI pipeline, in most cases you will want to run the tests for the script, not the script itself (unless you are creating some artefact that needs to be exposed by the pipeline).
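To make the extraction step concrete, here is a minimal sketch that pulls the code cells out of a notebook file using only the standard library. It assumes the usual notebook JSON layout (a top-level `cells` list where each cell has a `cell_type` and a `source`); the function name `notebook_to_script` is made up for this illustration, and cells containing notebook-only magics would still need manual cleanup. In practice a command such as `jupyter nbconvert --to script` does this job.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Return the code cells of a notebook as one plain Python script."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells carry no executable code
        source = cell.get("source", "")
        # The notebook format stores source either as a string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip())
    return "\n\n".join(chunks) + "\n"


# A tiny in-memory notebook standing in for a real .ipynb file.
example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"], "metadata": {}},
        {"cell_type": "code", "source": ["x = 1\n", "y = x + 1"],
         "outputs": [], "execution_count": 1, "metadata": {}},
        {"cell_type": "code", "source": "print(y)",
         "outputs": [], "execution_count": 2, "metadata": {}},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
})
script = notebook_to_script(example)
```

The resulting script is what a CI job could import and test; the documentation and visualisation cells are left behind.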
Testing
Notebooks are not testable, which is one of my main pain points about them.
No testing framework has been created around these because their purpose is to be playgrounds, not production systems.
Contrary to popular belief, testing in data products is just as important and possible as in other software products.
In order to test a notebook, the code from the notebook needs to be extracted to a script, which means that the notebook is redundant anyway.
It would need to be maintained to match the code in the extracted script, or diverge into some more untested chaos.
If you want to learn more about testing ML pipelines, check out the article: Testing your ML pipelines.
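As a rough sketch of what extracted, testable code can look like: the imputation step below is a hypothetical example (it is not taken from any particular pipeline), written as a plain function so that a test runner such as pytest can pick up the accompanying test.

```python
import statistics


def fill_missing_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def test_fill_missing_with_median():
    # Even number of observed values: the median is the mean of the middle two.
    assert fill_missing_with_median([1, None, 3]) == [1, 2.0, 3]
    # A single observed value is its own median.
    assert fill_missing_with_median([5, None]) == [5, 5]
```

Once the logic lives in a module like this, the notebook can import the function instead of redefining it, which avoids the two copies drifting apart.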
Version control
If you have ever had to put a notebook on git or any other version control system and open a pull request, you may have noticed that this pull request is completely unreadable.
That is because the notebook needs to keep track of the state of the cells and therefore has a lot of changes taking place behind the curtains when it is run to create your beautiful HTML view.
These changes also need to be versioned, causing the unreadable view.
Of course you may be in a team that uses pairing and not pull requests, so you may not care about the pull request being unreadable.
However, you lose another advantage of version control through this readability decrease: when reverting code, or looking into old versions for a change that may have introduced or fixed a problem, you need to rely purely on the commit messages and manually revert to check a change.
This is a well known problem of notebooks, but also one that people are working on.
There are some plugins that can be used in order to alleviate this at least in a web view of your version control system.
One example of such a tool is Review Notebook App.
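To illustrate where the noise in notebook diffs comes from, the sketch below clears the stored outputs and execution counters of every code cell before the file would be committed. It assumes the standard notebook JSON layout (code cells with `outputs` and `execution_count` fields); `strip_outputs` is a name chosen for this example, not an existing API.

```python
import json


def strip_outputs(notebook_json: str) -> str:
    """Return the notebook JSON with outputs and execution counts cleared."""
    notebook = json.loads(notebook_json)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    # Stable key order and indentation keep later diffs small.
    return json.dumps(notebook, indent=1, sort_keys=True) + "\n"


def make_notebook(execution_count, outputs):
    """Build a one-cell notebook; only the run state differs between calls."""
    return json.dumps({
        "cells": [{
            "cell_type": "code",
            "source": ["print(1 + 1)"],
            "outputs": outputs,
            "execution_count": execution_count,
            "metadata": {},
        }],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    })


run_once = make_notebook(1, [{"output_type": "stream", "name": "stdout", "text": ["2\n"]}])
run_again = make_notebook(7, [])

# Same code, different run state: the raw files differ, the stripped ones do not.
same_after_strip = strip_outputs(run_once) == strip_outputs(run_again)
```

This only removes run state from the diff; the code itself is still wrapped in JSON, so the review experience remains worse than for a plain script.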
Collaboration
Collaboration in a notebook is hard.
Your only viable collaboration option is to pair, or take turns on the notebook like a game of Civilization.
This is why:

State
State has already been mentioned in both of the above, but it deserves its own point for emphasis.
Notebooks have a notebook-wide state.
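A small simulation can make this shared state visible. In the sketch below each string stands in for one notebook cell and a single dictionary plays the role of the kernel's namespace; the cell names and contents are invented for the illustration.

```python
# Each entry stands in for one notebook cell.
cells = {
    "load": "rows = [1, 2, 3, 4]",
    "scale": "rows = [r * 2 for r in rows]",
    "report": "total = sum(rows)",
}


def run_cells(order):
    """Run the named cells in the given order against one shared namespace."""
    namespace = {}
    for name in order:
        exec(cells[name], namespace)
    return namespace["total"]


top_to_bottom = run_cells(["load", "scale", "report"])        # 20
scale_skipped = run_cells(["load", "report"])                 # 10
scale_run_twice = run_cells(["load", "scale", "scale", "report"])  # 40
```

The same three cells give three different totals depending on which cells were run and how often, which is exactly what happens when cells are executed out of order by hand.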
This state changes every time that you run a cell, which may lead to the following issues:

Engineering standards
Notebooks encourage bad engineering standards.
I want to highlight the word encourage here, as a lot of these things are avoidable by the user of the notebook.
Anti-patterns that are often seen in notebooks are:

Package management
There is no package management in notebooks.
The notebook uses the packages installed in the environment that it is running in.
One needs to manually keep track of the packages used by that specific notebook, as different notebooks running in the same environment may need different packages.
One suggestion is to always run a new notebook in a fresh virtual environment, keeping track of that specific notebook’s requirements separately.
Alternatively, all the notebooks in an environment could rely on a single requirements file.
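One way to keep track of a specific notebook's requirements is to record the installed version of each package it uses. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `pin_requirements` and the idea of passing in the package list by hand are assumptions made for this example.

```python
from importlib import metadata


def pin_requirements(package_names, lookup=metadata.version):
    """Return requirements-file text pinning each package to its installed version."""
    lines = []
    for name in sorted(package_names):
        try:
            lines.append(f"{name}=={lookup(name)}")
        except metadata.PackageNotFoundError:
            # Keep a visible marker rather than silently dropping the package.
            lines.append(f"# {name}: not installed in this environment")
    return "\n".join(lines) + "\n"
```

Calling `pin_requirements(["pandas", "scikit-learn"])` inside the notebook's environment returns text that can be saved next to the notebook as its own requirements file. The `lookup` parameter exists only so the function can be tested without depending on what happens to be installed.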
Great, so now we know why notebooks in production are a bad idea and why we need to stop dressing up experimentation tools as productionization tools.
Where does that leave us though? That depends on the skills and structure of your team.
Your team most likely consists either of:

So let’s take a look at the two scenarios below.

A team of data scientists with engineering skills
In this scenario, your data science team is in charge of the models end to end.
That is, in charge of experimentation as well as productionization.
These are some things to keep in mind:

Separation of engineering and data science skills
Some larger organisations prefer more specialised skillsets, where data scientists do the experimental work and ML/data engineers bring the resulting models to production.
The points listed in the above scenario are still relevant, but I have one additional suggestion specific to this scenario:

Please, please, please don’t throw the model over the fence! Sit together, communicate and pair/mob program the pipeline into production.
The model doesn’t work unless it provides value for the end users.
As with any tool, there are places to use notebooks and places to avoid them.
Let’s do one last recap for these.
In conclusion, there are two messages that I would like you to take from this article:

Bio: Kristina Young is a Senior Data Scientist at BCG Digital Ventures.
She has previously worked at SoundCloud as a Backend and Data Engineer in the Recommendations team.
Her previous experience is as a consultant and researcher.
She has worked as a back end, web and mobile developer in the past on a variety of technologies.
Original.
Reposted with permission.