In code we trust

I’ll focus mostly on Open Source tools: although there are very good commercial and proprietary solutions around[2], I prefer to present options which are available to everyone for free (I’m from a town in Northern Italy where people are famous for their stinginess[3] 😉).

Back to our environment and what sort of stuff we need to put inside it. At a very basic level (things can actually get more complicated here) we need instruments to: (a) manage the source code, providing code templates, handling source code versions and so on; (b) manage the build process that converts source code into binary (executable) artefacts, with dependency handling, versioning and storage of the produced components, etc.

Version control provides a means for monitoring changes to the code as they occur; it coordinates contributions by different people and tracks the ownership of changes made to the code; it provides a backup of the work done and allows you to restore a working version in case some change introduces bugs or malfunctions. In general, some form of versioning should be applied not only to application code but also to binary artefacts, ML models, documentation, configuration and scripts. In short, version everything.

When talking about control and versioning for source code, Git[4] is the first thing that comes to mind. Just putting bare repositories for your Git projects on some shared folders, to be accessed via shell commands, is an option, but if you want a nice UI tool, one you can try is GitLab[5]; I’ve used it in the past and it does its job fine.

The good news here is that I didn’t find any specific reason to change version control tool or workflow for the specific case of machine learning-based software projects. Code is code, and Git works pretty well whether your code implements a pretty user interface in Angular, a Java REST service wrapping a JDBC database query, or Python code defining a Deep Neural Network architecture in TensorFlow.

Now, what about the next step, managing the build process to convert your source code to binary (executable) artefacts? Your whole build process can then be organized, archived, versioned and so on … version everything, remember? Build and dependency management tools compile your source code into a binary artefact, keeping track of package dependencies and managing them automatically and uniformly.

Keep in mind that in any complete real-world application the functionality providing Machine Learning capabilities is likely to be just one of many interacting parts. In such a jumbled setting, rather than trying to make a single instrument like Maven fit all technologies, my approach reverted to using different tools for different projects: Maven for Java projects, setuptools for Python, Angular CLI for Angular projects and so on.
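To make the "different tools for different projects" idea concrete, here is a minimal sketch of how a top-level script might pick the build tool for each project folder. It is only an illustration under stated assumptions, not a description of any particular setup: the function name `build_command` and the marker-file table are hypothetical, and the Python entry assumes the PyPA `build` package is installed so that `python -m build` is available.

```python
from pathlib import Path

# Marker file -> build command. The marker files are each tool's usual
# project file; the chosen commands are illustrative assumptions.
BUILD_COMMANDS = {
    "pom.xml": ["mvn", "package"],          # Maven (Java projects)
    "setup.py": ["python", "-m", "build"],  # setuptools (Python), via the PyPA "build" front end
    "angular.json": ["ng", "build"],        # Angular CLI (Angular projects)
}


def build_command(project_dir):
    """Return the build command for project_dir, or None if no known marker file is found."""
    root = Path(project_dir)
    for marker, command in BUILD_COMMANDS.items():
        if (root / marker).is_file():
            # Return a copy so callers cannot mutate the shared table.
            return list(command)
    return None
```

The returned list can be handed to `subprocess.run(..., cwd=project_dir, check=True)`, so each project is still built by its own native tool while the top-level script stays tiny.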
