Introducing Skelebot — A Better Way to Manage Machine Learning Projects

Sean Shookman, Apr 24

Photo by Steve Johnson on Unsplash

Skelebot is a command line tool for executing and managing machine learning projects in R and Python.

A Problem of Consistency

As with most software, Skelebot was created to solve a problem.

In this case, the problem was a lack of consistency.

More specifically, it was a lack of project structure consistency across our machine learning repositories.

This lack of consistency meant that diving into another Data Scientist’s project was a tedious endeavor of scouring code and trying to make sense of it all, or just pestering the person who wrote the code into helping you understand it.

It also meant a lack of consistency in the deployment process due to the fact that no two projects were quite the same.

This resulted in either tailoring a model’s deployment process to fit the project, tweaking the project to fit a more generic deployment structure, or sometimes a bit of both.

A Simple Scaffolding Script

What we needed was a unified, or at least similar, structure to the folders and files in our repos so that it would be easier to understand one another’s repositories, and also easier to deploy and execute them in a consistent, reproducible manner.

As a result, I had the idea to create a simple Python script that would prompt for a few values and then generate a consistent folder structure for the project.

The scaffolding script was built rather quickly, but the coding didn’t stop there.

Figure: Example of Project Scaffolding through Skelebot

In writing up the script I realized I could tweak it here and there to make it more useful, such as automatically generating the initial Dockerfile for my project.

Then I decided to just allow Skelebot to build the image as well.

Every time I realized a new need for my projects, I would find a way to incorporate it into Skelebot to make my life a little easier.

Before long, the simple scaffolding script had taken on a life of its own and had become something else entirely.

Unified Project Interface

While Skelebot is now much more than just a scaffolding tool, it does still retain the scaffolding functionality.

Once the project has been scaffolded, Skelebot offers further functionality, such as the ability to specify a project’s jobs in the config file so that they can be discovered and executed through the Skelebot CLI.

Skelebot has evolved from a simple scaffolding script into a command line tool for executing and managing machine learning projects.

Figure: Help Interface of an Example Skelebot Project

Instead of creating a folder structure to serve as a unified structure for our projects, the Skelebot CLI now serves as the unification mechanism itself.

Regardless of folder structure, any Skelebot project can be quickly understood with a single command, thereby providing a simple unified interface for anyone trying to execute or understand the project.

All commands for the project now go through Skelebot, which can provide the details needed to execute them.

Figure: Help Interface of an Example Skelebot Job

Install

In order to use Skelebot, there are two dependencies that you must have installed on your system:

- Python 3.6 (or later)
- Docker

Once these are set up, you can download or clone the Skelebot source code. Then, navigate to the root of the project and execute the install.sh script.

    ./install.sh

Scaffold

If you are setting up a new Skelebot project, you can do this by creating a new directory, navigating to the new directory, executing the scaffold command, and following the prompts.

    skelebot scaffold

Once this is done, your new project will be ready and waiting for you.

If you would like to add Skelebot functionality to an existing project, this can be achieved just as easily by executing the scaffold command with the existing flag.

    skelebot scaffold --existing

Now that you have a Skelebot project, you can execute the help command to see what your new Skelebot project can do.

    skelebot --help

Dependencies

The first thing you will probably want to do is get your dependencies set up. Inside the skelebot.yaml file that is generated during scaffolding, you will see a list of dependencies with a few defaults already populating the list.

You can edit this list to include whatever language dependencies you need, and you can also append ‘={version}’ to specify a version number for the dependency.

    dependencies:
    - pyyaml
    - artifactory
    - coverage
    - pytest

Each entry in the list of dependencies will add a new layer to the Docker image.

Therefore, to keep the build process as efficient as possible, it’s best to add new dependencies to the bottom of the list and allow everything above the new dependency to load from cache as often as possible.
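To make the caching point concrete, here is a rough sketch of the idea. It is not Skelebot's actual build code: the helper names, the python:3.6 base image, the pip commands, and the numpy version are all assumptions made for illustration. It renders one RUN line per dependency and counts how many leading lines stay identical when a dependency is appended versus prepended, since Docker can only reuse cached layers up to the first line that changes.

```python
def dockerfile_lines(dependencies, base_image="python:3.6"):
    """Render one RUN instruction per dependency, in list order (hypothetical helper)."""
    lines = [f"FROM {base_image}"]
    for dep in dependencies:
        # Assumed mapping: 'name=version' in the config becomes pip's 'name==version'.
        name, _, version = dep.partition("=")
        spec = f"{name}=={version}" if version else name
        lines.append(f"RUN pip install {spec}")
    return lines


def shared_prefix_length(old, new):
    """Count the leading lines that are identical, i.e. layers that could come from cache."""
    count = 0
    for a, b in zip(old, new):
        if a != b:
            break
        count += 1
    return count


defaults = ["pyyaml", "artifactory", "coverage", "pytest"]
before = dockerfile_lines(defaults)
appended = dockerfile_lines(defaults + ["numpy=1.16.2"])
prepended = dockerfile_lines(["numpy=1.16.2"] + defaults)

print(shared_prefix_length(before, appended))   # 5: FROM plus all four existing installs
print(shared_prefix_length(before, prepended))  # 1: only the FROM line is unchanged
```

With the new dependency at the bottom, every existing layer is untouched and only one new layer has to be built. With it at the top, every install after the base image would be rebuilt.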

Project Execution

The next thing you will probably want to set up is the jobs in your project.

Jobs define the details around the execution of a script (R, Python, or Bash) through Skelebot, so they can be whatever you want them to be and do whatever you want them to do.

Some examples would be a query job to go out and pull down the data for the project, or a train job to train the model and save it as a file.

Whatever the job, the only things that are really needed are a name (to tell Skelebot what command will execute the job), a source script to be executed, and a help message for others to understand the intention behind the job.

    jobs:
    - name: query
      source: src/jobs/query.R
      help: Query for data from the db and place it in the data folder
      mode: i
      mapped:
      - data/
      ignore:
      - 'data/*.RData'
      - 'models/*.RData'
      params:
      - name: startDate
        alt: s
      - name: endDate
        alt: e
    - name: train
      source: src/jobs/train.R
      help: Train the model and place it in the models folder
      mode: i
      mapped:
      - models/
      ignore:
      - 'models/*.RData'
      args:
      - name: nrounds

The path to the script, as well as any other paths given, are relative to the root of the project itself and must be contained within the project folder in order to be copied into the image or volume mapped correctly.
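If you want to sanity-check your own paths before handing them to Skelebot, something like the following sketch would do. The function name and the demo_project folder are hypothetical, and this is not how Skelebot itself validates paths; it only shows what "relative to the root and contained within the project folder" means in practice.

```python
from pathlib import Path


def resolve_in_project(project_root, relative_path):
    """Resolve a config path against the project root, rejecting anything outside it."""
    root = Path(project_root).resolve()
    candidate = (root / relative_path).resolve()
    try:
        # relative_to raises ValueError when candidate is not located under root.
        candidate.relative_to(root)
    except ValueError:
        raise ValueError(f"{relative_path!r} is outside the project folder {root}")
    return candidate


# Hypothetical project location; the folder does not need to exist for this check.
root = Path.cwd() / "demo_project"

print(resolve_in_project(root, "src/jobs/query.R"))  # accepted: inside the project

try:
    resolve_in_project(root, "../shared/query.R")
except ValueError as err:
    print("rejected:", err)
```

A path such as ../shared/query.R points outside the project root, so it could not be copied into the image or volume mapped.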

There are other optional fields that can be used to further refine the way in which the job is executed.

- The mode field can be used with either ‘i’ for interactive mode (the default) or ‘d’ for detached mode in Docker
- The ignore list is used to prevent files and/or folders from entering the image during the build process
- The args list specifies required values that directly follow the command itself
- The params list specifies optional values that follow the command and are set with their given name or alt name
- The mapped list provides a way to volume map directories into the container at runtime in order to persist output files on the local file-system

These jobs can then be executed through Skelebot, which will build the Docker image and run the script as the entry point while starting the container.
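To give a feel for how the mode and mapped fields could translate into Docker terms, here is a small sketch. It is only an assumption about the general shape of such a command, not Skelebot's real implementation: the function name, the /app working directory, and the image name are made up, while -d, -it, and -v are standard docker run flags.

```python
def docker_run_command(job, image, project_root, workdir="/app"):
    """Build an argv list for `docker run` from a job definition (illustrative only)."""
    cmd = ["docker", "run"]
    # 'd' runs the container detached; anything else is treated as interactive.
    cmd.append("-d" if job.get("mode", "i") == "d" else "-it")
    for folder in job.get("mapped", []):
        # Bind-mount each mapped folder so files written in the container persist locally.
        folder = folder.rstrip("/")
        cmd += ["-v", f"{project_root}/{folder}:{workdir}/{folder}"]
    cmd.append(image)
    return cmd


train_job = {"name": "train", "source": "src/jobs/train.R", "mode": "i", "mapped": ["models/"]}
print(" ".join(docker_run_command(train_job, "my-project", "/home/me/my-project")))
# docker run -it -v /home/me/my-project/models:/app/models my-project
```

The point of the volume mapping is that a model written to the models folder inside the container also shows up in the models folder on your machine once the job finishes.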

    skelebot query --startDate 2019-01-01 -e 2019-02-01

This would execute the query script (src/jobs/query.R) with the params startDate and endDate of 2019-01-01 and 2019-02-01 respectively inside the Docker container.

It is important to note when passing the params down to the script that Skelebot will use the full name of the param, even when the alt name is provided in the command.

For this reason your underlying script must at the very least understand how to parse the parameter with its full name.
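The example job above is an R script, but the same requirement applies to a Python job. The sketch below is an illustration under stated assumptions rather than Skelebot's own code: expand_alt_names is a hypothetical stand-in for the translation Skelebot performs, and parse_job_args shows the part your script is responsible for, namely accepting the full parameter names.

```python
import argparse


def expand_alt_names(argv, params):
    """Replace short alt flags (e.g. -e) with their full names (e.g. --endDate)."""
    alt_to_full = {f"-{p['alt']}": f"--{p['name']}" for p in params if "alt" in p}
    return [alt_to_full.get(token, token) for token in argv]


def parse_job_args(argv):
    """What the job script itself needs: a parser that knows the full param names."""
    parser = argparse.ArgumentParser(description="Query for data from the db")
    parser.add_argument("--startDate")
    parser.add_argument("--endDate")
    return parser.parse_args(argv)


params = [{"name": "startDate", "alt": "s"}, {"name": "endDate", "alt": "e"}]
typed = ["--startDate", "2019-01-01", "-e", "2019-02-01"]

forwarded = expand_alt_names(typed, params)
args = parse_job_args(forwarded)

print(forwarded)                     # ['--startDate', '2019-01-01', '--endDate', '2019-02-01']
print(args.startDate, args.endDate)  # 2019-01-01 2019-02-01
```

Because only full names reach the script, the parser does not need to define the short flags at all.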

Project Management

Jobs are at the heart of Skelebot as they provide the key mechanism for interacting with a project, but Skelebot provides additional features for managing machine learning projects as well.

- Push and pull artifacts to and from Artifactory
- Spin up Jupyter inside a Docker container that contains your project
- Set up Kerberos HDFS auth inside of your project’s Docker image
- Specify different configurations for each environment
- Create and install plugins to extend Skelebot’s functionality even further

For more detailed information on the various things that Skelebot can do, please refer to the documentation.

Contribute

Skelebot is open-source and anyone is welcome to make contributions to the project.

Right now Skelebot is under active development, and has thus far primarily been informed by our own particular requirements.

We hope that others will contribute to Skelebot based on their own project needs and help to build the tool into something even more universally applicable.

If you would like to make a contribution, please read our Contributor Code of Conduct.

Links

GitHub: https://github.com/carsdotcom/skelebot
Docs: https://carsdotcom.github.io/skelebot/
