Colab synergy with MLflow: how to monitor progress and store models.

Google Colab provides a free GPU for up to 12 hours.

However, after 12 hours everything turns into a pumpkin: all stored data is gone.

Convergence to a pumpkin is not the only option.

To save trained models, one needs to connect Google Storage or Google Drive to the Colab notebook.

To monitor training progress, an additional tool such as TensorBoard in Colab has to be used.

Alternatively, MLflow provides a solution both to store models and to monitor the training progress.

In this blog post, I present a guide on how to set up MLflow on Google Cloud.

MLflow stores two types of data:

- structured data: metrics of training progress and model parameters (floats and integers)
- unstructured data: training artifacts (images, models, etc.)

We can store these types of data either in databases or locally on a VM.
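As a minimal illustration of how the two kinds of data map to logging calls (the run below uses MLflow's default local file store; the parameter, metric, and file names are just placeholders):

    import mlflow

    with mlflow.start_run():
        # Structured data: parameters and metrics (numbers) go to the backend store.
        mlflow.log_param("learning_rate", 0.01)
        mlflow.log_metric("train_loss", 0.42, step=1)

        # Unstructured data: any file (model weights, plots, ...) is logged as an artifact
        # and goes to the artifact store.
        with open("notes.txt", "w") as f:
            f.write("artifacts are arbitrary files")
        mlflow.log_artifact("notes.txt")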

Let’s start with the database option.

For the training metrics, an SQL database or a Databricks backend can be used.

For training artifacts, the storage options are S3, Google Storage, or Azure Storage.

To observe the training progress, one needs to deploy an MLflow server (a GUI to manage the stored data) and connect it to the databases.

The other option would be to deploy the MLflow server on a VM and store everything locally on it.

I decided to proceed with the first option and connect the MLflow server to the databases.

This setup looks more robust to me, as we do not need to rely on the MLflow server, but only on the cloud databases.

Also, the database option allows you to deploy the MLflow server locally on your laptop, which is more secure.

As a cloud provider, I chose Google Cloud, as I still had some free credits left :).

Here is a sketch of the architecture.

Let us now follow 5 hands-on steps and deploy the server locally, connected to the SQL database for metrics and to Google Storage for artifacts.

1. Set up a service account in IAM.

My current setup is not the most secure one (see comments on security at the end).

However, one thing that is definitely worth doing is to create a service account in Google Cloud IAM.

To create the service account, follow: IAM → Service Accounts → Select Project → Create service account → …

The following permissions should be granted to the account: Cloud SQL Editor and Storage Object Admin.

The reason to create a dedicated IAM service account is that if somebody somehow figures out its credentials, the maximum damage is limited to SQL and Storage.

For the connection to SQL and Storage, we will use the service account key.

To create the JSON key: Service account details → Grant this service account access to project (choose the roles) → Grant users access to this service account → Create a key in JSON format and save this file.

Now you have a service account with a JSON key for it.

2. Create and configure a Google Cloud SQL server.

The existing solutions in Google Cloud that MLflow supports are MySQL and PostgreSQL.

Setting up PostgreSQL is currently the easier option.

There are several issues with MySQL in version 1.0.0 of MLflow, which will certainly be fixed in the next release.

In this post, I describe a setup with PostgreSQL.

2a. After starting the PostgreSQL server, set a public IP for your SQL instance with the rule 0.0.0.0/0 (here you expose your SQL instance to the Internet, and everybody who knows the password of your SQL database can access it).

Write down your public IP address, as you will need it later.

2b. Create a database in your SQL instance where MLflow will store its data.

2c. Set up a username and password for the SQL database.
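If you prefer to create the database from step 2b in Python rather than in the Cloud Console, a minimal sketch looks like this; it assumes psycopg2 is installed locally and uses the public IP from step 2a, the credentials from step 2c, and mlflow as an example database name:

    import psycopg2

    # Connect to the default "postgres" database on the Cloud SQL instance.
    conn = psycopg2.connect(
        host="<sql ip>",          # public IP from step 2a
        user="<username>",        # credentials from step 2c
        password="<password>",
        dbname="postgres",
    )
    conn.autocommit = True  # CREATE DATABASE cannot run inside a transaction
    with conn.cursor() as cur:
        cur.execute("CREATE DATABASE mlflow")  # the backend-store database for MLflow
    conn.close()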

3. Configure the Google Storage account.

3a. Create a bucket in Google Storage.

3b. Add the roles “Legacy Bucket Owner” and “Legacy Bucket Reader” to the IAM service account which you created in step 1.

To do so, follow: Choose the bucket → Permissions → Choose the service account → Add the corresponding roles.
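As an optional sanity check (the bucket name, key path, and object name below are placeholders), you can verify that the service account from step 1 is able to write to the bucket:

    import os
    from google.cloud import storage

    # Authenticate with the service-account key from step 1 (path is a placeholder).
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/jsonfile"

    client = storage.Client()
    bucket = client.bucket("<name of bucket in gs>")
    # If this upload succeeds, MLflow will also be able to store artifacts in the bucket.
    bucket.blob("mlflow-permission-test.txt").upload_from_string("ok")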

4. Start the local MLflow server.

MLflow provides a nice GUI, called MLflow Server.

You can manage your models stored in databases via this GUI.

One option is to start the MLflow server locally on your laptop.

4a. First, install MLflow and the necessary packages:

    pip install google-cloud-storage mlflow psycopg2

4b. Remember the JSON file from step 1? Now we need it! Type:

    export GOOGLE_APPLICATION_CREDENTIALS="path/to/jsonfile"

4c. This is it! Start your MLflow server with the command:

    mlflow server --backend-store-uri 'postgresql://<username>:<password>@<sql ip>/<database name>' --default-artifact-root gs://<name of bucket in gs>/

Here 'postgresql://<username>:<password>@<sql ip>/<database name>' is an SQLAlchemy string to connect to your PostgreSQL database.

Insert the corresponding credentials from step 2 here.

After that, your MLflow server should work.

Check it by typing in the browser address line: http://127.0.0.1:5000.

You expect to see a start page of your MLflow server:

MLflow Server first start.

5. Test the MLflow setup with Colab.

To test the connection, you should first set your credentials in Colab, then install MLflow with the other packages and save some metrics and artifacts to the databases, as in the sketch below.

The full notebook is available on GitHub.
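The original notebook cells are not reproduced here; the following is a minimal sketch of what they could look like, assuming the Colab run talks directly to the PostgreSQL backend and the bucket (the key path, experiment, parameter, metric, and file names are placeholders):

    # First Colab cell (shell magic): install the client-side packages.
    # !pip install google-cloud-storage mlflow psycopg2

    import os
    import mlflow

    # Credentials for Google Storage: path to the service-account key on the Colab VM
    # (the original notebook puts the key contents directly into a cell, which is the
    # security concern discussed below; the path here is a placeholder).
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/content/service-account.json"

    # Metrics and parameters go straight to the PostgreSQL backend store; artifacts are
    # uploaded to the bucket if the experiment's artifact location points at gs://
    # (e.g. because it was created through the server started with --default-artifact-root).
    mlflow.set_tracking_uri("postgresql://<username>:<password>@<sql ip>/<database name>")
    mlflow.set_experiment("colab-test")

    with mlflow.start_run():
        mlflow.log_param("batch_size", 32)        # structured data -> SQL
        mlflow.log_metric("train_loss", 0.42)     # structured data -> SQL
        with open("sample.txt", "w") as f:
            f.write("hello from Colab")
        mlflow.log_artifact("sample.txt")         # unstructured data -> Google Storage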

Note that I used the presented setup to train models on a publicly accessible dataset and security was not the main concern.

There are several vulnerabilities, such as exposing the SQL server to the Internet, as well as writing the service account key directly into the Colab notebook.

Limiting the service account's rights is of high importance in this setup.

A more secure way to set up MLflow would be to establish the connection to Google Cloud via the gcloud tools and to use the Cloud SQL Proxy without exposing the SQL service to the Internet.

You know, there is always a trade-off between simplicity and security :)

Happy model managing and training with Colab!
