The Data Science Workflow

If you used third-party data sources during training, consider whether you will need to integrate with them in production, and how you will encode this access information in the model exported from your pipeline.

As soon as you deploy your model to production, it turns from an artefact of data science into actual code, and should therefore be subject to all the requirements of application code. This means testing. Ideally, your deployment pipeline should produce both the model package for deployment and everything needed to test this model (e.g. sample data). It is not uncommon for a model to stop working correctly after being transferred from its birthplace to a production environment. It may be a bug in the export code, a mismatch in the version of pickle, or a wrong input conversion in the REST call. Unless you explicitly test the predictions of the deployed model for correctness, you risk running an invalid model without even knowing it. Everything would look fine, as it would keep predicting some values, just the wrong ones (see the test sketch at the end of this section).

Model monitoring

Your data science project does not end when you deploy the model to production. The heat is still on. Maybe the distribution of inputs in your training set differs from what the model encounters in real life. Maybe this distribution drifts slowly and the model needs to be retrained or recalibrated. Maybe the system does not work as you expected it to. Maybe you are into A/B testing.
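One way to watch for the input drift described above is a two-sample test that compares each feature's distribution in recent production traffic against the training set. Below is a minimal sketch using the Kolmogorov-Smirnov test from scipy; the significance threshold and the retraining decision are illustrative assumptions, not a prescription:

```python
# Input-drift check: compare each numeric feature's distribution in
# recent production traffic against the training set with a two-sample
# Kolmogorov-Smirnov test. Threshold and column handling are illustrative.
import pandas as pd
from scipy.stats import ks_2samp


def drifted_features(train_df: pd.DataFrame, live_df: pd.DataFrame,
                     p_threshold: float = 0.01):
    """Return names of numeric features whose live distribution differs
    significantly from the training distribution."""
    drifted = []
    for column in train_df.select_dtypes("number").columns:
        _, p_value = ks_2samp(train_df[column].dropna(),
                              live_df[column].dropna())
        if p_value < p_threshold:
            drifted.append(column)
    return drifted


# A non-empty result is a signal to investigate, and possibly to retrain
# or recalibrate, rather than a trigger for automatic action.
```

A more robust setup would correct for multiple comparisons and track the test statistics over time rather than alerting on a single batch.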

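As for the deployment test mentioned earlier: one straightforward approach is for the export pipeline to save sample inputs together with the predictions the model produced at training time, and to assert that the deployed artefact reproduces them. A minimal pytest-style sketch; the file names are hypothetical, and the model is assumed to be a pickled object with a predict method:

```python
# Deployment smoke test: the deployed model artefact must reproduce the
# reference predictions recorded when the model was exported.
# All file names below are hypothetical placeholders.
import pickle

import numpy as np


def test_deployed_model_reproduces_reference_predictions():
    # Sample inputs and reference outputs saved by the export pipeline
    with open("sample_inputs.pkl", "rb") as f:
        sample_inputs = pickle.load(f)
    reference = np.load("reference_predictions.npy")

    # Load the model package exactly the way production code would
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)

    predictions = model.predict(sample_inputs)

    # A silent export bug (pickle version mismatch, wrong input
    # conversion) shows up here as a prediction mismatch.
    np.testing.assert_allclose(predictions, reference, rtol=1e-6)
```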