How To Write Your First Pipeline in Airflow

Can your manager or co-worker reliably reproduce an experiment you ran yesterday? In addition to processes, the tools you use to do machine learning matter as well.

At Comet, our mission is to help companies extract business value from machine learning by providing a tool that does this for you.

Most of the data science teams we speak to are stuck using a combination of git, emails, and (believe it or not) spreadsheets to record all of the artifacts around each experiment.

Consider a modeling task where you’re keeping track of 20 hyperparameters, 10 metrics, and dozens of architectures and feature engineering techniques, all while iterating quickly and running dozens of models a day.

It can become incredibly tedious to manually track all of these artifacts.

Building a good ML model can oftentimes resemble tuning a radio with 50 knobs.

If you don’t keep track of all of the configurations you’ve tried, the combinatorial complexity of finding the signal in your modeling space can become cumbersome.
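To put a rough number on that combinatorial complexity, here is a minimal sketch. The 20 hyperparameters come from the example above; the 3 candidate values per hyperparameter and the 50 runs per day are assumptions chosen purely for illustration.

```python
# Hypothetical search space: 3 candidate values for each of 20 hyperparameters.
values_per_hyperparameter = 3   # assumption, for illustration only
num_hyperparameters = 20        # taken from the modeling task described above

# Each hyperparameter is chosen independently, so the counts multiply.
total_configs = values_per_hyperparameter ** num_hyperparameters
print(f"{total_configs:,} possible configurations")

# Assume a brisk pace of 50 runs a day (also an illustrative figure).
runs_per_day = 50
years_to_cover = total_configs / runs_per_day / 365
print(f"about {years_to_cover:,.0f} years to try every one")
```

Even with this modest grid, an exhaustive search is out of reach, which is why a record of what has already been tried matters so much.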

We’ve built Comet based on these needs (and what we wanted when we were working on data science and machine learning ourselves, at Google, IBM, and as part of research groups at Columbia University and Yale University).

Every time you train a model, there should be something to capture all of the artifacts of your experiment and save them in some central ledger where you can look up, compare, and filter through all of your (or your team’s) work.

Comet was built to provide this function to practitioners of machine learning.
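The idea of a central ledger can be sketched in a few lines of plain Python. This is not Comet's API or implementation; the `ExperimentLedger` class, its method names, and the JSON-lines file format are all hypothetical, chosen only to show what capturing, looking up, comparing, and filtering runs might involve.

```python
import json
import os
import tempfile
import time
import uuid
from pathlib import Path


class ExperimentLedger:
    """Hypothetical append-only ledger: one JSON record per training run."""

    def __init__(self, path):
        self.path = Path(path)

    def log_run(self, params, metrics, tags=()):
        """Capture one run's hyperparameters, metrics, and tags."""
        record = {
            "run_id": uuid.uuid4().hex,
            "timestamp": time.time(),
            "params": dict(params),
            "metrics": dict(metrics),
            "tags": list(tags),
        }
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        return record["run_id"]

    def runs(self):
        """Look up every recorded run, oldest first."""
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def filter(self, predicate):
        """Keep only the runs for which predicate(run) is true."""
        return [run for run in self.runs() if predicate(run)]

    def best(self, metric, higher_is_better=True):
        """Compare runs on one metric and return the winner, or None."""
        candidates = [r for r in self.runs() if metric in r["metrics"]]
        if not candidates:
            return None
        pick = max if higher_is_better else min
        return pick(candidates, key=lambda r: r["metrics"][metric])


# Usage: log two made-up runs to a temporary file, then query the ledger.
with tempfile.TemporaryDirectory() as tmp:
    ledger = ExperimentLedger(os.path.join(tmp, "runs.jsonl"))
    ledger.log_run({"lr": 0.01, "layers": 2}, {"accuracy": 0.81}, tags=["baseline"])
    ledger.log_run({"lr": 0.001, "layers": 4}, {"accuracy": 0.86}, tags=["deep"])

    top = ledger.best("accuracy")
    deep_runs = ledger.filter(lambda r: r["params"]["layers"] > 2)
    print(top["params"], len(deep_runs))
```

A single shared file is the simplest possible ledger. A hosted tool would add what this sketch leaves out, such as concurrent writers, access control, and visual comparison.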

Measuring workflow efficiency is notoriously difficult, but on average our users report 20-30% time savings from using Comet (note: Comet is free for individuals and researchers, and you can sign up here).

This doesn’t take into account the unique insights and knowledge that come from having access to things like a visual understanding of your hyperparameter space, real-time metric tracking, team-wide collaboration, and experiment comparison.

Access to this knowledge saves time and, perhaps more importantly, makes it possible to build better models.

It is tempting to ignore questions about ML tools and processes altogether.

In a field responsible for self-driving cars, voice assistants, facial recognition, and many more groundbreaking technologies, one may be forgiven for leaping into the fray of building these tools themselves and not considering how best to build them.

If you are convinced that the software engineering stack works well enough for doing AI, you will not be proven definitively right or wrong.

After all, this is a field defined by uncertainty.

But perhaps it is best to consider this in the way a data scientist may consider a modeling task: what is the probability distribution of possible futures? What is more or less likely: that a field as powerful and promising as AI will continue to rely on the tools and processes built for a different discipline, or that new ones will emerge to empower practitioners to the fullest?

If you are curious about these ML tools or have any questions, feel free to reach out to me at niko@comet.


