How will automation tools change data science?

In addition, the outcomes of most enterprise data science projects are hard to interpret, making it difficult for business users to implement the results.

Traditional Data Science Process

Playing with machine learning (ML) models is considered to be the fun part, but the real pain point of any data science project is often last-mile ETL and feature engineering. As illustrated below, machine learning requires a single flat table called a feature table. Given a feature table, data scientists can play with ML algorithms. But actual enterprise data is never a single flat table. Instead, it’s a collection of many data tables with complex relationships.

Figure: Data required by machine learning (left) vs. actual enterprise source data (right)

Last-mile ETL and feature engineering are necessary steps to transform the collection of tables into a feature table. These are the most challenging and time-consuming steps in a data science project, and they require highly skilled data scientists and domain experts, both of which are expensive and scarce resources.

“… Feature engineering is typically where most of the effort in a machine learning project goes … and where intuition, creativity and “black art” are as important as the technical stuff…” - Dr. Pedro Domingos

Attempts to automate machine learning started in the early 2010s (e.g. Auto-WEKA in 2013), and the approach has become very trendy.
DataRobot and H2O.ai are leading startups in machine learning automation. The fundamental idea of machine learning automation is to train scoring models using different algorithms (including preprocessing such as missing value imputation) with different hyper-parameters, and to validate their accuracy in order to select the best model. Recently, companies like Microsoft have also started to support machine learning automation tools. These great tools significantly simplify building machine learning models.

On the other hand, last-mile ETL and feature engineering are still manual and require substantial involvement from domain experts and data scientists. Although there have been efforts to automate feature engineering, most of them focus on non-linear transformations of a given feature table. That is just a small component of the feature engineering process, and it still relies on manual creation of the feature table.
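To make the contrast concrete, here is a minimal Python sketch of both halves of the process described above. It is only an illustration under stated assumptions: the `customers` and `transactions` tables, the aggregated feature names, the threshold-rule "models", and the threshold grid are all hypothetical and are not taken from any of the tools mentioned. The first function is the hand-written last-mile ETL step that flattens two related tables into one feature table. The second is the part that automation tools handle: looping over candidate models and keeping the most accurate one.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical source data: two related tables instead of one flat table.
customers = [
    {"customer_id": 1, "churned": 1},
    {"customer_id": 2, "churned": 0},
    {"customer_id": 3, "churned": 1},
    {"customer_id": 4, "churned": 0},
]
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 2, "amount": 200.0},
    {"customer_id": 2, "amount": 150.0},
    {"customer_id": 2, "amount": 250.0},
    {"customer_id": 3, "amount": 5.0},
    {"customer_id": 4, "amount": 90.0},
    {"customer_id": 4, "amount": 110.0},
]


def build_feature_table(customers, transactions):
    """Manual last-mile ETL: aggregate transactions into one row per customer."""
    amounts = defaultdict(list)
    for t in transactions:
        amounts[t["customer_id"]].append(t["amount"])
    table = []
    for c in customers:
        a = amounts.get(c["customer_id"], [])
        table.append({
            "customer_id": c["customer_id"],
            "n_transactions": len(a),
            "total_amount": sum(a),
            "mean_amount": mean(a) if a else 0.0,
            "churned": c["churned"],
        })
    return table


def stump_predict(row, feature, threshold):
    """Candidate model: predict churn (1) when the feature is below the threshold."""
    return 1 if row[feature] < threshold else 0


def accuracy(rows, feature, threshold):
    """Fraction of rows where the candidate model matches the label."""
    hits = sum(stump_predict(r, feature, threshold) == r["churned"] for r in rows)
    return hits / len(rows)


def select_best(rows, features, thresholds):
    """Automatable part: try every (feature, threshold) pair, keep the most accurate.

    Toy shortcut: candidates are scored on the same rows they are selected on.
    A real tool would score them on held-out data or with cross-validation.
    """
    candidates = [(f, t) for f in features for t in thresholds]
    return max(candidates, key=lambda c: accuracy(rows, *c))


FEATURES = ["n_transactions", "total_amount", "mean_amount"]  # hypothetical features
THRESHOLDS = [2, 50, 150]                                     # hypothetical grid

feature_table = build_feature_table(customers, transactions)
best = select_best(feature_table, FEATURES, THRESHOLDS)
print(best, accuracy(feature_table, *best))
```

The selection loop needs no knowledge of the business, so it is easy to automate. By contrast, `build_feature_table` encodes decisions that had to be made by hand: which tables to join, which key to join on, and which aggregates might carry signal.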
