4 Reasons Why Your Machine Learning Code is Probably Bad

By Norman Niemer, Chief Data Scientist

Your current workflow probably chains several functions together, one after another.
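A linear chain of this kind might look like the following sketch. The function names, the toy data, and the `scale` parameter are hypothetical and only illustrate the pattern; they are not taken from any particular project.

```python
# Hypothetical linear workflow: each step is a plain function and the
# steps are hard-wired to run one after another.

def get_data():
    # Toy rows standing in for a real data source (assumption).
    return [{"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 4.0}, {"x": 3.0, "y": 6.0}]


def preprocess(rows, scale=1.0):
    # Multiply the feature column by a constant factor.
    return [{"x": r["x"] * scale, "y": r["y"]} for r in rows]


def train(rows):
    # Least-squares slope through the origin: sum(x * y) / sum(x * x).
    sxy = sum(r["x"] * r["y"] for r in rows)
    sxx = sum(r["x"] ** 2 for r in rows)
    return sxy / sxx


def run_workflow(scale=1.0):
    # The whole pipeline is one fixed sequence of calls.
    rows = get_data()
    rows = preprocess(rows, scale=scale)
    return train(rows)


slope = run_workflow()
```

Every call to `run_workflow()` repeats all three steps, even when only the last one has changed.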

While quick, it likely has many problems.

Instead of linearly chaining functions, data science code is better written as a set of tasks with dependencies between them.

That is, your data science workflow should be a DAG (directed acyclic graph).
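To make the idea of "tasks with dependencies" concrete, here is a minimal sketch that uses Python's standard-library `graphlib` module (Python 3.9+) to derive a valid execution order from a dependency graph. The task names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is a task; its value is the set of tasks it depends on.
# The task names are hypothetical.
dependencies = {
    "get_data": set(),
    "preprocess": {"get_data"},
    "train": {"preprocess"},
    "evaluate": {"train", "preprocess"},
}

# static_order() yields tasks so that each one comes after its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

If the dependencies contained a cycle, `static_order()` would raise `graphlib.CycleError`, which is the situation the "acyclic" in DAG rules out.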

So instead of writing a single function that does everything, you are better off writing tasks that you can chain together as a DAG. Doing this has several benefits.

Below is a stylized example of a machine learning flow expressed as a DAG.

In the end you just need to run TaskTrain() and it will automatically know which dependencies to run.
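The dependency resolution described above can be sketched in plain Python. This is not d6tflow's API: the `Task` base class, the `run_task` helper, `TaskGetData`, and `TaskPreprocess` are hypothetical names used for illustration, and only `TaskTrain` comes from the text.

```python
class Task:
    """Hypothetical base class: a unit of work with declared dependencies."""

    def requires(self):
        # Override to return a list of upstream task instances.
        return []

    def run(self, *inputs):
        # Override to compute this task's output from the upstream outputs.
        raise NotImplementedError


def run_task(task, cache):
    """Run `task` after its dependencies; each task class runs at most once."""
    key = type(task).__name__
    if key not in cache:
        inputs = [run_task(dep, cache) for dep in task.requires()]
        cache[key] = task.run(*inputs)
    return cache[key]


class TaskGetData(Task):
    def run(self):
        # Toy (x, y) pairs standing in for a real data source (assumption).
        return [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


class TaskPreprocess(Task):
    def requires(self):
        return [TaskGetData()]

    def run(self, rows):
        # Keep only rows with a positive feature value.
        return [(x, y) for x, y in rows if x > 0]


class TaskTrain(Task):
    def requires(self):
        return [TaskPreprocess()]

    def run(self, rows):
        # Least-squares slope through the origin.
        sxy = sum(x * y for x, y in rows)
        sxx = sum(x * x for x, y in rows)
        return sxy / sxx


cache = {}
model = run_task(TaskTrain(), cache)
print(list(cache))  # task names in the order they completed
```

Only `TaskTrain()` is requested; the helper walks `requires()` to find and run the upstream tasks first. Calling `run_task` again with the same `cache` returns the stored results instead of recomputing them.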

For a full example see https://github.com/d6t/d6tflow/blob/master/docs/example-ml.md

Writing machine learning code as a linear series of functions likely creates many workflow problems.

Because of the complex dependencies between different ML tasks, it is better to write them as a DAG.

https://github.com/d6t/d6tflow makes this very easy.

Alternatively, you can use Luigi or Airflow, but they are optimized more for ETL than for data science.

  Bio: Norman Niemer is the Chief Data Scientist at a large asset manager where he delivers data-driven investment insights.

He holds an MS in Financial Engineering from Columbia University and a BS in Banking and Finance from Cass Business School (London).

Original.

Reposted with permission.
