4 Reasons Why Your Machine Learning Code is Probably Bad

By Norman Niemer, Chief Data Scientist

Your current workflow probably chains several functions together, one after another.
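For illustration, a linearly chained workflow of this kind might look like the sketch below. The helpers `get_data`, `preprocess` and `train`, the `scale` parameter and the toy data are hypothetical, not code from the original article.

```python
# A hypothetical linear ML workflow: each function feeds the next one directly.
# Function names, parameters and toy data are illustrative assumptions.

def get_data():
    # Stand-in for loading raw data (e.g. from a file or database)
    return [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]

def preprocess(rows, scale=1.0):
    # Scale the feature column; which scale was used must be tracked by hand
    return [(x * scale, y) for x, y in rows]

def train(rows):
    # Fit y = w * x by least squares through the origin
    num = sum(x * y for x, y in rows)
    den = sum(x * x for x, _ in rows)
    return num / den

def run_workflow(scale=1.0):
    # Every step re-runs on every call, and intermediate results are not saved
    data = get_data()
    features = preprocess(data, scale=scale)
    return train(features)

weight = run_workflow()
print(f"fitted weight: {weight:.3f}")
```

The chain works, but every call recomputes every step, and nothing records which `scale` value produced a given result.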

While quick, this approach likely has many problems.

Instead of linearly chaining functions, data science code is better written as a set of tasks with dependencies between them.

That is, your data science workflow should be a DAG (directed acyclic graph).
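To make the idea concrete, here is a minimal sketch that writes a workflow's dependencies down explicitly as a DAG and derives a valid execution order from it with Python's standard-library `graphlib` module (Python 3.9+). The task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical ML workflow: each task maps to the set of tasks it depends on
dependencies = {
    "get_data": set(),
    "get_labels": set(),
    "preprocess": {"get_data"},
    "train": {"preprocess", "get_labels"},
    "evaluate": {"train", "get_labels"},
}

# static_order() yields every task only after all of its dependencies;
# it raises graphlib.CycleError if the graph contains a cycle
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Because the graph records what each step needs, independent tasks such as `get_data` and `get_labels` may be run in either order, and an accidental cycle is reported as an error rather than silently producing a wrong order.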

So instead of writing one function that runs every step in sequence, you are better off writing tasks that you can chain together as a DAG, which brings several benefits. A stylized machine learning flow can be expressed the same way, as a DAG of tasks that ends in a training task.

In the end, you just need to run TaskTrain(), and it will automatically know which dependencies to run.
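The sketch below does not reproduce d6tflow's actual API. As a hedged, self-contained illustration of the same idea, it defines a hypothetical mini-framework in which each task declares its upstream tasks in `requires()` and stores its output in an in-memory cache. Calling `build(TaskTrain())` walks the dependencies and runs only the tasks whose output is missing. The class name `TaskTrain` mirrors the article; the `Task` base class, `build` function, `scale` parameter and toy data are assumptions.

```python
# Hypothetical minimal task framework (not d6tflow's API)

_cache = {}   # task key -> output; stands in for outputs saved to disk
run_log = []  # records which tasks actually executed

class Task:
    def requires(self):
        # Upstream tasks this task depends on
        return []

    def key(self):
        # Identify a task by its class name and its parameters
        return (type(self).__name__, tuple(sorted(vars(self).items())))

    def complete(self):
        return self.key() in _cache

    def run(self, inputs):
        raise NotImplementedError

def build(task):
    # Depth-first: complete every dependency, then run this task (once)
    if not task.complete():
        inputs = [build(dep) for dep in task.requires()]
        run_log.append(type(task).__name__)
        _cache[task.key()] = task.run(inputs)
    return _cache[task.key()]

class TaskGetData(Task):
    def run(self, inputs):
        return [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]

class TaskPreprocess(Task):
    def __init__(self, scale=1.0):
        self.scale = scale

    def requires(self):
        return [TaskGetData()]

    def run(self, inputs):
        (rows,) = inputs
        return [(x * self.scale, y) for x, y in rows]

class TaskTrain(Task):
    def __init__(self, scale=1.0):
        self.scale = scale

    def requires(self):
        return [TaskPreprocess(scale=self.scale)]

    def run(self, inputs):
        (rows,) = inputs
        # Least-squares fit of y = w * x through the origin
        return sum(x * y for x, y in rows) / sum(x * x for x, _ in rows)

weight = build(TaskTrain())                  # runs TaskGetData, TaskPreprocess, TaskTrain
weight_again = build(TaskTrain())            # fully cached, nothing re-runs
weight_scaled = build(TaskTrain(scale=2.0))  # only tasks whose parameters changed re-run
print(run_log)
```

The design choice here is that a task is identified by its class and parameters, so changing `scale` re-runs `TaskPreprocess` and `TaskTrain` but reuses the cached `TaskGetData` output. In practice, outputs would typically be persisted to files rather than kept in an in-memory dict.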

For a full example, see https://github.com/d6t/d6tflow/blob/master/docs/example-ml.md

Writing machine learning code as a linear series of functions likely creates many workflow problems.

Because of the complex dependencies between different ML tasks, it is better to write them as a DAG.

https://github.com/d6t/d6tflow makes this very easy.

Alternatively, you can use luigi or airflow, but they are more optimized for ETL than for data science.

Bio: Norman Niemer is the Chief Data Scientist at a large asset manager where he delivers data-driven investment insights.

He holds an MS in Financial Engineering from Columbia University and a BS in Banking and Finance from Cass Business School (London).

Original.

Reposted with permission.
