Let’s do data science” to all of these, then there are a few additional things to anticipate.
More Complex Development Cycles
Your development cycles are about to change.
They will get longer and more complex as you add resources and the unique processes that come with data science development.
Data science teams are tasked with data collection/sourcing, discovery, cleaning/processing, training, and sometimes deployment.
Your initial integration of these new data scientists and your existing development teams is extremely important.
Many data scientists, and executives with new data science teams, are shocked at the time spent on the first three steps: collection, discovery, and cleaning.
Upwards of 90% of a data scientist’s time is spent on these tasks — rather than model creation, testing, and deployment.
So what is one to do?
Credit: Microsoft, https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview
If you are stretched for data scientists, employ data analysts and enable them with the tools they will ask for, such as Alteryx, DataRobot or Knime.
Good data analysts can use these solutions to experiment with creating data pipelines, discovery, cleaning and testing generic models.
This early work by your analysts can greatly accelerate your build time, and it is a budget-friendly alternative to having data scientists spend mountains of time on these steps before model creation and testing.
Ongoing Maintenance
Plan for near-immediate model degradation and therefore a higher-than-usual ongoing maintenance cost.
A machine learning model is always evolving in relation to the world it touches, so it needs to be maintained to stay in tip-top shape.
As you do with new features for non-data projects, budget for maintenance, but expect higher-than-usual upfront maintenance and a long tail of ongoing work.
If you have to manually label training data or have a manual collection process, don’t forget that this will be part of maintenance too, as you will likely need to update your models with this new information.
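To make "model degradation" a bit more concrete, here is a minimal sketch of how a team might watch for it. Everything in it is an assumption for illustration: the `DegradationMonitor` class, the window size, and the tolerance are hypothetical, and a real team would choose its own metric and thresholds. The idea is simply to compare accuracy on recent predictions against the accuracy you measured at launch, and flag the model once it slips too far.

```python
from collections import deque


class DegradationMonitor:
    """Hypothetical helper: tracks accuracy over the most recent predictions."""

    def __init__(self, baseline_accuracy, window=100, tolerance=0.05):
        # Accuracy measured when the model was first deployed.
        self.baseline_accuracy = baseline_accuracy
        # How far below the baseline we allow accuracy to fall.
        self.tolerance = tolerance
        # Only the last `window` outcomes are kept; older ones drop off.
        self.outcomes = deque(maxlen=window)

    def record(self, predicted, actual):
        """Store whether a prediction matched the label that arrived later."""
        self.outcomes.append(predicted == actual)

    def recent_accuracy(self):
        """Return accuracy over the current window, or None if it is empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """Flag the model once a full window falls below the allowed accuracy."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet
        return self.recent_accuracy() < self.baseline_accuracy - self.tolerance


if __name__ == "__main__":
    monitor = DegradationMonitor(baseline_accuracy=0.9, window=10)
    # Simulate ten recent predictions where only half were correct.
    for i in range(10):
        monitor.record(predicted="churn", actual="churn" if i % 2 == 0 else "stay")
    print(monitor.recent_accuracy(), monitor.needs_retraining())
```

Note that the `actual` values are the labels you collect after the fact, which is exactly why manual labeling and collection end up in your maintenance budget.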
Getting a handle on the complexities of data science
If you don’t have a data science background, that’s ok.
My top recommendation is to immerse yourself in the world of data science, even if for a short time.
This will give you context and better insight into the challenges your data team will face.
A few paths I recommend:
Take MOOCs. Brush up on your statistics first before diving into machine learning.
If you don’t have a software development background, use tools that help you accelerate your learning. Alteryx, Knime, RapidMiner, and DataRobot are all awesome tools at varying price points (Knime has the most usable free option).
Do a Kaggle or DrivenData competition using one of the tools above. These can take as little as 2 hours and are a great way to learn by doing.
If you want to see the power of deep learning with one of these competitions, try using H2O’s Driverless AI solution. It is incredible.
Thoughts on additional challenges of a data product manager? Let me know!
Need help with a data project you’ve been thinking about? Tweet at me or message me on LinkedIn.
Cheers!