Being the first Data Science hire

Raúl Vallejo, Mar 29

Image: Full Concentration by Adrian Schiess

This post is set in the context of a company looking to use AI and hiring a person to “do data science”.


“[Companies] don’t really have a full understanding of what they need. There’s just a generalized sense of ‘we have data, it seems useful, but we don’t have anyone who has the skills to make it useful.’ ” - Randy Au on succeeding as a data scientist in small companies

Businesses in this situation can end up falling into one of two recruitment traps:

- An over-qualified hire in a senior position doing a junior’s job
- A hire with little experience in a junior role ending up doing a senior’s job

Browse some recruitment websites and you will find that most companies follow the general trend of posting entry-level positions that require unicorn-level skill sets.

Furthermore, the job descriptions of these roles bear a striking similarity to a chief data officer’s responsibilities:

“… the chief data officer oversees a range of data-related functions that may include data management, ensuring data quality and creating data strategy. He or she may also be responsible for data analytics and business intelligence, the process of drawing valuable insights from data.”

The first data scientist in the company will mostly split their time and energy among three activities: project planning, wearing different hats and finding some wins.

This post is for all the entry-level unicorns tasked with planning and managing data projects.

Project planning

Suppose the data scientist has already had the chance to go through, slowly and tediously, the most important data sources in the company, has a basic understanding of the data collection process and has (mentally) mapped out the general information flow.

Given this understanding, laying out a data project plan can still be tricky.

When it comes to long-term planning and goal setting, answering the following question seems like a near-impossible task:

Can this *huge project* be done by this date?

Hearing this question for the first time is terrifying, and giving a response is gut-wrenching.

You don’t want to commit to something when you have no idea whether you can deliver it.

Managing expectations

Talking to executives about data science is hard enough; answering questions like “when will this project be done?” or “can you make something that does this?” is absolutely nerve-racking, especially if there is no one around with more experience in analytics.

It’s awesomely empowering to have the skills to tackle an entire project, but suddenly sh*t gets real and you are solely responsible for every outcome of the project.

The key here is to break down the *huge project* you are expected to deliver.

The data scientist must keep all the moving parts in mind, from data ETL to product deployment.

However, this breakdown cannot stay in your head.

You must lay it out somehow to show executives how you plan to work over the coming weeks and how it will all add up to what the company wants and needs.

The benefit of doing this is twofold.

Executives will no longer have doubts about how building a data product actually works, and the data scientist will be able to sleep at night knowing they have the green light to carry on with their work.

Consider using project-planning or note-taking platforms to map out the most important stages of a project.

In my experience, Notion was a great help when I started and is now the cornerstone of my personal productivity.

At this point, MVPs must become central to any long-term planning.

Why? Because it is paramount to find some wins on your way to a finished product.

Planning everything has really become a game-changer for me.

It helps me keep track of everything going on and stay on top of multiple ongoing projects.

Minimum Viable Product

Since most companies do not have the appropriate data infrastructure, the data scientist must literally start from scratch.

Monica Rogati talks about how to begin a new project: start at a vertical cross-section of the pyramid of needs, and then grow horizontally.

What does this look like in practice? Start with the essentials: visualizations of the most important numbers in the business.

Look for the data and skip all the granular drill-downs everyone is used to.

Leave those for the second iteration of the process.

Get the product working end-to-end.

Extract, transform and load the data, do some simple aggregations and display the result in a way that provides the big picture everyone already knows.
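To make the first iteration concrete, here is a minimal sketch in Python of an end-to-end pipeline, assuming a small CSV export of orders. The data, the column names (order_date, region, amount) and the use of an in-memory SQLite database as a stand-in for a real warehouse are all hypothetical; the plain-text bar chart stands in for whatever dashboard or BI tool the company actually uses.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from one of the
# company's own data sources (a CSV dump, an API, a production database).
RAW_CSV = """order_date,region,amount
2019-03-01,North,120.50
2019-03-01,South,80.00
2019-03-02,North,99.50
2019-03-02,South,
2019-03-03,North,40.00
"""


def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and cast types."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records instead of guessing values
        clean.append((row["order_date"], row["region"], float(row["amount"])))
    return clean


def load(records, conn):
    """Load: write cleaned records into a table (in-memory SQLite here)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_date TEXT, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def daily_revenue(conn):
    """Simple aggregation: total revenue per day, the 'big picture' number."""
    return conn.execute(
        "SELECT order_date, ROUND(SUM(amount), 2) FROM orders "
        "GROUP BY order_date ORDER BY order_date"
    ).fetchall()


def display(summary):
    """Display: a plain-text bar chart standing in for a dashboard."""
    for day, total in summary:
        print(f"{day} {'#' * int(total // 10):<25} {total:>8.2f}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    display(daily_revenue(conn))
```

Once this runs on a schedule, the numbers refresh without anyone copying and pasting spreadsheets, which is the point of getting the product working end-to-end before making any single step sophisticated.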

What have you gained? Automation.

Now you can spend time doing more valuable work.

Second iteration: add some new columns to the data, incorporate some filters and new comparisons.

The data pipeline is already there and it works fine; now it’s just a matter of finding the columns you’re interested in with a few more lines of code.

Nothing fancy.
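A second iteration on the same kind of data might look like the sketch below, which assumes hypothetical, already-cleaned order records like those a first-iteration pipeline would produce. It adds one derived column (the weekday), a generic filter and a simple comparison, each in a few lines; the column names and helper functions are illustrative, not part of any particular tool.

```python
from collections import defaultdict
from datetime import date

# Hypothetical cleaned records, as produced by a first-iteration pipeline.
ORDERS = [
    {"order_date": "2019-03-01", "region": "North", "amount": 120.5},
    {"order_date": "2019-03-01", "region": "South", "amount": 80.0},
    {"order_date": "2019-03-02", "region": "North", "amount": 99.5},
    {"order_date": "2019-03-04", "region": "South", "amount": 60.0},
]


def add_weekday(rows):
    """New column: derive the weekday name from the existing order_date.

    Returns new dicts so the original records are left untouched.
    """
    return [
        {**row, "weekday": date.fromisoformat(row["order_date"]).strftime("%A")}
        for row in rows
    ]


def filter_rows(rows, **conditions):
    """New filter: keep rows whose columns match every given value."""
    return [r for r in rows if all(r.get(k) == v for k, v in conditions.items())]


def compare_by(rows, column):
    """New comparison: total amount per distinct value of `column`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[column]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    enriched = add_weekday(ORDERS)
    print(compare_by(enriched, "region"))
    print(compare_by(filter_rows(enriched, region="North"), "weekday"))
```

Each addition is independent of the others, so a new filter or comparison can be added later without touching the extract and load steps.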

Making the data readily accessible and visible should be the main concern before jumping into AI and ML.

Every new iteration grows the pyramid and creates a more solid foundation for a reliable data flow.

Eventually, you will have enough dimensions to do some deep-dives and drill-downs.

This is where discovery and new insights come to light.

Breaking down a project

How to segment a project into little chunks or sprints is entirely up to the data scientist.

I will go over what has worked for me when setting deadlines and planning work weeks (inspired by dotData’s CEO’s post).

Image: an over-simplified process to help with project planning

Generally, most long-term projects involve at least two of these product stages.

Ideally, each stage involves only one tool, such as a specific programming language or BI platform.

Also, some projects might skip stages such as app development, or be limited to one-off visualizations.

It’s important to mention that at the very beginning, uncovering insights probably won’t be the top priority.


I hope this is useful for anyone embarking on the initiative of implementing data science, and for all those first DS hires!
