Busted! 11 Myths Data Science Transitioners Need to Avoid

Simply put – an artificial intelligence project has a universe of jobs attached to it.

It isn’t only limited to the role of a data scientist.

Breaking down the myth

Applied artificial intelligence is a complex field.

It requires working with different disciplines across the length and breadth of the project.

A plethora of interdisciplinary roles exist:

- Data Engineer
- Data Analyst
- AI/ML Engineer
- Data Scientist
- Statistician
- Business Analyst
- Domain expert (for example, a self-driving car project will have mechanical engineers and car hardware experts)
- IoT Specialist
- Data Science Manager/Decision-Maker
- Decision Scientist
- Researcher
- Software Engineer
- UX Designer
- Project Manager

Note that the roles and number of folks staffed will vary depending on the project.

The idea I’m trying to get across is that AI isn’t a cut-and-dried field.

It’s not a straightforward path.

If someone tries to sell you on a project that is staffed with just data scientists, it might be time to sound the alarm bells.

This is especially relevant for people operating in a senior role (team leaders, managers, CxOs, etc.).

It’s VERY important to understand each role in order to create a successful project.

I recommend taking the below course to fully grasp how an AI project works.

This includes how to hire the perfect AI team, and other intricate details every AI leader (or even enthusiast) should know: Artificial Intelligence and Machine Learning for Business

You can also read about the jobs that might be impacted as AI continues to grow.

9. Data Science is Only About Building Predictive Models

Being able to predict an event is a powerful thing.

And that’s what stands out to newcomers in data science.

Building models that can predict what a customer will buy next sounds like a must-have skill, right? In fact, when I describe data science or machine learning to a non-technical person, their first reaction is quite similar.

The hype around this field is unprecedented.

Apparently, a data scientist is only building predictive models all day at work.

Is that not what DJ Patil meant when he described the role of a data scientist as the “sexiest job of the 21st century”? Well, not quite.

Breaking down the myth

There are multiple layers in a data science project.

The model building part is just a speck in the overall data science lifecycle (the different roles in data science are covered in the previous section).

To give you a general idea, the steps involved in a typical data science lifecycle are:

- Understanding the problem statement
- Hypothesis building
- Data collection
- Verifying the data
- Data cleaning
- Exploratory analysis
- Designing the model
- Testing/Verifying the model
- If an error is found, head back to the verification or cleaning stage
- Putting it into production (deploying the model)

Nothing is as straightforward as they teach you in a classroom or a course.

Experience is the best way to learn how a project works.

Try talking to someone who has seen the end-to-end process.

Even better, get an internship and get a first-hand account of what makes a data science project tick.

Additionally, data science isn’t limited to simply making predictions.

I’m sure you’ve come across the market-basket analysis concept.

It’s a combination of clustering techniques and association rules.
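To make the association-rule side of this concrete, here is a minimal sketch using only the Python standard library. The transactions are invented for illustration, and `pair_rules` and `min_support` are hypothetical names I've picked; it only looks at pairs of items, so treat it as a toy rather than a full algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Toy shopping baskets, invented purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def pair_rules(transactions, min_support=0.4):
    """Return (antecedent, consequent, support, confidence, lift) for item pairs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]       # P(y | x)
            lift = confidence / (item_counts[y] / n)  # P(y | x) / P(y)
            rules.append((x, y, support, confidence, lift))
    return rules


rules = pair_rules(transactions)
for x, y, support, confidence, lift in sorted(rules):
    print(f"{x} -> {y}: support={support:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 suggests the two items show up together more often than chance alone would explain, which is the kind of insight that has nothing to do with predicting a target variable.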

Or how about anomaly detection, the ability to identify outliers in the data?
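One simple way to flag outliers is the interquartile range (IQR) rule. The sketch below is just one of many possible techniques, chosen here because it needs nothing beyond the standard library; the `iqr_outliers` helper, the multiplier `k=1.5`, and the `daily_orders` numbers are all assumptions made for the example.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Invented daily order counts with one suspicious spike.
daily_orders = [102, 98, 105, 97, 101, 99, 103, 100, 480, 96]
print(iqr_outliers(daily_orders))
```

In a real project you would still need to decide whether a flagged value is an error to clean or a genuine event worth investigating.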

There is so much to learn!

10. Participating in Data Science Competitions Translates to Real-Life Projects

Data science competitions are an excellent stepping stone in your data science journey.

You get to practice your skills on a dataset, showcase them to the world, and even stand a chance to win prizes.

These hackathons and competitions have increased multi-fold in the last 4-5 years as more and more people want a piece of the data science cake.

Most aspiring data science professionals include these competitions on their resume.

The problem from an interview standpoint? Recruiters have started paying less and less attention to this aspect of your portfolio.

Breaking down the myth

There are plenty of reasons why recruiters don’t consider your competition experience.

I’ll narrow that down to one: real-world projects are an entirely different beast compared to what you see in competitions.

Data science competitions have clean and almost spotless datasets.

If there are missing values, you can impute them using a plethora of techniques.
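As one example of such a technique, here is a minimal sketch of median imputation. The `impute_median` helper and the `ages` column are made up for illustration, and missing values are assumed to be represented as `None`.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


# Invented column with two missing entries.
ages = [34, None, 29, 41, None, 38]
print(impute_median(ages))
```

This works neatly on a tidy competition dataset. On real data you would first have to ask why the values are missing at all.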

What matters is the accuracy of your model, not the way you got there.

Real-world projects have end-to-end pipelines which involve working with a bunch of people.

Most of us will always have to work with messy and untidy data.

 The old saying about spending 70-80% of your time just collecting and cleaning data is true.

Tasks like data cleaning and feature engineering will take up the majority of your time.

This LinkedIn post is an excellent read on the standard methodology one can use for analytical models.

You can also refer to the section above where we spoke about the different stages involved in a typical data science project.

Source: Revolution Analytics

Additionally, we can’t just build a stacked complex ensemble model.

Clients demand transparency so the simpler model usually wins out.

Interpretability is key in a corporate setting.

The project team is held accountable if the model behaves poorly.

As I mentioned in this article, getting a good score on a competition leaderboard is excellent for measuring your learning progress, but interviewers will want to know how you can optimize your algorithm for impact, not for the sake of increasing accuracy.

Talk to data science experts, try to understand how these projects work, build your network in the domain of your choice, and try to structure your thoughts to align accordingly.

11. Data Collection is a Breeze, the Focus should be on Building Models

We’ll wrap up this article with another myth around building models.

This is a conversation I had with a fresher data scientist recently:

Pranav Dar: What’s your favorite part of a data science role apart from designing models?

Fresher DS: I like the feature engineering part.

PD: Sounds fair. How do you usually collect data for your projects?

Fresher DS: Um, I usually just download it from one of the open-source platforms.

PD: OK, but what if the data is skewed or biased? How do you verify the identity of the data? And what will you do when you’re asked to collect data from multiple sources that require database skills?

Fresher DS: I hadn’t thought about that.

That, unfortunately, is a conversation I have on a far too regular basis.

Most experienced data science professionals are well aware of this situation as well.

Expect to be tested on this subject thoroughly in an interview.

Breaking down the myth

Data is being generated at an unprecedented pace, but collecting and cleaning it isn’t getting any easier.

Without building a pipeline to collect the data, your data science project is going nowhere.

Typically, this is the role of a data engineer (but data scientists are expected to know this function as well).

I cannot overstate the importance of the data collection step.

Collecting honest and accurate data is imperative to your final model working well.

As Wikipedia puts it, “The goal for all data collection is to capture quality evidence that allows analysis to lead to the formulation of convincing and credible answers to the questions that have been posed”.

There are too many sources of data available.

How do you connect to each? What data format do you receive from each?
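To give a feel for what those questions involve, here is a small sketch that pulls order records from three differently shaped sources (a CSV export, a JSON response, and a SQL table) into one common format. Everything in it is hypothetical: the sources are simulated in memory, and the field names, the `orders` table, and the `from_*` helpers are assumptions made for the example rather than any real system's schema.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sources, simulated in memory so the sketch is self-contained.
CSV_EXPORT = "customer_id,amount\n1,19.99\n2,5.00\n"
JSON_API_RESPONSE = '[{"customerId": 3, "total": "42.50"}]'


def from_csv(text):
    """CSV gives us strings, so every field needs an explicit type cast."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"customer_id": int(row["customer_id"]),
               "amount": float(row["amount"]), "source": "csv"}


def from_json(text):
    """This JSON source uses different field names for the same concepts."""
    for item in json.loads(text):
        yield {"customer_id": int(item["customerId"]),
               "amount": float(item["total"]), "source": "json"}


def from_database(conn):
    """A database source needs a query, which is where SQL skills come in."""
    query = "SELECT customer_id, amount FROM orders"
    for customer_id, amount in conn.execute(query):
        yield {"customer_id": customer_id, "amount": amount, "source": "db"}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.execute("INSERT INTO orders VALUES (4, 7.25)")

records = [*from_csv(CSV_EXPORT), *from_json(JSON_API_RESPONSE),
           *from_database(conn)]
conn.close()

for record in records:
    print(record)
```

Even in this toy version, each source needs its own connection logic, its own field mapping, and its own type handling. That is the work a collection pipeline has to do before any model sees the data.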
