They All Fall Down: The Most Common Pitfall All New Data Scientists Fall For

Generally, the people paying you.

Executives, clients and customers tend not to be data science experts, so you need to be able to explain things at a much more comprehensible level and in a way that conveys the important message.

This is something that someone from a very academic background (I can vouch for this) can struggle with without practice.

If someone asks you to explain a neural network, using the words “features” and “back propagation” without explaining them is not going to help and will put people’s backs up (people don’t like to feel stupid).

Photo by NeONBRAND on Unsplash

3. Problem Solving

Data science leans heavily on working out how to manipulate the problem you are given so that you get a satisfactory solution.

You look at the data, you understand its source and meaning, and then you look at creating a solution that adequately covers the problem you have been set (understanding the exact problem is another art in itself).

For me this ability to solve a problem in an understandable and methodical way is vital.

By doing it you can work better with like-minded people in a team, and you will tend to cover all the bases.

To do this, the best way to start is to ask questions, and lots of them!

Great. So what’s the pitfall?

Imagine I am your new boss. I tell you that we have gone through our data and built a model, but it does not work, and that’s why we’ve hired you to fix it.

What do you do?

Almost everyone will go through the code given to them and modify the model, maybe tweaking some bits here and there, reading all the comments as they go.

This is where a new data scientist tends to fall down. They become obsessed with the code and the last few lines where the model is defined, but they neglect one of the most important sources of information: the boss.

You can ask them all sorts of questions about what they did and what they covered. If you ask enough, you might find that the data they are using is not the data they should be using.

That’s the pitfall: new data scientists are always given problems to solve where the data holds an answer.

This means they never consider whether the data they are using is the right data.

In the problem I set, I use two different datasets: one that is used in production and another that is used for training.

This is supposed to represent the fact that the data was sourced from different places (maybe one comes from giving all their employees a questionnaire and the other from actual customers), so the training data is not representative of the production data. This is why the model works in development but not in production.
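A minimal sketch of how you might check for this kind of mismatch, assuming you have one numeric feature available from both the training and the production data. Everything here is hypothetical and made up for illustration: the `ks_statistic` helper, the age feature and the simulated numbers. It computes the two-sample Kolmogorov-Smirnov statistic, which is simply the largest gap between the two empirical distributions, using only the Python standard library.

```python
import random
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two samples.

    Returns a value between 0 (the samples look identical) and
    1 (the samples do not overlap at all).
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")

    largest_gap = 0.0
    # The gap can only change at an observed value, so checking
    # every observed value is enough to find the maximum.
    for value in a + b:
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Simulated, hypothetical data: ages from an employee questionnaire
# (training) versus ages of real customers (production).
rng = random.Random(0)
training_ages = [rng.gauss(35, 5) for _ in range(500)]
production_ages = [rng.gauss(50, 12) for _ in range(500)]
# A second sample drawn from the same source as the training data.
same_source_ages = [rng.gauss(35, 5) for _ in range(500)]

print("training vs production: ", ks_statistic(training_ages, production_ages))
print("training vs same source:", ks_statistic(training_ages, same_source_ages))
```

A large gap for a feature is not proof that the data is wrong, but it is a good prompt to go and ask where each dataset came from.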

Sounds implausible?

I’ve found this more than once in reality. Because no data was available at the beginning, any old data was collected in an uncontrolled way, which meant it was never representative of the final real data.

Take Away

Is the data the right data for the job?

If you are new to data science, take a moment to consider where your data is coming from and whether it is the right data for the problem.

If you took any of the newer data science courses, it is probably a problem you have never thought about before, and it will give you a head start over your peers with equal experience.

Ask questions of everything, even the people.

Sounds silly, but have you thought about this yourself?

Note: This is of course my own opinion and there are many schools of thought.

Just because this is what I find works does not mean it is the only solution or the best one.
