Here is What I’ve Learned in 2 Years as a Data Scientist

By Admond Lee, Micron Technology / AI Time Journal / Tech in Asia

It has been 2 years since I started my data science journey.

 Boy, that was one heck of a roller coaster ride! There were many highs and lows, and of course, countless cups of coffee and sleepless nights.

I failed a lot, learned a lot, and of course, grew a lot as a data scientist along the journey.

Throughout these 2 years, from writing on Medium, speaking at meetups and workshops, sharing my experience on LinkedIn, and consulting clients on data science projects, to my current role teaching data science, I have found joy and fulfilment in sharing and teaching to help others in data science and make an impact.

At the end of the day, it all boils down to one simple fact — that I’m moving towards my mission — Making data science accessible to everyone.

If you’re interested, feel free to check my previous LinkedIn post on why I decided to transition from a data scientist to becoming a data science instructor, a.k.a. a teacher.

In this article, for the first time, I’ll consolidate everything I’ve learned in 2 years as a data scientist and condense it into 5 lessons.

If you’re just starting out in data science and wondering what to learn… Or you’re looking for a job in data science… Or you’re already working in the data science space… I hope you’ll find these 5 lessons helpful!

Enough talking… Let’s get started!

One of the most profound questions I’ve ever been asked came from a great senior data scientist during my data science career: “Admond, what’s the story that we are gonna tell in the meeting later?”

The first time I heard this question, I was stunned for a second.

He didn’t ask what slides I’d prepared.

He didn’t ask what I was gonna share.

He didn’t ask what results I was gonna present.

NONE.

To be honest with you, I didn’t even understand why he put so much emphasis on telling stories instead of telling the facts that we already had.

Before I began to appreciate the importance of telling stories, I made tons of mistakes.

Either stakeholders didn’t understand what I was saying, or the insights couldn’t convince and motivate them to take action.

Once I decided to improve my storytelling skills… Once I started focusing on telling stories… Things changed, for real.

Stakeholders and non-technical bosses began to understand what I was delivering, because I was no longer bombarding them with technical jargon and results.

 They took action.

Facts tell, but stories sell.

If you want to be a good data scientist, focus on technical skills.

If you want to be a great data scientist, focus on storytelling skills.

   Want to learn storytelling skills? Learn from Vox.

Because they are masters of storytelling, like seriously.

They have always been able to explain complex issues or ideas in an engaging and understandable way.

If this is the first time you’ve heard of Vox, check out their YouTube video below.

Just observe how they explain societal phenomena and issues through storytelling, in the most intuitive and understandable way possible.

And this is very important when it comes to presenting insights or delivering a core message to your audience.

Vox: How wildlife trade is linked to coronavirus

Forget about having Kaggle-like data in your real working environment, because most of the time you won’t have clean data.

Or worse, sometimes you don’t even have data to begin with, or perhaps you’re just not sure where to get or query the data because it is scattered everywhere.

Data collection and data integrity are among the most important steps in any data science project, yet a lot of junior data scientists might be oblivious to that.

The reality is that you need to know where to get your data based on business requirements and the existing data architecture.

You might breathe a sigh of relief after you’ve got the data, but this is where the hard part begins — data integrity.

You need to perform a thorough check on the data collected by asking hard questions and seeking input from different stakeholders to see if the data makes sense.

Without accurate data in place from the start, all of our data cleaning, EDA, machine learning model building, and deployment are simply a luxury.
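To make these checks a bit more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the `records`, the field names, the `integrity_report` helper, and the valid range for `amount` are hypothetical, and a real project would more likely run similar checks with pandas or SQL against the actual data sources.

```python
from collections import Counter

# Hypothetical records, e.g. rows pulled from an orders table.
records = [
    {"order_id": 1, "customer": "A", "amount": 120.0},
    {"order_id": 2, "customer": "B", "amount": None},   # missing value
    {"order_id": 2, "customer": "B", "amount": 75.5},   # repeated key
    {"order_id": 3, "customer": "C", "amount": -40.0},  # negative amount
]


def integrity_report(rows, key, required, ranges):
    """Return a dict listing basic integrity problems found in rows."""
    report = {"missing": [], "duplicate_keys": [], "out_of_range": []}

    # 1. Required fields that are absent or None.
    for i, row in enumerate(rows):
        for field in required:
            if row.get(field) is None:
                report["missing"].append((i, field))

    # 2. Key values that appear more than once.
    counts = Counter(row.get(key) for row in rows)
    report["duplicate_keys"] = [k for k, n in counts.items() if n > 1]

    # 3. Numeric values outside the range agreed with stakeholders.
    for i, row in enumerate(rows):
        for field, (low, high) in ranges.items():
            value = row.get(field)
            if value is not None and not (low <= value <= high):
                report["out_of_range"].append((i, field, value))

    return report


report = integrity_report(
    records,
    key="order_id",
    required=["order_id", "customer", "amount"],
    ranges={"amount": (0.0, 10_000.0)},
)
print(report)
```

A report like this does not tell you whether the data is right. It only gives you a list of concrete questions to bring to the stakeholders who own the data.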

One of the most common questions for beginners in data science is this: “What are the skills that I need to learn when starting out in data science?”

In my opinion, learning technical skills (programming, statistics, etc.) should be the priority when first starting out in data science.

Once we have a solid foundation in technical skills, we should focus more on building and improving our soft skills (communication, storytelling, etc.).

While this might seem a bit counter-intuitive to the normal ways of learning data science skills, I truly believe in this approach.

WHY? You see.

Data scientists are problem solvers.

We don’t just write some code, build some fancy machine learning models and call it a day.

From understanding a business problem, collecting and visualizing data, to prototyping, fine-tuning and deploying models to real-world applications, all these steps require teamwork, communication and storytelling skills to work with team members, manage expectations with stakeholders and ultimately drive business decisions and actions.

There is a famous quote attributed to W. Edwards Deming: “Without data you’re just another person with an opinion.”

To me, getting data is only the first step.

What’s more important is how you can use data to drive business decisions and actions to make a real impact.

Here is a slightly modified quote from me: “Without storytelling skills you’re just another person with data.”

You can perform the best data analytics in the world.

You can build the best machine learning model in the world.

You can also write the cleanest code in the world.

But if you can’t use your results to convince people and drive business decisions and actions, those results will just sit in your PowerPoint slides without any real impact.

Sad, but true.

   For most businesses — unless you’re working at some cutting-edge technology companies — fancy or complex models typically are not the first choice for analytics or predictions.

Your boss and stakeholders want to understand what’s going on behind your results, so you need to be able to explain it.

For instance, what caused this anomaly to be detected? And why is that so? Does it make sense in the business context? Why is the prediction the way it is? What are the contributing factors to the prediction? Are our assumptions correct?

All of these questions essentially boil down to one simple question: “What’s the pattern behind what we observed?”

Being able to understand what’s going on behind our models and results is crucial for driving business decisions, because that is how we convince stakeholders to take action.

Huge enterprises simply can’t afford to deploy a blackbox model in the real world and let it run wild on the ground without understanding how it works or when it fails.

And this is exactly why simple models like decision trees and logistic regression are still being used in industry today.
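As a rough illustration of why a model like logistic regression is easy to explain, here is a minimal sketch in plain Python. The coefficients, feature names, and the `explain_prediction` helper are all made up for this example and do not come from a real fitted model; the point is only that each feature's contribution to the log-odds can be read off directly, which answers the "contributing factors" question.

```python
import math

# Hypothetical coefficients of an already-fitted logistic regression
# (a churn model); the numbers are invented for illustration.
intercept = -1.0
coefficients = {
    "late_payments": 0.8,
    "support_tickets": 0.5,
    "tenure_years": -0.6,
}


def explain_prediction(features):
    """Return (probability, per-feature contributions to the log-odds)."""
    contributions = {
        name: coefficients[name] * value for name, value in features.items()
    }
    log_odds = intercept + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return probability, contributions


# Hypothetical customer to score.
customer = {"late_payments": 3, "support_tickets": 2, "tenure_years": 1}
probability, contributions = explain_prediction(customer)

# List the factors from most to least influential.
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>16}: {value:+.2f}")
print(f"predicted probability: {probability:.2f}")
```

With a breakdown like this, you can tell a stakeholder which factor pushed the prediction up and which one pulled it down, instead of only showing the final score.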

   I made this huge mistake when I was first starting out in data science.

I focused too much on code and errors and lost sight of the big picture that was truly important: end-to-end pipeline integration in production and how the solution performed in the real world.

In other words, I was so fixated on the technical part that I over-optimized my code and models without having any real impact on the overall project or business.

Unfortunately, I learned this the hard way.

Fortunately, I’m currently using what I’ve learned to always remind myself to see the big picture.

Hopefully, you’ll begin to realize the importance of seeing the big picture in your day-to-day work as a data scientist.

The first step is to understand the business domain and the problems that you’re solving.

Be clear about what you or your team aims to achieve in a project, and understand how your role fits into the big picture and how the different pieces work together towards the common goals.

   Thank you for reading.

My data science journey definitely has been a tough one, but I enjoyed the ride and learned a lot along the way.

And I’m still learning each and every day.

I hope you found this article helpful in some way and will apply the lessons here in your work as a data scientist.

Now that I’ve moved on to become a data science instructor, you can also expect more data science content from me in the future to help you learn and get into this field.

Check out my other articles if you want to learn more about data science.

If you’re interested in learning how to go into data science, feel free to check out this article — How To Go Into Data Science — where I compiled and answered a list of common questions (or challenges) faced by beginners in data science with guidance.

I hope you enjoyed reading this article and I look forward to having you as part of the data science community.

Remember, keep learning and never stop improving.

As always, if you have any questions or comments, feel free to leave your feedback below, or you can always reach me on LinkedIn.

Till then, see you in the next post!

As a data scientist and data science instructor, Admond Lee is on a mission to make data science accessible to everyone.

He is helping companies and digital marketing agencies track and achieve marketing ROI with actionable insights through innovative attribution and a data-driven approach.

His story and data science work have been featured by various publications, including KDnuggets, Medium, Tech in Asia, AI Time Journal and business magazines.

Besides, he has been invited to speak at various workshops and meetups.

With his expertise in advanced social analytics and machine learning, Admond aims to bridge the gaps between digital marketing and data science.

Check out his website if you want to understand more about Admond’s story, his data science services, and how he can help you in the marketing space using data science.

You can connect with him on LinkedIn, Medium, Twitter, and Facebook.

Original. Reposted with permission.
