Data Science Has Become Too Vague

Let’s Specialize and Break It Up!

Thomas Nield, Jan 25

I would not be opposed to downplaying the term “data science” and breaking it up into specialized disciplines.

Do not misunderstand me: I think the global “data science” movement was necessary and had a positive impact on the curmudgeonly corporate world.

But the campaign has been won, and everybody has bought into the idea.

Rather than continuing to evangelize and hire under the “data science” banner, perhaps we should allow the dust to settle so people can adjust to the change.

Data science professionals, consider no longer burdening yourselves with the heavy title of “data scientist”.

Most of us do not have PhDs or encyclopedic knowledge of every new topic.

Maybe we should specialize and relieve ourselves of the pressure of having to know everything.

“Data science” has become too broad a buzzword, and too ubiquitous and vague.

Why would anybody want to take ownership of something so nondescript?

In this article, I want to highlight how “data science” has evolved and why it may be time to fragment it.

The Jabberwocky Effect

In 2010, there was a short-lived but memorable U.S. TV series called Better Off Ted.

The show is a silly workplace comedy that lampoons corporate culture to a hyperbolic extent.

But one episode, Jabberwocky (Season 1 Episode 12), captures the corporate buzzword effect too accurately.

Ted, the lead character, tries to hide the budget for a pet project.

When his boss Veronica confronts him, he lies and says the funds went to the revolutionary “Jabberwocky” project, which he vaguely makes up on the spot.

Here’s the funny part though.

Rather than ask what “Jabberwocky” is, Veronica pretends to be “in the know,” fearing she will look incompetent for being out of the loop.

She pushes the nonexistent Jabberwocky project onto the rest of the company as a top priority.

With hilarious results, every leader and employee works on Jabberwocky without any idea what it is, and nobody dares admit their ignorance to anyone else.

Blindsided by how far it escalated, Ted comes clean to Veronica right before they do a keynote on “Jabberwocky”.

Veronica tells Ted to proceed anyway because “products are for people who don’t have presentations”.

I probably do not have to explain the “Jabberwocky” analogy.

Replace that word with “Blockchain”, “Big Data”, “Bitcoin”, “Artificial Intelligence”, “Internet of Things”, “Quantum Computing”, “Machine Learning”, or “Data Science” and you know exactly what I mean.

Corporate culture has a long history of hyping innovations, with people pretending to understand them, only to encounter their limits and chase something else.

Now that I have highlighted the “Jabberwocky Effect”, let’s continue.

A Brief History of Data Science

If you want to define “data science” as anything that has to do with “data”, you can go back to the dawn of computing.

If you think math and statistics are just as crucial to data science as the data itself, you could go back centuries and say statisticians were the original “data scientists”.

For the sake of brevity, let’s go to the 1990s.

Things used to be pretty simple.

Analysts, statisticians, researchers, and data engineers were all pretty separate roles with occasional overlap.

Tooling stacks often consisted of spreadsheets, R, MATLAB, SAS, and/or SQL.

Of course, throughout the 2000s things were changing.

Google pushed data collection and analytics to unimaginable heights.

In 2009, Google executives insisted statisticians would have the “sexiest job” over the next 10 years.

That was a decade ago, and I recall it being a strange sentiment at the time.

But lo and behold, in 2012 Harvard Business Review mainstreamed this concept called “data science” and declared data scientist the sexiest job of the 21st century.

That was the moment the craze started, in “Jabberwocky” fashion.

Harvard created a void called “data science” and everyone raced to fill it.

SQL developers, analysts, researchers, quants, statisticians, physicists, biologists, and a myriad of other professionals rebranded themselves as “data science” professionals.

Silicon Valley companies, feeling that traditional role titles like “analyst” or “researcher” sounded too limited, renamed the roles to “data scientist” which sounded more empowered and impactful.

Outside Silicon Valley, this added to the confusion, as most folks think of “scientists” as PhDs in white lab coats.

Counter-intuitively, data scientists actually come from many backgrounds (technical and nontechnical) with varying levels of education (BS, BA, MBA, and sometimes PhDs).

Many hiring managers, HR departments, and organizations in general struggled to define what they needed in a “data scientist”, which is why many of you probably have funny (or sad) anecdotes about a young data scientist being handed a MySQL database with absolutely no direction.

Throw in advances in scaling data engineering (think “big data”) and the rapid progress of “machine learning”, and the “data science” umbrella gets larger and vaguer.

More buzzwords get thrown around that many people say, yet few understand.

Before you know it, “big data” and “machine learning” have become synonymous, and the distinctions between disciplines are lost.

The domain of “Data Science” has been exhausted by the “Jabberwocky” effect.

If we want it to keep succeeding, we need to specialize it rather than continue the increasingly confusing generalization.

Reasons to Dissolve “Data Science”

The “data science” push did some great things.

It rejuvenated old, grumpy businesses to do something fresh and exciting.

IT departments, which were traditionally stingy about giving access to data and allowing non-IT staff to write code, were forced to evolve and support such initiatives.

Most importantly, it democratized technology across so many non-technology professions.

The idea that a lawyer can benefit from learning to code is not so fringe anymore, and the rite is no longer reserved for computer scientists, professional programmers, and engineers.

But this is a sign that the “data science” campaign has succeeded and run its course.

Continuing to push it is starting to become detrimental.

Here are some reasons why:

It is Too Broad

Not too long ago, if you got a bachelor’s degree in “Business Management”, you could easily be upwardly mobile.

But today, conventional success often requires specializing and focusing in a specific area, simply because our world has gotten complicated.

A business student will be much better off studying finance, supply chain management, operations research, accounting, marketing, or some other specific business discipline.

I believe “Data Science” needs to go through a similar transition.

Like business itself, there are too many disciplines to expect total mastery.

It is unproductive to try learning all of them, especially at once.

Of course, high-level awareness of what is out there is beneficial.

It is also healthy to change interests over time.

However, attempting to be omniscient will never yield value.

It has always bothered me that “data science” can be creating a chart in Excel or Tableau… as well as building and tuning a neural network classifier.

Seriously, what is up with that? These two tasks are thousands of miles apart in their nature, the technical skill needed, and the salary.

Writing a SQL query versus building a Bayesian model? These are also unrelated skill sets and definitely not interchangeable.

So why do we lump people with these extremely diverse skills together as “data scientists” and make hiring so vague and difficult?

Some folks reading this may argue, “Well, all these disciplines are interconnected, and the discipline of ‘data science’ helps unify and integrate them all.” That’s arguable to some degree, but marketing, finance, supply chain, accounting, and other business functions are interconnected as well.

Despite sharing a common objective, they are still distinct areas, and we no longer put much emphasis on “business management” as a whole.

Fragmentation and specialization are part of a domain maturing, and over time the specializations get more attention than the domain itself.


It is Overwhelming

One of the things that prompted me to write this article is the growing number of articles from data scientists confessing their feelings of “imposter syndrome”.

There is this one which I’ve seen circulating.

There is also this one.

As time progresses, more data science professionals continue to come forward and confess their feelings of fraudulence.

Professionally, the burden of imposter syndrome can fill you with dread and keep you up at night.

The question always lingers: “How long will it be until I’m discovered for the fraud I am?”

But I believe this is a symptom of the larger issue this article addresses.

It took me way too long to figure out that “data science” has become anything and everything related to “data”.

Sadly, there are folks who take it upon themselves to own all of that.

Why anyone would want to is beyond me.

[Figure: “This is all you need to become a confident data scientist (as of 2013). Totally achievable, right?”]

The above graphic is a popular (but dated) roadmap to becoming a data scientist.

Not only is this impractical for folks with personal lives, but why is it prescribing a “one-size-fits-all” curriculum? Maybe you can get shallow knowledge of every topic on there, but people work in different environments with different problems.

At any given point in time, why not learn the tools needed for your particular job? Never mind that tools and platforms come and go, and skills become legacy pretty quickly.

The only parts of this roadmap not prone to obsolescence are the classic mathematical concepts.

Do not misunderstand me: it’s always good to keep learning and to get a general idea of what solutions exist.

But in the reality of day-to-day life, effective people know how to discern and prioritize, rather than be driven by FOMO.

It Saturated Everything

Data is like electricity now.

It is used everywhere and for different purposes.

In the 19th century people would marvel at what electricity enabled.

Today, there is less attention on electricity and more on the devices it is powering.

It is not so much that we take electricity for granted, but, ya know, there just comes a point where you stop celebrating it.

It is the same thing with data.

It has succeeded and become the new normal.

Rather than continuing our exhausted celebration, we should focus on the next innovations that it will enable.

Do you think natural language processing can create an opportunity to improve customer complaint handling? Then push “natural language processing”, not “data science”, “machine learning”, or “AI”.

Be specific and focused.

Are you interested in optimizing profit, cost, revenue, or operational feasibility? Then position yourself around optimization.

“Data science” has become white noise now and less practical as a term.

Focus on specific, tangible areas where solutions have yet to be applied and problems have yet to be solved.

The Buzzword Dilemma

To wrap up, here are a few final considerations.

I have made it clear that I think we should stop using the term “data science”.

Will that actually happen? Sooner or later, I think it will.

Just like the term “cloud computing” has largely died and been replaced by specializations, I think a similar transition will happen in data science.

Am I going to follow my own suggestion? I am not sure yet.

While the term stays in vogue, it may be the only way to get people to show up to my talks or read articles on this blog called Towards Data Science.

I cannot blame others for doing the same.

Ask yourself this, too: do we use buzzwords to spur positive change, or to serve our own purposes? Again, on a global scale, the “data science” buzzword has had a positive effect.

It democratized technology across professions and empowered many people for the digital workforce.

But I am sure there are folks calling themselves “data scientists” to exaggerate their capabilities and capitalize on the hype.

In summary, let’s ease off on generalizing people and roles.

Perhaps we should stop calling roles “Data Scientist” and instead make the role reflective of the tasks it entails.

Hire “Data Engineers”, “Operations Research Developers”, old school “statisticians”, and “Machine Learning Analysts”.

Give everyone a chance to find their niche and contribute individually in the best way they know how.

In time, organizations will shape themselves and align to their needs in ways that make sense.

Originally published at tomstechnicalblog.com on January 19, 2018.

