I wasn’t getting hired as a Data Scientist. So I sought data on who is.

What I needed was… actual data on people who have successfully made the transition into data science

While some large-scale surveys of who is a data scientist are publicly available, I saw several problems with such data.

There was no question in my mind that LinkedIn was where I needed to get the data from.

While there might still be some selection bias (LinkedIn’s algorithms might not be showing me a truly random sample of data scientists¹), I saw its widespread adoption by jobseekers and the recruitment industry alike as an inbuilt check to minimize respondent bias and ensure the truthfulness of its profiles.

LinkedIn profiles are subject, as it were, to the coercions of the actual job market.

In addition, LinkedIn allowed me to specify the geography of the profiles I wished to analyze in my search query, limiting it to Singapore if so desired.

There was only one problem: getting the data itself.

   There has been some controversy surrounding the legality of scraping LinkedIn data.

While recent precedent establishes that such information is public and therefore amenable to extraction by anyone, the legal status is far from settled.

In any case, there are several roadblocks you will encounter when you try to scrape LinkedIn data.

Suffice to say, the scraper that I wrote remained useful for long enough to acquire a decently sized dataset (1027 LinkedIn profiles) before the tags were replaced and the code became outdated.

(If you’d like to find out more about the code nonetheless, feel free to reach out to me²).

Using the search query “Data Scientist AND Singapore”, I extracted as many profiles as I could from the People section of LinkedIn.

There were really only three data elements that I considered relevant: Current Position (job title and name of employer), Education (most recent institution and field of study) and Experience (position, organization, and duration of previous roles).

Limiting myself to these three elements not only saved time in writing and debugging the scraper but was also my attempt at minimizing the scope of potential liabilities from not adhering to LinkedIn’s terms of service.

After filtering out data science aspirants, students, and profiles with insufficient information, I was left with 869 data scientist profiles.
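To make the filtering step concrete, here is a minimal sketch of how such a filter might look. It is not the code I actually ran: the `Profile` fields, the `EXCLUDE_KEYWORDS` list, and the rule for "insufficient information" are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """One scraped profile. Field names are hypothetical."""
    job_title: str = ""
    employer: str = ""
    education: str = ""
    experience: list = field(default_factory=list)


# Hypothetical keywords that mark aspirants and students (an assumption).
EXCLUDE_KEYWORDS = ("aspiring", "student", "seeking")


def is_employed_data_scientist(p: Profile) -> bool:
    """Keep profiles that look like currently employed data scientists."""
    title = p.job_title.lower()
    if "data scientist" not in title:
        return False
    if any(keyword in title for keyword in EXCLUDE_KEYWORDS):
        return False
    # Assumed rule for "sufficient information": an employer plus
    # at least one of education or experience.
    return bool(p.employer) and bool(p.education or p.experience)


def filter_profiles(profiles):
    """Return only the profiles that pass the filter above."""
    return [p for p in profiles if is_employed_data_scientist(p)]
```

The point of keeping the rule in one small function is that the exclusion criteria stay visible and easy to revise when the counts look off.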

Now I can go about asking: what common traits do currently employed data scientists have?

The most striking finding from the data, which has also been corroborated elsewhere, is that most (73%) currently employed data scientists have degrees beyond just a Bachelor’s.

A plurality (44%) hold a Master’s degree, while Ph.D.s outrank Bachelor’s degrees 29% to 21%.

Only 6% of data scientists reported some form of MOOC, bootcamp or non-traditional certification as their primary qualification.

This suggests that prospective employers trust the signaling provided by an advanced degree to fulfill the complex requirements of the data scientist position.

 It also puts paid to the notion that data science bootcamps or other non-traditional certification programs are an adequate substitute for such degrees.
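For readers who want to reproduce this kind of breakdown on their own data, the tally is simple. The sketch below assumes each profile has already been reduced to a single "highest qualification" label; the function name and the toy sample are made up for illustration and are not my dataset.

```python
from collections import Counter


def qualification_shares(highest_qualifications):
    """Return each qualification's share of the sample, in percent."""
    counts = Counter(highest_qualifications)
    total = sum(counts.values())
    return {q: round(100 * n / total, 1) for q, n in counts.most_common()}


# Made-up toy sample, NOT the scraped data.
toy = ["Master's"] * 4 + ["Ph.D."] * 3 + ["Bachelor's"] * 2 + ["Non-traditional"]
shares = qualification_shares(toy)
postgraduate_share = shares["Master's"] + shares["Ph.D."]
```

Summing the Master’s and Ph.D. shares gives the "beyond a Bachelor’s" figure, which is how the 44% and 29% above combine to 73%.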

    The common conception of a triumvirate of Computer Science, Mathematics and Statistics, and Engineering disciplines forming the bedrock of a data science career is somewhat borne out by the data.

However, there are differences.

Computer Science is by far the largest single field, accounting for 14% of all studied disciplines.

 Engineering is a diverse category and includes such disparate fields as Chemical, Electrical and Electronics, and so-called Knowledge Engineering, and cumulatively accounts for 22% of studied disciplines.

 Mathematics and Statistics are also represented under various guises, including Applied Mathematics, Mathematical Physics, and Statistics and Applied Probability, but seem to carry less heft and cumulatively account for only about 12% of studied disciplines.

A surprising winner in the data science education stakes is Business Analytics and other Analytics fields, which collectively account for 15% of disciplines.

It is, in fact, the top-ranked field for data scientists who report having a Master’s degree as their highest qualification.

Other highly ranked fields include Physics (3.5%) and Information Technology (2.2%).

The picture that emerges is that while computing- and engineering-related fields have demonstrated continuing relevance for becoming a data scientist, mathematics and statistics are somewhat being eclipsed by the newer business-oriented field of Analytics (and its variants).

Nevertheless, a very long tail of other fields represents the broad diversity of disciplines that have been pursued by current data scientists.

The modal reported work experience for a data scientist in this sample is 4–6 years, depending on the highest level of qualification.

This may seem blindingly obvious, but it is perhaps worth repeating that most data scientist hires are not college graduates straight out of their heroic MOOC conquests, which sometimes seems to be the impression given by blog posts about how to break into the field.

As with most other open positions, the average person filling that position will probably be someone with experience.

As an additional fun fact, none of the data scientists reporting non-traditional certification programs were fresh hires, having at least 1 year of work experience prior.

Another data point corroborating the above finding is that most data scientists (76%) have occupied their current positions for less than 2 years, with a plurality (42%) holding them for less than a year.

This suggests that while most data science job openings have been relatively recent, the people filling them up nevertheless have been in the job market for a while.

    Finding out what data scientists were doing immediately prior to their current positions was the core insight I wanted to get to.

Perhaps unsurprisingly (given the preponderance of postgraduate degree holders in the sample) a good chunk (11%) of them report being Scientists or Researchers (including Research Assistants and Research Fellows) previously.

An equivalent chunk (11%) reported some form of Software Engineering position, including developers and solution architects.

Another section of data scientists were previously Analysts (11%) in their various forms, including Data Analysts and System Analysts.

Interestingly, interns and trainees (11%) are also a viable class of precursors to a full-fledged data scientist role, and they typically take the form of Data Science or Analytics internships.

Other highly ranked previous positions include Consultancy (5%), various Managerial positions (5%), and Data Science Instructorship (3%).

Unremarkably, nothing beats already being a Data Scientist in attempting to land a new data science role.

Fully 28% of the sample reported Data Scientist as a previous position.

Furthermore, this incumbency advantage appears to be increasing — for example, 29% of hires who have been 1 year or less on the job reported Data Scientist as their previous position compared to only 12% of hires who have stayed between 3–4 years on the job.
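The comparison above is a conditional share: within each tenure group, what fraction reported Data Scientist as their previous position? A minimal sketch, assuming the data has been reduced to (tenure bucket, previous title) pairs; the bucket labels and the toy records are hypothetical.

```python
from collections import defaultdict


def incumbency_rate_by_tenure(records):
    """records: iterable of (tenure_bucket, previous_title) pairs.

    Returns, per tenure bucket, the fraction whose previous title
    contained "data scientist".
    """
    totals = defaultdict(int)
    incumbents = defaultdict(int)
    for bucket, previous_title in records:
        totals[bucket] += 1
        if "data scientist" in previous_title.lower():
            incumbents[bucket] += 1
    return {bucket: incumbents[bucket] / totals[bucket] for bucket in totals}


# Made-up toy records, NOT the scraped data.
toy_records = [
    ("<=1 year", "Data Scientist"),
    ("<=1 year", "Analyst"),
    ("3-4 years", "Researcher"),
    ("3-4 years", "Software Engineer"),
]
rates = incumbency_rate_by_tenure(toy_records)
```

Comparing the rate for the shortest-tenure bucket against the longer ones is what suggests whether the incumbency advantage is growing.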

For myself, it was worth noticing that Statisticians and Actuaries are at the bottom of the heap as a prior role for existing data scientists.

    While well-funded, mature technology companies (such as Google or Amazon) tend to get the limelight in terms of desirable places to get hired as a data scientist, it is worth noting that nearly half (49%) of data scientists in this sample came from places that do not directly create technology products.

These tended to be companies and institutions from finance and insurance (11%), consulting (9%), government (5%), manufacturing (5%), and academia (2.4%).

Within the technology category, industries that are well-represented include transportation (8%, primarily due to Singapore-based ride-hailing app Grab), enterprise (8%, including IBM, SAP, and Microsoft), e-commerce (5%) and finance (5%).

Here we see the distinction between a financial institution like DBS Bank hiring for data scientists versus a fintech company like Refinitiv using data science to create technology products for such institutions.

There is a sizeable category of technology companies I have labelled as AI & ML (6.5%).

This includes companies like DataRobot with a track record of delivering actual automated machine learning products, but also newer outfits like Amaris.AI.

It would be far too convenient if this cleavage between non-technology and technology employers of data scientists neatly aligned with the Type A vs Type B data scientist characterization proposed elsewhere, as it would suggest that the job market (at least in Singapore) has been pretty equitable in providing opportunities for either type.

Nevertheless, this would be an interesting and valuable hypothesis to test.

    If you are serious about landing a data scientist position, rather than fretting about what kind of skills you need from reading random blog posts, it is perhaps more helpful to get a sense of who exactly has been successful at it.

The most frequent combination of traits would probably be someone with a Master’s or Ph.D. in Computer Science, Engineering, Mathematics or Analytics; who’s been employed in industry for about 4–6 years; and was a Researcher, Software Engineer, Analyst or Data Science Intern in a previous life².

However, don’t fall into the fallacy of thinking that this combination constitutes the majority of data scientists, as it represents a multiplication of probabilities (which may themselves not be independent).
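A rough back-of-the-envelope calculation shows why. The sketch below multiplies the shares quoted in this piece under two loud assumptions: that the traits are independent (which, as noted, they may not be), and that the field-of-study shares, which are shares of studied disciplines, can be read as per-person probabilities. The years-of-experience factor is left out because I have not quoted a share for it, so the true figure would be smaller still.

```python
# Shares quoted in this piece, expressed as probabilities.
p_postgrad = 0.73                     # Master's or Ph.D.
p_field = 0.14 + 0.22 + 0.12 + 0.15   # CS + Engineering + Maths/Stats + Analytics
p_prev_role = 0.11 * 4                # Researcher, Software Engineer, Analyst, Intern

# Assumes independence between traits, which is NOT guaranteed.
joint_if_independent = p_postgrad * p_field * p_prev_role
print(f"{joint_if_independent:.1%}")  # prints 20.2%
```

Even with three individually common traits, the combined profile covers only about a fifth of the sample under these assumptions.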

As this piece and other research have noted, the background of data scientists is incredibly diverse, more so than other kinds of positions such as Software Engineer.

Nevertheless, the picture that emerges is that certain profiles do tend to be favoured and the amount of ‘standing out’ that is expected of your resumé will probably be proportional to how much it deviates from such profiles.

Finally, I would note that while the data is silent on the necessity of skills acquired from non-traditional certifications such as MOOCs and bootcamps, it does suggest something about their sufficiency: they clearly aren’t sufficient.

A postgraduate degree is a far better indicator of your prospects as a data science hire.

This is not to suggest that acquiring such skills is unimportant; data science is moving at a rapid pace and many of the most important algorithms and techniques will not be covered by a conventional academic syllabus.

 It is merely to suggest that the acquisition of specific skills may be answering a need other than your immediate employability as a data scientist.

A myriad of specialized data science courses have proliferated that seem designed to prey on the insecurities of aspirants, who’ve been told again and again that they need just that particular combination of skills to achieve a breakthrough.

Understanding data on who actually gets hired as a data scientist throws a cold, hard splash of reality on such existential considerations.

¹ If there is any reason to poke holes in the data, it would be to doubt the representativeness of the sample.

LinkedIn only displays profiles that have at least a 3rd-degree connection to you, and the profiles might have been sorted by a non-random algorithm (my scraper extracted the top profile results in order).

There is a case to be made that I am not optimally connected to obtain a truly random sample of data scientists from my target market (e.g. not having enough hub nodes in my network).

Getting more profiles from other LinkedIn accounts and doing a sensitivity analysis would throw more light on this question.

² All the visualizations in this piece (and more) have been put together in a Tableau Story called “Who is a Data Scientist in Singapore?”.

If you have any substantive questions about the data or code, do consider the response section of this post, or write an email and send it to admin@hanifsamad.com.

UPDATE Aug 5, 2019: There seems to be a fair bit of interest in the code that I used to scrape the data.

I am currently working on a follow-up post to share this with those of you who are keen.

Stay tuned.

Bio: Hanif Samad is a statistician, software engineer, and data scientist in Singapore.

He is focused on the problems worth solving.

Original.

Reposted with permission.
