What Does an Ideal Data Scientist’s Profile Look Like? — Findings from Analyzing 1000 Indeed Job Postings

Because of the broad nature of the Data Scientist profession, other languages also play important roles.

In summary, the top languages for Data Scientists are: Python, SQL, Scala, Lua, Java, SAS, R, C++ and Matlab.

Languages Required for Machine Learning Engineers are More Diverse

Python, as the de facto language of Machine Learning, unsurprisingly comes out as the top language for Machine Learning Engineers.

Python is important, but it still loses to Scala and Java, since these languages help Data Engineers handle big data.

In summary, the top languages for Data Engineers are: SQL, Scala, Java, Python and Lua.

Scala is Emerging as the Second Most Important Language in Data Science (not R)

Interestingly, when we compare the different roles, Scala ranks either second or third. So we can say the top three languages in Data Science are Python, SQL and Scala. If you are thinking of learning a new language, consider Scala!

Spark is the Top Big Data Skill Except for Data Engineers

For Data Engineers only, Hadoop is mentioned a bit more often than Spark, but overall, Spark is definitely the big data framework to learn first. Cassandra is more important for engineers than for scientists, while Storm seems to be relevant only for Data Engineers.

In summary, the top Big Data technologies for data science are: Spark, Hadoop, Kafka and Hive.

TensorFlow is the King When It Comes to Deep Learning

Deep Learning frameworks are hardly mentioned in Data Engineer job postings, so DL frameworks do not appear to be required for this role. Most mentions of DL frameworks come from Machine Learning Engineer postings, indicating that ML Engineers do a lot of Machine Learning modeling, not just model deployment. Furthermore, TensorFlow clearly dominates the deep learning field.
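Claims such as "hardly mentioned" or "dominating" come down to how often a term appears in each role's postings. The post leaves its technical details for later posts. What follows is therefore only a minimal Python sketch, built on three assumptions. First, each posting is reduced to a (role, text) pair. Second, a skill counts at most once per posting. Third, results are reported as the share of a role's postings that mention the skill. The skill list, the regex matching rules and the toy postings below are hypothetical illustrations, not the author's code or data.

```python
import re
from collections import defaultdict

# Hypothetical skill list for illustration; the post does not publish its keyword list.
SKILLS = ["Python", "SQL", "Scala", "Java", "R", "C++",
          "Spark", "Hadoop", "TensorFlow", "Keras"]


def skill_pattern(skill: str) -> re.Pattern:
    """Compile a regex that matches `skill` as a standalone token.

    The lookarounds reject neighbouring word characters, '+' and '#', so
    "Java" does not match inside "JavaScript", "SQL" does not match inside
    "NoSQL", and "C++" is matched literally. Single-letter names such as "R"
    are matched case-sensitively to avoid counting every stray "r".
    """
    flags = 0 if len(skill) == 1 else re.IGNORECASE
    return re.compile(r"(?<![\w+#])" + re.escape(skill) + r"(?![\w+#])", flags)


def mention_share(postings):
    """Return {role: {skill: fraction of that role's postings mentioning skill}}.

    `postings` is an iterable of (role, text) pairs; a skill counts at most
    once per posting, however often it is repeated in the text.
    """
    patterns = {skill: skill_pattern(skill) for skill in SKILLS}
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for role, text in postings:
        totals[role] += 1
        for skill, pattern in patterns.items():
            if pattern.search(text):
                hits[role][skill] += 1
    return {role: {skill: hits[role][skill] / totals[role] for skill in SKILLS}
            for role in totals}


def top_skills(shares, role, n=3):
    """Rank one role's skills by mention share (ties keep SKILLS order)."""
    return sorted(shares[role], key=shares[role].get, reverse=True)[:n]


# Made-up postings, for demonstration only.
toy_postings = [
    ("Data Scientist", "Python, SQL and R required; Tableau a plus."),
    ("Data Scientist", "Experience with python and Spark. JavaScript not needed."),
    ("Data Engineer", "Scala or Java, Hadoop, Spark, NoSQL stores."),
]
shares = mention_share(toy_postings)
print(shares["Data Scientist"]["Python"])   # 1.0
print(shares["Data Engineer"]["SQL"])       # 0.0 ("NoSQL" is not counted as SQL)
print(top_skills(shares, "Data Engineer"))  # ['Scala', 'Java', 'Spark']
```

Reporting a share of postings rather than raw counts keeps roles with different numbers of postings comparable. Counting a skill once per posting stops a single verbose ad from inflating a term.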
Although Keras, as a high-level Deep Learning framework, is really popular for Data Scientists, it is almost irrelevant for Machine Learning Engineer roles, probably indicating that ML Engineers mostly use lower-level frameworks such as TensorFlow.

In summary, the most important Deep Learning frameworks in Data Science are: TensorFlow, Torch, Caffe and MXNet.

AWS Dominates Across the Board

Computer Vision is Where Most of the Demand Comes from in Machine Learning

For general Data Scientists, Natural Language Processing is the biggest ML application area, followed by Computer Vision, Speech Recognition, Fraud Detection and Recommender Systems. Interestingly, for Machine Learning Engineers the biggest demand comes from Computer Vision alone, with Natural Language Processing a distant second. Data Engineers, on the other hand, are again the focused specialists: none of these ML application areas are relevant for them.

Insight: If you want to become a Data Scientist, you can build various types of projects to show your expertise, depending on the area you want to get into. For Machine Learning Engineers, though, Computer Vision is the way to go!

When It Comes to Visualization, Tableau is a Must

Visualization tools are mostly in demand for Data Scientists and get very few mentions for Data Engineers and Machine Learning Engineers. However, Tableau is the top choice for all the roles. For Data Scientists, Shiny, Matplotlib, ggplot and Seaborn seem to be equally important.

Git Is Important for Everyone, While Docker is Only for Engineers

Next, we use word clouds to explore the most frequent keywords for each role and combine them with the corresponding skills to build the ideal profile for each Data Science role!

Data Scientist is More about Machine Learning than Business or Analytics

Data Scientist has been regarded as the all-around profession that requires statistics, analytics, machine learning and business knowledge.
It seems that's still the case, or at least employers still have varied needs for a Data Scientist. However, Data Scientists now definitely seem to be more about Machine Learning than anything else.

Other top requirements include: business, management, communication, research, development, analytics, product, technical, statistics, algorithm, models, customer/client and computer science.

Machine Learning Engineers are about Research, System Design and Building

Compared with general Data Scientists, Machine Learning Engineers definitely seem to have a more focused portfolio, which includes research, design and engineering. Clearly, solution, product, software and system are the dominant themes. Alongside those come research, algorithm, ai, deep learning and computer vision. Interestingly, terms such as business, management, customer and communication also seem to be important; this could be investigated in a future iteration of this project. Pipeline and platform also stand out, confirming the common understanding that Machine Learning Engineers are responsible for building data pipelines to deploy ML systems.

Data Engineer Is the Real Specialist

Data Engineers have an even more focused portfolio than Machine Learning Engineers. Clearly, the focus is on supporting product, system and solution by designing and developing pipelines. Top requirements include technical skills, database, built, testing, environment and quality. Machine learning is also important, possibly because the pipelines are mainly built to serve the data needs of ML model deployment.

That's it! I hope this project helps you understand what employers are looking for and, most importantly, helps you make informed decisions about how to customize your resume and which technologies to learn! If you like the post, I would appreciate your claps, thank you!

P.S. I'll write about the technical details in separate posts, so please stay tuned, more is coming :)

Let's connect on LinkedIn!