How to Become a (Good) Data Scientist – Beginner Guide

By Sciforce.

Sometimes, when you hear data scientists rattle off a dozen algorithms while discussing their experiments, or go into the details of TensorFlow usage, you might think there is no way a layperson can master Data Science.

Big Data looks like yet another mystery of the Universe, locked away in an ivory tower with a handful of present-day alchemists and magicians.

At the same time, you hear everywhere about the urgent need to become data-driven.

The catch is that we used to have only limited, well-structured data.

Now, with the global Internet, we are swimming in never-ending flows of structured, unstructured, and semi-structured data.

This gives us more power to understand industrial, commercial, and social processes, but it also requires new tools and technologies.

Data Science is merely a 21st-century extension of the mathematics people have been doing for centuries.

In essence, it is the same skill of using the available information to gain insight and improve processes.

Whether it’s a small Excel spreadsheet or 100 million records in a database, the goal is always the same: to find value.

What makes Data Science different from traditional statistics is that it tries not only to explain values but also to predict future trends.

In other words, we use Data Science for:

Data Science is a newly developed blend of machine learning algorithms, statistics, business intelligence, and programming.

This blend helps us reveal hidden patterns in raw data, which in turn provides insights into business and manufacturing processes.

To work in Data Science, you need the skills of a business analyst, a statistician, a programmer, and a Machine Learning developer.

Luckily, for the first dive into the world of data, you do not need to be an expert in any of these fields.

Let’s see what you need and how you can teach yourself the necessary minimum.

Business Intelligence

When we first look at Data Science and Business Intelligence, we see the similarity: they both focus on “data” to provide favorable outcomes, and they both offer reliable decision-support systems.

The difference is that while BI works with static and structured data, Data Science can handle high-speed and complex, multi-structured data from a wide variety of data sources.

From a practical perspective, BI helps interpret past data for reporting (Descriptive Analytics), while Data Science analyzes past data to make future predictions (Predictive Analytics) or to recommend actions (Prescriptive Analytics).
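To make the distinction concrete, here is a minimal Python sketch under stated assumptions: the monthly sales figures are hypothetical, and a straight-line least-squares trend stands in for predictive analytics as its simplest possible form, not as a recommended forecasting method. The hypothetical helper `describe` summarizes what already happened (a BI-style report), while the hypothetical helper `linear_trend_forecast` extrapolates the fitted trend one period ahead.

```python
from statistics import mean

# Hypothetical monthly sales figures, used only for illustration.
monthly_sales = [120.0, 135.0, 150.0, 160.0, 178.0, 190.0]


def describe(values):
    """Descriptive analytics: summarize what has already happened."""
    return {"total": sum(values), "average": mean(values), "last": values[-1]}


def linear_trend_forecast(values, steps_ahead=1):
    """Predictive analytics in its simplest form: fit y = a + b*t by
    ordinary least squares over t = 0..n-1, then extrapolate."""
    n = len(values)
    t = list(range(n))
    t_mean, y_mean = mean(t), mean(values)
    # Slope b = covariance(t, y) / variance(t); intercept a = y_mean - b * t_mean.
    slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, values)) / sum(
        (ti - t_mean) ** 2 for ti in t
    )
    intercept = y_mean - slope * t_mean
    return intercept + slope * (n - 1 + steps_ahead)


print(describe(monthly_sales))               # looking back: the report
print(linear_trend_forecast(monthly_sales))  # looking ahead: the prediction
```

With these illustrative numbers, the report shows an average of 155.5 per month, and the trend line projects roughly 204.4 for the next month. Real predictive work would also check how well the model fits and quantify its uncertainty.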

Theories aside, to start a simple Data Science project, you do not need to be an expert Business Analyst.

What you need is a clear idea of the following points:

As you can see, at the very beginning of your journey, curiosity and common sense may be sufficient from the BI point of view.

In a more complex production environment, there will probably be separate Business Analysts to handle in-depth interpretation.

However, it is important to have at least a general picture of BI tasks and strategies.

Resources

We recommend the following introductory resources to help you feel more confident in analytics:

- Introduction To The Basic Business Intelligence Concepts, an insightful article giving an overview of the basic concepts in BI
- Business Intelligence for Dummies, step-by-step guidance through BI technologies
- Big Data & Business Intelligence, an online course for beginners
- Business Analytics Fundamentals, another introductory course teaching the basic concepts of BI

Statistics and probability

Probability and statistics are the basis of Data Science.

Statistics is, in simple terms, the use of mathematics to perform technical analysis of data.

With the help of statistical methods, we make estimates for further analysis.

Statistical methods themselves depend on probability theory, which allows us to make predictions.
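As an illustration of how the two fit together, the sketch below uses Python's standard `statistics` module (Python 3.8+ for `NormalDist`) on a small, hypothetical sample of delivery times. The statistics part estimates a typical value and spread from the data; the probability part assumes, purely for illustration, that delivery times are roughly normally distributed and uses that model to estimate the chance that a delivery takes more than 40 minutes.

```python
from statistics import NormalDist, mean, median, stdev

# Hypothetical sample: delivery times in minutes (illustrative data only).
delivery_minutes = [32, 35, 29, 41, 38, 30, 36, 33, 40, 34]

# Statistics: estimate properties of the underlying process from the sample.
sample_mean = mean(delivery_minutes)      # point estimate of the typical time
sample_median = median(delivery_minutes)  # less sensitive to extreme values
sample_sd = stdev(delivery_minutes)       # sample standard deviation (n - 1)

# Probability: ASSUME times are roughly normal, then use the fitted model
# to make a prediction about future deliveries.
model = NormalDist(mu=sample_mean, sigma=sample_sd)
p_over_40 = 1 - model.cdf(40)  # estimated chance a delivery exceeds 40 minutes

print(f"mean={sample_mean:.1f} median={sample_median} sd={sample_sd:.2f}")
print(f"P(delivery > 40 min) ~ {p_over_40:.2f}")
```

The normality assumption is itself something to check (for example, with a histogram) before trusting a probability derived from it.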

Both statistics and probability are separate and complicated fields of mathematics.

However, as a beginner data scientist, you can start with 5 basic statistics concepts:

Resources

We have selected just a few practice-oriented books and courses that can give you a taste of statistical concepts from the start:

- Practical Statistics for Data Scientists: 50 Essential Concepts, a solid practical book that introduces essential tools specifically for data science
- Naked Statistics: Stripping the Dread from the Data, an introduction to statistics in simple words
- Statistics and probability, an introductory online course
- Statistics for data science, a special course on statistics developed for data scientists

Programming

Data Science is an exciting field to work in, as it combines advanced statistical and quantitative skills with real-world programming ability.

Depending on your background, you are free to choose whichever programming language you like.

That said, the most popular languages in the Data Science community are R, Python, and SQL.

Resources

There are plenty of resources for every programming language and every level of proficiency.

We’d suggest visiting DataCamp to explore the basic programming skills needed for Data Science.

If you feel more comfortable with books, the vast collection of O’Reilly’s free programming ebooks will help you choose the language to master.

Machine Learning and AI

Although AI and Data Science usually go hand in hand, a large number of data scientists are not proficient in Machine Learning areas and techniques.

However, Data Science involves working with large data sets, which requires mastering Machine Learning techniques such as supervised learning, decision trees, and logistic regression.

These skills will help you to solve different data science problems that are based on predictions of major organizational outcomes.

At the entry level, Machine Learning does not require much knowledge of math or programming, just interest and motivation.

The basic thing you should know about ML is that, at its core, every algorithm belongs to one of three main categories: supervised learning, unsupervised learning, and reinforcement learning.

With these broad approaches in mind, you have a backbone for analyzing your data and can explore the specific algorithms and techniques that suit you best.
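A minimal, dependency-free Python sketch of the first two categories follows; reinforcement learning, which learns from rewards over repeated interactions, does not fit in a few lines and is left out. The data, the feature, and the helper names (`fit_nearest_centroid`, `predict`, `kmeans_1d`) are hypothetical. The supervised part learns from labeled examples with a nearest-centroid classifier, a deliberately simple stand-in for algorithms such as decision trees or logistic regression; the unsupervised part groups the same values without any labels using the k-means algorithm.

```python
from statistics import mean

# Hypothetical 1-D feature: hours of product usage per week.
# Supervised learning: every example comes with a label to learn from.
labeled = [(1.0, "churn"), (2.0, "churn"), (1.5, "churn"),
           (8.0, "stay"), (9.5, "stay"), (7.0, "stay")]


def fit_nearest_centroid(examples):
    """Supervised: learn one centroid (mean feature value) per label."""
    labels = {label for _, label in examples}
    return {lab: mean(x for x, l in examples if l == lab) for lab in labels}


def predict(centroids, x):
    """Assign x to the label whose centroid is closest."""
    return min(centroids, key=lambda lab: abs(x - centroids[lab]))


def kmeans_1d(values, k=2, iterations=10):
    """Unsupervised: group unlabeled values into k clusters (k-means)."""
    # Simple deterministic initialization: evenly spaced sorted values.
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sorted(centers)


features = [x for x, _ in labeled]  # the same data with the labels thrown away
centroids = fit_nearest_centroid(labeled)
print(predict(centroids, 3.0))      # supervised: predict a label for a new value
print(kmeans_1d(features, k=2))     # unsupervised: discover two groups on its own
```

The design choice here is contrast rather than accuracy: both functions see the same numbers, but only the supervised one is told what the groups mean.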

Resources

As with programming, there are numerous books and courses on Machine Learning. Here are just a few of them:

- Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, a classic textbook recommended for all students who want to master machine and deep learning
- Machine Learning course by Andrew Ng, an absolute classic that leads you through the most popular algorithms in ML
- Machine Learning A-Z™: Hands-On Python & R In Data Science, a Udemy course specifically for novice data scientists that introduces basic ML concepts in both R and Python

Now you know the main prerequisites for Data Science.

Does that make you a good data scientist? There is no single correct answer, but there are several things to take into consideration:

Analytical Mindset: this is a general requirement for anyone working with data.

However, while common sense may suffice at the entry level, your analytical thinking should be further backed up by a statistical background and knowledge of data structures and machine learning algorithms.

Focus on Problem Solving: when you master a new technology, it is tempting to use it everywhere. However, while it is important to know recent trends and tools, the goal of Data Science is to solve specific problems by extracting knowledge from data.

A good data scientist first understands the problem, then defines the requirements for the solution to the problem, and only then decides which tools and techniques are the best fit for the task.

Don’t forget that stakeholders will never be captivated by the impressive tools you use, only by the effectiveness of your solution.

Domain Knowledge: data scientists need to understand the business problem and choose the appropriate model for the problem.

They should be able to interpret the results of their models and iterate quickly to arrive at the final model.

They need to have an eye for detail.

Communication Skills: there’s a lot of communication involved in understanding the problem and delivering constant feedback in simple language to the stakeholders.

But this only scratches the surface: a much more important part of communication is asking the right questions.

Besides, data scientists should be able to document their approach clearly so that someone else can easily build on their work and, conversely, to understand research published in their area.

As you can see, it is the combination of various technical and soft skills that makes up a good data scientist.

Original. Reposted with permission.

Bio: SciForce is a Ukraine-based IT company specializing in the development of software solutions based on science-driven information technologies.

We have wide-ranging expertise in many key AI technologies, including Data Mining, Digital Signal Processing, Natural Language Processing, Machine Learning, Image Processing and Computer Vision.
