Top 5 Must-Read Answers – What does a Data Scientist do on a Daily Basis?

Contrary to popular belief, Data Science is not all glamour.

The following survey results by CrowdFlower accurately sum up a typical day for a Data Scientist: There is a lot of backtracking involved.

Sometimes you even need to be able to predict what consequences removing/adding a variable might have.

Collecting Datasets: Data is the lifeline of Data Science, so we spend plenty of time curating it. On rare occasions, some projects might already have plenty of data.

Cleaning & Organizing Data: This is the most time-consuming and crucial step in the entire process. It has a great impact on the final results. Usually, after this step, the once-large amount of data shrinks, so we may need to collect more data for effective training.

Data Mining: This is the practice of examining large pre-existing databases in order to generate new information. Once data is organized and stored in databases, we can finally begin to derive value from it by finding patterns within the data.

Building Training Sets & Test Sets: Once we have a decent amount of data, we need to split it into a training set and a test set. A training set is a set of data used to discover potentially predictive relationships. It contains all the information about the expected output. A test set is a set of data used to assess the strength and utility of a predictive relationship. It contains mixed variables.

Refining Algorithms: We start with a skeletal algorithm.

It is very basic and defines roughly what output is expected.
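As a rough illustration of the train/test split and the "skeletal algorithm" idea, here is a minimal sketch in Python using only the standard library. The data, the threshold rule, and every function name (`train_test_split`, `fit_threshold`, `evaluate`) are invented for illustration and are not taken from the original answer. It splits labelled data into a training set and a test set, fits a deliberately basic model on the training set, and records accuracy and precision on the test set.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train):
    """Skeletal model: midpoint between the mean feature value of each class."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def evaluate(threshold, test):
    """Return (accuracy, precision) for the rule 'predict 1 when x > threshold'."""
    tp = fp = correct = 0
    for x, y in test:
        pred = 1 if x > threshold else 0
        correct += pred == y
        tp += pred == 1 and y == 1
        fp += pred == 1 and y == 0
    accuracy = correct / len(test)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return accuracy, precision


# Made-up labelled data: one numeric feature, label 1 tends to have larger values.
rng = random.Random(42)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(100)]
data += [(rng.gauss(3.0, 1.0), 1) for _ in range(100)]

train, test = train_test_split(data)
threshold = fit_threshold(train)          # learned from the training set only
accuracy, precision = evaluate(threshold, test)  # measured on unseen data
print(f"threshold={threshold:.2f} accuracy={accuracy:.2f} precision={precision:.2f}")
```

Keeping the test set out of `fit_threshold` is the point of the split: the recorded metrics then describe how the model behaves on data it has not seen, which is what each later refinement should be compared against.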

After a few sessions, the accuracy, precision, etc. are recorded and the algorithm is refined to maximize its efficiency.

Data Scientist Perspective from a Small-Sized Company – Justin Fister

This is a superb answer and one I can relate to.

Note that machine learning, the most anticipated aspect of a data scientist’s job, only occupies 5% of the total time! Just like Vinita, he has also explained his tasks in terms of percentages.

Here is Justin’s view:

NLP-related tasks (15%). It’s no surprise that PaperRater’s automated proofreading technology requires heavy use of parsers, taggers, regular expressions, and other NLP goodies as part of the core algorithms and feedback modules.

Machine learning (5%). This tends to be the most enjoyable part. Data cleaning, feature extraction/engineering/selection, and model building.

Reporting and analytics (10%). Running queries, reviewing analytics, and assisting with strategic decision making.

Data management (5%). Setting up and managing database servers including MySQL, Redis, and MongoDB. Larger projects may require Hadoop or Spark.

General software development (40%). Many data scientists have a background in computer science, so expect to pitch in if you have an applicable background. API integration, web development, and wherever else I can add value. Even at an AI startup, most of the development is not going to involve AI.

Other (25%). This includes a wide variety of tasks, including blog posts, marketing, management, technical documentation, technical support, website copy, emails, meetings, etc.

The “Data Scientist” is a bit of a Myth – Tim Kiely

The author, Tim Kiely, uses a Venn diagram to explain what data science is.

Just take a look at this Venn diagram below – it will blow your mind.

Tim also discusses what data scientists are supposed to be, taking a somewhat contrarian view of the usual definition.

Here is Tim’s answer: The “Data Scientist” is a bit of a myth, in my opinion.

Not to say they aren’t out there but they are far rarer than is popularly understood and are more of the exception than the rule.

I liken it to the “Web Master” title of the dot-com bubble – these supposed people who could do full stack programming, front end development, marketing, everything.

All of those roles/skills were always specialized and remain so today.

“Data Scientists” are supposed to be database architects, understand distributed computing, have a deep understanding of statistics AND some area of business or field expertise.

That’s asking a lot when any one of those skill sets can take a career to build.

The Data Scientists I’ve worked with typically have a Ph.D. in A.I. or machine learning and are effective communicators, which gives them the ability to direct the analysts, DevOps people, programmers and DBAs at their disposal to solve problems with data-driven solutions.

They outline the desired solution and leave it to their teams to fill in the gaps.

Machine Learning Engineer Working on NLP Tasks – Evan Pete Walsh

Let’s drill down into a particular specialization of machine learning, and one of my favorites: Natural Language Processing (NLP)! I wanted to bring out a machine learning engineer’s view here (a role every data scientist should become familiar with).

Check out Evan’s full response:

Currently working on NLP, for the most part, including intent classification and entity extraction.

Here’s a typical day for me:

Get to work, pull up GitHub and check on the ZenHub board (kind of like Jira, except way cooler).

I had some models that were training last night on our servers and I should have gotten an email that they finished. I did! I’ll probably spend a few minutes testing those new models and then tweak some parameters, then restart the training process.

The rest of the day I’m usually head-down coding, either working on a back-end Python application that will supply the AI for one of our products, or implementing a new algorithm that I want to try out.

For example, recently I read a paper on coupled simulated annealing (CSA), and I wanted to try it out on tuning the parameters for XGBoost as an alternative to a grid search. CSA is a generalized form of simulated annealing (SA), which is an algorithm for optimizing a function that doesn’t use any information on the derivative of the function.

Unfortunately, I couldn’t find an implementation in Python, so I decided to write my own.
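For readers who have not met the technique, here is a minimal sketch of plain simulated annealing in Python. It is not the coupled variant (CSA) and it is not Evan's package; the function name `simulated_annealing`, its parameters, and the test objective `bumpy` are all assumptions made up for this illustration. The sketch only shows the core idea from the description: the search evaluates the function itself and never asks for a derivative.

```python
import math
import random


def simulated_annealing(objective, start, step=0.5, t_start=1.0, t_end=1e-3,
                        cooling=0.995, seed=0):
    """Minimize objective(x) for a float x using only function evaluations."""
    rng = random.Random(seed)
    current, current_cost = start, objective(start)
    best, best_cost = current, current_cost
    temperature = t_start
    while temperature > t_end:
        # Propose a random neighbour of the current point.
        candidate = current + rng.uniform(-step, step)
        candidate_cost = objective(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with probability
        # exp(-delta / T), which shrinks as the temperature cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temperature *= cooling  # geometric cooling schedule
    return best, best_cost


def bumpy(x):
    """Hypothetical objective with several local minima."""
    return (x - 2.0) ** 2 + math.sin(5.0 * x)


best_x, best_cost = simulated_annealing(bumpy, start=-4.0)
print(f"best x = {best_x:.3f}, cost = {best_cost:.3f}")
```

In a hyperparameter-tuning setting, `objective` would be something like a validation error as a function of the parameters, which is exactly the kind of function where no derivative is available.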

Two days later, I had submitted my first package to PyPI!

End Notes

The data scientist role is truly multi-faceted, isn’t it? A LOT of aspiring data scientists assume that they will primarily be building models all day long, but that simply isn’t the case.

There are all sorts of tasks involved in a typical data science project which you’ll find yourself working on day-to-day.

I quite like that because it opens up avenues to learn new concepts and apply them in the real world.

I’ll be posting some more career-related articles on Analytics Vidhya, so stay tuned and keep learning!