Top 7 Things I Learned on my Data Science Masters

No Problem.

Scrape one Yourself.

Use the Power of Python and BeautifulSoup to Scrape Data that Matters to You.

Libraries are made for a reason.

Google before you act.

I’ll show you a very trivial example of a ‘mistake’ you probably haven’t made, but it will help you understand this point.

It’s about two ways to calculate the median.

The median is defined as the middle value of a sorted list of numbers.

So to calculate it yourself, you would have to sort the list, check whether it has an odd or even number of elements, and return either the middle element or the average of the two middle elements.

Luckily, there are libraries such as NumPy that do all the heavy lifting for you.

In my version, calculating the median by hand took 17 lines of code, while NumPy achieved the same in two.

As I said, this is only a trivial example of something you probably haven’t done yourself.

But just imagine how many lines of code you have written in vain because you didn’t realize that there’s already a library for that.
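To make the median comparison concrete, here is a minimal sketch. It is not the original listing from the post: the helper name `median_by_hand` and the sample numbers are made up for illustration, and it assumes NumPy is installed.

```python
import numpy as np


def median_by_hand(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)  # the median is defined on sorted data
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd length: there is a single middle element.
        return ordered[middle]
    # Even length: average the two middle elements.
    return (ordered[middle - 1] + ordered[middle]) / 2


numbers = [7, 1, 5, 3, 9, 11]  # made-up sample data

manual_result = median_by_hand(numbers)
numpy_result = float(np.median(numbers))  # the library does the same in one call
```

Both variables hold the same value. The difference is that the NumPy call is already written, tested, and documented by someone else.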

Although they are not specific to data science, I use list comprehensions all the time for things like feature engineering, and lambda functions for data cleaning and preparation.

Down below is a simple example of feature engineering.

Given a list of strings, you need to create a variable that equals 1 if the given string contains a question mark (?) and 0 otherwise.
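The question-mark feature could be built roughly like this. It is only a sketch: the sample strings and variable names are invented for illustration.

```python
# Made-up sample strings; any list of strings works the same way.
questions = ["How are you?", "Nice weather today", "Is this a feature?"]

# Without a list comprehension: an explicit loop with append.
has_question_loop = []
for text in questions:
    if "?" in text:
        has_question_loop.append(1)
    else:
        has_question_loop.append(0)

# With a list comprehension: the same logic in a single line.
has_question = [1 if "?" in text else 0 for text in questions]
```

Both lists end up identical, so the choice comes down to how much code you want to write and read.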

You can achieve this with or without list comprehensions (hint: they are a massive time saver).

And now for the lambdas. Let’s say you have a list of phone numbers in a format you don’t like.

Basically, you want to replace ‘/’ with ‘-’.

This is an almost trivial process, provided that your dataset is in a pandas DataFrame.

Take a moment to think about how you could apply those to your dataset.
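As an illustration of the phone number cleanup, here is a minimal sketch assuming pandas is installed. The column name `Phone` and the numbers themselves are hypothetical.

```python
import pandas as pd

# Made-up phone numbers in the unwanted format; the column name is hypothetical.
df = pd.DataFrame({"Phone": ["091/123-4567", "092/765-4321", "095/111-2222"]})

# apply() calls the lambda once per value and returns a new Series,
# which is assigned back to the same column.
df["Phone"] = df["Phone"].apply(lambda number: number.replace("/", "-"))
```

For a plain substitution like this one, `df["Phone"].str.replace("/", "-", regex=False)` should give the same result. The lambda version is the more general pattern, since the function body can hold any cleaning logic.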

Cool, right?

If you haven’t been living under a rock, you know the importance of statistics in data science.

It’s a fundamental skill you must develop.

Let me quote Edureka:

“Statistics is used to process complex problems in the real world so that Data Scientists and Analysts can look for meaningful trends and changes in Data. In simple words, Statistics can be used to derive meaningful insights from data by performing mathematical computations on it.” [1]

From what I’ve learned about statistics in my master’s so far, you need to know it to be able to ask the right questions.

If your stats skills are rusty, I would strongly suggest you check out the StatQuest channel on YouTube, more precisely its playlist on the basics of statistics.

There’s no point in being able to ask the right question (see point 5) if you can’t deliver the solution, right?

I’ve been guilty of neglecting algorithms and data structures because I thought that only software engineers should worry about those.

I was terribly wrong, to say the least.

Now, I’m not saying that you must be able to code a binary search in your sleep, but a basic understanding will give you a clearer picture of how to think in code, and therefore how to write code that not only gets the job done but gets it done as fast as possible.
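For readers who have never seen it, a binary search can be sketched in a few lines. This is a generic textbook version, not code from the course mentioned in this post, and the `ages` list is made up.

```python
def binary_search(sorted_values, target):
    """Return the index of target in sorted_values, or -1 if it is absent."""
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        middle = (low + high) // 2
        if sorted_values[middle] == target:
            return middle
        if sorted_values[middle] < target:
            low = middle + 1   # target can only be in the right half
        else:
            high = middle - 1  # target can only be in the left half
    return -1


ages = [18, 21, 22, 25, 30, 34, 41]  # made-up data, already sorted

found_at = binary_search(ages, 30)  # index of an existing value
missing = binary_search(ages, 19)   # -1, because 19 is not in the list
```

Each step halves the remaining search range, which is why this beats scanning the list element by element on large sorted data. In day-to-day work, Python's standard `bisect` module covers the same ground.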

For a person without a computer science background, I would strongly recommend this course: Learn Python for Data Structures, Algorithms & Interviews.

Also, make sure to check out the interview questions. They help, A LOT!

Always be that person who works the hardest.

It pays off.

At least in my case, my group was evaluated based on its initial performance in one of the classes.

It wasn’t about who knew the most, because that would be a stupid thing to measure in the first semester. It was about who showed work ethic and discipline.

As I wasn’t working full time then, I worked my ass off for this project.

Because I did and others didn’t, I was appointed to a full-scale data science project, which will last for two years and will serve as my master’s thesis.

And yeah, I’m able to put that on my CV.

So, was sacrificing a couple of weeks of my personal life worth it?

Judge for yourself, but I would say that it was.

References

Bio: Dario Radečić is a 22-year-old data science student who has also been working in the field for a while.

He is a writer at Medium and Towards Data Science.

Original.

Reposted with permission.
