Use Kaggle to start (and guide) your ML/ Data Science journey — Why and How

By Nityesh Agarwal, Aug 22

I often get asked by my friends and college-mates, “How do I start Machine Learning or Data Science?”

So, here goes my answer..

Earlier, I wasn’t so sure..

But now, as I go deeper and deeper into the field, I am beginning to realise the drawbacks of the approach that I took.

So, in hindsight, I believe that the best way to “get into” ML or Data Science might be through Kaggle.

In this article, I will tell you why I think so and how you can do that if you are convinced by my reasoning.

(Caution: I am a student. I am not a Data Scientist or an ML engineer by profession. I am definitely not an expert at Kaggle. So, take my advice and opinions with a healthy grain of salt. 🙂)

But first, let me introduce Kaggle and clear up some misconceptions about it.

You might have heard of Kaggle as a website that awards mind-boggling cash prizes for ML competitions.

[Image: Competitions hosted on Kaggle with the maximum prize money (yes, those are MILLION DOLLAR+ prizes!)]

It is this very fame that also causes a lot of misconceptions about the platform and makes newcomers feel far more hesitant to start than they should be.

(Oh, and don’t worry if you have never heard of Kaggle before and therefore don’t share any of the misconceptions mentioned below. This article will still make complete sense.
Just treat the next section as me introducing Kaggle to you.)

The misconceptions:

“Kaggle is a website that hosts Machine Learning competitions”

This is such an incomplete description of what Kaggle is!

.. no more passive reading through hours of learning material!

All of these together have made Kaggle much more than simply a website that hosts competitions..

“I should do a few more courses and learn advanced Machine Learning concepts before participating in Kaggle competitions, so that I have a better chance of winning”

The most important part of machine learning is Exploratory Data Analysis (or EDA) and feature engineering, not model fitting..

EDA is probably what differentiates a winning solution from others in such cases.

Now, let’s move on to why you should use Kaggle to get started with ML or Data Science..

Why should you get started with Kaggle?

Reason #1: Learn exactly what is essential to get started

The Machine Learning course on Kaggle Learn won’t teach you the theory and the mathematics behind ML algorithms..

It’s called “How (and why) to start building useful, real-world software with no experience”. So, check that out if you haven’t 🙂 )

I believe that this is also true for courses and tutorials.

But this idea totally fails when you don’t have a project to leap towards. And doing an interesting project is difficult because..

a) It is difficult to find an interesting idea

And finding ideas for Data Science projects seems to be even more difficult because of the added requirement of having suitable datasets.

b) ..I don’t know what to do about those gaping holes in my knowledge

Sometimes, when I have started some project, it feels like there are just so many things that I still don’t know. I feel like I don’t even know the prerequisites for learning the prerequisites to build this thing. Am I just out of my depth? How do I go about learning what I don’t know?
And that’s when all the motivation starts to wane.

c) ..I am just “stuck” more often than not

It seems like I keep hitting one roadblock after another during the building process..

After the competitions, it is common for the winners to share their winning solutions” (as written in the article, “Learning From the Best”)

Reason #3: Real data to solve a Real problem => Real motivation

The challenges on Kaggle are hosted by real companies looking to solve a real problem that they encounter..

This means that you get to learn Data Science/ML and practice your skills by solving real-world problems.

If you have tried competitive programming before, you might relate to me when I say that the problems hosted on such websites feel too unrealistic at times..

And that’s what you can get from participating in a Kaggle challenge.

The Other Side of the debate: “Machine Learning isn’t Kaggle competitions”

I would be remiss not to mention the other side of this debate, which argues that Machine Learning isn’t Kaggle competitions and that Kaggle competitions only represent a “touristy sh*t” of actual Data Science work.

Well, maybe that is true..

Maybe real data science work doesn’t resemble the approach one takes in Kaggle competitions..

I haven’t worked in a professional capacity, so I don’t know enough to comment.

But what I have done, plenty of times, is use tutorials and courses to learn something..

I would learn something just because it was there in the tutorial or course and hope that it would come in handy in some distant, mystical future.

On the other hand, when I’m doing a Kaggle challenge, I have an actual need to learn..

Cover the essential basics

Choose a language: Python or R.

Once you have done that, head over to Kaggle Learn to quickly understand the basics of that language, machine learning and data visualisation techniques.

[Image: Courses on Kaggle Learn]

Step 2..
