Exclusive Interview with Sonny Laskar – Kaggle Master and Analytics Vidhya Hackathon Expert

Sonny is an MBA graduate from IIM Indore, the place he credits with starting his data science journey.

So for any of you wondering if it’s possible to make a career transition to data science from a non-data science field – this article is for you.

I found Sonny to be a very approachable person, and his answers, as you’ll soon see, are insightful, knowledgeable and rich in experience.

Despite holding a senior role in the industry, Sonny loves taking part in data science competitions and hackathons and regularly scales the top echelons of competition leaderboards.

Sonny also has extensive experience on the data engineering side of this field.

As you can imagine, there is a LOT we can learn from him.

I had the opportunity to pick his brain about various data science topics and bring this article to you.

We covered a variety of data science topics during our conversation:

- Sonny’s background and his first role in data science
- The difference between data science competitions and industry projects
- Sonny’s framework and approach to data science competitions
- His advice to aspiring data scientists
- And a whole lot more!

There is SO much to learn from Sonny’s knowledge and thought process.

Enjoy the discussion!

Sonny Laskar’s Background and First Role in Data Science

Pranav Dar: You are currently the Associate Director of Automation and Analytics at Microland, have finished in the top 3 of AV’s hackathons four times, and hold a runner-up finish in a Kaggle competition. It’s been quite a ride! How and where did your data science journey begin?

Sonny Laskar: My Data Science journey started when I was pursuing my MBA at IIM Indore.

Analytics was the go-to area for every aspirant.

One of the early topics of discussion was how Target figured out that a teenage girl was pregnant before her father did.

This made me very curious and I started to deep dive into the world of Data Science.

I had already worked extensively with data but mostly around engineering problems and business intelligence.

Back then, serious machine learning work was not yet popular with organizations in India.

“I spent two months at the University of Texas, Austin in early 2014 and was surprised by the level of maturity they had with data.

My visit to Dell’s headquarters in Austin, where I saw how they used social media data to enhance their product positioning, was amazing.

By the end of this, I was completely convinced that I needed to work on this.”

PD: Your professional career didn’t start off in data science.

The first 6 years or so were spent on data warehousing and infrastructure.

So what kind of challenges did you face when you were getting into data science? How did you overcome them?

SL: I started my career in 2007 in the world of IT infrastructure.

In the initial six years, I was primarily working on building massive-scale data warehousing applications (processing ~10 TB of data every …).

The focus was more on ETL and BI.

Dashboards and Data marts were the primary output of all these efforts.

This was what we called “Descriptive Analytics”.

By 2014-15, “Predictive Analytics” was already getting a lot of attention and adoption in the US.

It was then that many organizations in India started looking at “Predictive Analytics” with significant focus.

We were already processing Terabytes of data and were very well versed with the engineering side of things.

I was able to understand the fundamentals of Data Science very well, since I had a strong grounding in mathematics and statistics and fair exposure to programming.

I started with R, since that was the programming language popular in academia, and improved my understanding by practicing writing code and replicating others’ work.

During my MBA, I got a bird’s eye view of many statistical and Data Science approaches.

Since the focus during the MBA was more on business, it didn’t allow me to master the technical skills as deeply as the industry needs.

After my MBA, I started spending roughly 4-5 hours every day writing code and building on top of it.

I had already written plenty of code in the past in Bash, JavaScript, PHP and Perl.

So, the learning curve was not very steep for me.

I also invested in getting access to cloud subscriptions so that I could play with large volumes of data.

I think it’s worth investing that money when you believe it is going to be helpful in the long term.

Patience, Perseverance & Practice have been my rule of thumb for everything in life, and that is what I applied here as well.

Industry Experience versus Data Science Competitions

PD: We often hear from hiring managers that aspiring data scientists who participate in hackathons and competitions struggle to bridge the gap when they transition into an industry role.

You have been on both sides of this – you hold rich experience in data science and have excelled in hackathons.

What has been your experience of the industry vs. hackathon debate?

SL: Data Science is getting a lot of attention from the workforce.

It is in fact very easy to get some training to understand the basic concepts (thanks to MOOCs).

This leads to an excess supply of candidates, and recruiters then need some way to filter them.

One of the approaches that works best is establishing credibility by participating in data science competitions.

Just like most things in life, competitions have their pros & cons.

There is a lot of preparatory work that gets done before a competition is published.

That work is at times extremely complex, time-consuming and needs multi-domain understanding.

Similarly, the competition ends with a leaderboard score, without any view of what happens to the winners’ solutions.

These are grey areas for many newcomers to Data Science, and this creates a lot of issues when they join the industry.

I have conducted at least 100 in-person interviews in the last year and I can see this struggle very prominently.

Data Scientists are expected to do more than just design a machine learning model to predict something.

In many organizations, discussions in meeting rooms end up with a task for the Data Scientist such as “Let us build a model to predict X”.

A good Data Scientist might end up concluding that many such X use cases should not be solved with machine learning at all! A Data Science team is not expected to be very large in the real world.

They might get involved in many tasks which are either not valuable or can be easily solved without using Machine Learning.

If they feel it can be solved with Machine Learning, then there must be a series of discussions to understand what data would help them address that.

“Unlike competitions, nobody gives you two .csv files called train and test and a nicely written evaluation metric.

Almost 80% of the effort goes into defining the problem and getting and processing data.

The remaining 20% goes into pure modeling and deployment.”

Exposure to competitions helps address a few parts of this:

- Processing data and feature engineering
- Building different types of models and getting the best score

These are very significant activities, and hence recruiters use “competitions” as a good filter to focus on a smaller set of candidates.

To summarize, below are the key issues that competition-focused people face when they join the industry:

- Building business acumen: understanding how a problem statement serves the business goals and what data drives that
- Having a problem-solver attitude
- Understanding the software engineering side of production deployment
- Story-telling: the ability to communicate results to non-technical folks

Data Science Hackathons and Competitions

PD: Ever since data science started becoming mainstream in the last 5 years, multiple competitions have been running across platforms simultaneously.

How do you pick and choose which data science hackathon or competition you’ll participate in?

SL: I got hooked on data science competitions back in 2016.

I used to participate in as many competitions as I could! Lately, my personal interest has plateaued somewhat, as the incremental learning has diminished.

Now I participate only if I have time and a very interesting problem.

I also try to participate in offline hackathons along with my Kaggle Grandmaster friend Sudalai Rajkumar (SRK).

I usually participate based on three factors:

- The novelty of the problem: If the problem statement is new to me, whether from an existing domain or a new one where I might not have enough experience, I like to play with the data, as it helps me build some perspective on that problem/domain.
- Data size: I love problems where the data size is extremely large. I like the kick I get when I run models on machines with 500 GB of RAM and 64-core processors. It is a lot of fun!
- Multiple approaches: whether there are multiple techniques I can experiment with. In fact, our first Kaggle competition required us to perform both text analytics and image analytics, and to find a clear way to merge the two.

PD: How should a beginner go about participating in these data science hackathons?
