MBA to IBM Data Scientist: Exclusive Interview with Greg Rafferty

Was it a rough transition?

It was actually very smooth.

I started as a mechanical engineer and had been moving toward project roles managing people. My idea at the time was that I would continue with this company and lead some joint ventures abroad.

The company had its own MBA arrangement, so I got my MBA through the company.

They actually supported me on it, so I continued to work while I was getting my MBA, which was rewarding but really intense.

I had a full-time job while being in school full time.

It was a two-year program, and I was doing great and had a great time.

Right when I finished, the economy was crashing and the opportunities to be a manager in this company were drying up, at least for the next couple of years, so that’s why I chose that time to try something new.

When did data and analytics become part of your professional life?

As an engineer I used a lot of data and analytics.

I would build simple regression models but nothing beyond Excel.

I started coding while I was in Moscow working with a startup.

And that’s when I built my deep models: not deep learning, but models that could be handled in Excel.

I really enjoyed it a lot.

So when I got back to the U.S., my job was very Tableau-heavy, and that was my first introduction to SQL.

At that point, I started looking into more machine learning models, and I was really enjoying it.

I then went to the Galvanize bootcamp to push myself over the edge and really learn data science.

What is Galvanize?

Galvanize is a bootcamp.

It is a 3-month immersive program, full time, about 8–12 hours a day, 5–6 days a week.

It’s only 3 months, so you can get it out of the way quickly, push yourself over the edge, and get into data science.

It really helps if you already have a foundational base, because it’s not nearly as intense as a Master’s program, for instance.

So you do need to know what you’re getting into and be able to fill in the gaps on your own.

How did you find yourself at IBM?

I had been working a lot with the Coursera platform to study data science on my own and doing a couple of projects at Galvanize, and I did another really big project looking at Trump’s Twitter stream.

That sort of gave me a bit of a reputation with NLP (Natural Language Processing).

IBM was looking for NLP Data Scientists.

Through some connections, I was introduced to the hiring manager, and it seemed like a really great fit right off the bat.

I moved to IBM directly from Galvanize, and I’m now about eighteen months into my time at IBM.

I’ve been doing a lot of A.I. work, a lot of NLP work, and some basic client-based analytics work.

I’m on a consulting team, so I do travel a lot and I work with clients exclusively.

A lot of the work I do is with a client on one of their use cases.

What kind of cool NLP projects do you get to do at IBM?

One of the coolest projects I did we called “unsupervised annotation,” and I actually applied for a patent on that.

I’m hoping that comes through.

What that does is it takes a corpus of tens of thousands of documents and it identifies what those documents are about.

It clusters them and applies annotations to those clusters so you can build a knowledge graph around them.

IBM has a tool called Knowledge Studio, which is a manual annotation process, and it takes roughly three weeks of manual annotation to build a model.

That’s very labor intensive, it’s not interesting work, and it takes a domain expert to do it.

So you have to have, say, a lawyer annotate these documents for three weeks, which is really not a good use of time.

So in this tool I use Word2Vec, then clustering, and then some feature extraction tools from the IBM Watson API.

Through this pipeline, it creates these annotations.

It takes these documents and it creates this knowledge graph in the Knowledge Studio.

That was the biggest project I worked on.
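
The embed, cluster, and annotate steps Greg describes can be sketched roughly as follows. This is only an illustration under loud assumptions, not his patented pipeline: a hand-made toy table (TOY_VECTORS) stands in for trained Word2Vec vectors, plain k-means stands in for whatever clustering he used, and the "annotation" is just the most frequent known words per cluster rather than anything from the Watson API. All function names here are hypothetical.

```python
from collections import Counter
import math

# Hypothetical 2-D "embeddings" standing in for trained Word2Vec vectors.
TOY_VECTORS = {
    "contract": (1.0, 0.1), "lawsuit": (0.9, 0.0), "court": (0.95, 0.2),
    "tumor": (0.1, 1.0), "patient": (0.0, 0.9), "dosage": (0.2, 0.95),
}

def embed(doc):
    """Average the vectors of the known words in a document."""
    vecs = [TOY_VECTORS[w] for w in doc.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(axis) / len(vecs) for axis in zip(*vecs))

def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = list(points[:k])
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return assignments

def annotate(docs, assignments, k, top_n=2):
    """Label each cluster with its most frequent known words."""
    labels = {}
    for c in range(k):
        words = Counter(
            w
            for d, a in zip(docs, assignments) if a == c
            for w in d.lower().split() if w in TOY_VECTORS
        )
        labels[c] = [w for w, _ in words.most_common(top_n)]
    return labels

docs = [
    "the court reviewed the contract",
    "the patient received a new dosage",
    "a lawsuit over the contract went to court",
    "the tumor shrank after the dosage changed for the patient",
]
assignments = kmeans([embed(d) for d in docs], k=2)
print(assignments)
print(annotate(docs, assignments, k=2))
```

With these four toy documents, the two legal documents and the two medical documents end up in separate clusters, and each cluster's label words could then seed nodes in a knowledge graph.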

Related reading: “Introduction to Word Embedding and Word2Vec” (towardsdatascience.com)

Is this internal software you are building for IBM, not consultancy work for other companies?

Correct.

This project is an internal one.

We’re fishing around for clients that may want to use it, and if we can find one, then we will of course implement it into one of our broader Watson products and then make that available to anyone.

But for the time being it’s still in the proof of concept stage.

Is that a normal workflow at IBM (building an internal product, then attracting external clients)?

That’s actually a very rare workflow.

Only a few teams take things to market using that method.

In most cases, the client comes up with the use case, and IBM determines a solution and then builds it directly for that client.

Whether that use case can be broadened to other clients depends on the contract, and every contract is different.

Sometimes the client owns the IP, sometimes IBM does, sometimes there’s some sharing.

But if IBM maintains control of the IP, then we’ll build it for one client and then sell it to other clients if it’s applicable to their use cases.

What is the timeline for these projects?

Some projects I’ve been on are just one week, and some can be several years.

The longest project I’ve done has been about six months.

The project I’m on right now is one where we have a two-year contract.

I don’t know if I’ll be on the project for that full time, because there are a lot of different work streams and, depending on their skill sets, different consultants hop on and off to fill in the gaps.

But I know some consultants have been on the same project for 12 years.

There are all sorts of different types of projects and different arrangements.

It just depends on your skill set.

How is the team formed?

On my team, the Applied AI team, we do get to work together a lot, but sometimes we will work completely independently on different projects.

On the project I’m working on right now, I’m actually leading the team.

It was up to me to hire out.

I have three offshore Data Scientists in India and then two onshore that are based locally.

For the offshore ones, I wanted to hire people I knew.

I already knew their reputation and their skillset, so I took two people for my team.

What are important traits / skills for executing the task at hand?

As a Data Scientist, there’s a broad range of skills and the specific data science skills really depend on the project.

That can be anything from NLP to deep learning to just basic analytics.

But one thing that everyone at IBM has in common is that they’re consultants.

They need to be able to work with the client.

They need to be in meetings with executives and be able to talk intelligently about solutions.

And they need to sell products.

That’s not pushing solutions onto the client.

That’s understanding the client’s needs and understanding how we can help the client better.

So when I say sell, that’s not us coming to them saying you need to buy this.

But we’re deeply understanding what they need and how we can improve them.

It’s not really aggressively selling.

It’s more passively selling, in that we show them the value and they say they want that.

That’s a skill that is very valuable to a consultant.

So both soft and hard skills are important.

It’s interesting you brought that up.

Last night I was having a conversation with one of our partners about this.

He was saying that if you are in the top one percent in the technical skillset, you’ll be a rockstar at IBM.

If you’re in the top one percent in the soft skills, the client-focused skills, you will also be a rockstar.

But if you’re below the top one percent then you really need to have both of those skillsets.

So most people do need to have a client-based and a technical skillset.

Switching gears, can you tell us more about the TwitterBot you wrote about on TDS?

The idea for that came when Trump had just fired James Comey.

He had fired him, and he tweeted out that it was a pity that he had to fire Mike Flynn because Flynn had lied to the FBI.

Everyone had sort of come out and said, well, that would be obstruction of justice if you knew he lied to the FBI and you asked Comey not to investigate.

Trump’s rebuttal to that was, well, I didn’t write the tweet; my lawyer did, and he sent it.

So what I said I was going to do was analyze his Twitter stream and try to determine who was writing these tweets: was it Trump or was it one of his aides?

The way I could do that was that prior to his presidency, Trump always tweeted from an Android device while his staff had always tweeted from iPhones.

I had data of the tweets, and you can see the source.

I used that source as a label and I built a model to determine who was tweeting.

After that I built a TwitterBot that listened to Trump’s Twitter stream, and whenever he tweeted, it would capture that tweet and send it to the model.

It would then send another tweet saying Trump just tweeted this and I have ninety percent confidence that it was Trump who actually wrote it.

After that, I wrote a long post on TowardsDataScience about the features I had built, about the results, and how I built the TwitterBot and took it live.
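
The labeling idea in this answer, using the device source as the label, training a model, then having the bot report a confidence, might be sketched like this. It is a minimal illustration only: the four tweets are made up, the Android-means-Trump mapping is taken from the interview, and a small multinomial Naive Bayes over word counts is swapped in for the hand-built features and model Greg describes in his post. The function names are hypothetical, and nothing here talks to the Twitter API.

```python
from collections import Counter, defaultdict
import math

# Made-up tweets paired with a device source, mimicking the "source" field.
TWEETS = [
    ("Crooked media is a total disaster! Sad!", "Twitter for Android"),
    ("The failing press is a disaster, believe me!", "Twitter for Android"),
    ("Join us tomorrow in Ohio. Tickets: #MAGA", "Twitter for iPhone"),
    ("Thank you Florida! Join us tomorrow #MAGA", "Twitter for iPhone"),
]

def label_from_source(source):
    """Turn the device source into a training label (mapping per the interview)."""
    return "trump" if "Android" in source else "staff"

def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    return [w.strip("!.,:#").lower() for w in text.split() if w.strip("!.,:#")]

def train(tweets):
    """Count words per label for a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, source in tweets:
        label = label_from_source(source)
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def predict_proba(model, text):
    """Return P(label | text) for each label, using add-one smoothing."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label in class_counts:
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(class_counts[label] / total)
        for w in tokenize(text):
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][w] + 1) / denom)
        log_scores[label] = score
    # Normalize in a numerically stable way.
    peak = max(log_scores.values())
    exp = {label: math.exp(s - peak) for label, s in log_scores.items()}
    norm = sum(exp.values())
    return {label: v / norm for label, v in exp.items()}

def bot_reply(model, text):
    """Format the kind of message the bot might tweet back."""
    probs = predict_proba(model, text)
    label = max(probs, key=probs.get)
    who = "Trump" if label == "trump" else "an aide"
    return f'Trump just tweeted: "{text}" I am {probs[label]:.0%} confident {who} wrote it.'

model = train(TWEETS)
print(bot_reply(model, "The media is a disaster!"))
```

In a live bot, the text passed to bot_reply would come from a stream listener and the reply would be posted back to Twitter; both steps are left out of this sketch.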

For more on the TwitterBot, check out Greg’s original post: “Who’s Tweeting from the Oval Office? How I built a machine learning model to predict if Trump or if one of his aides has tweeted through his account” (towardsdatascience.com).

What are your words of wisdom or advice for the TDS community, especially for those who are transitioning from fields like business analytics, management, etc.?

I would say that the most important thing, especially if you are looking for your first job, is to get a GitHub page and populate it with really interesting projects.

Projects that may not be directly relevant to a company, but that you are so excited about you can’t wait to tell people about them.

That enthusiasm really comes through in interviews.

That’s one of the key things I look for when I’m interviewing people.

Networking and blogging about your projects are also really helpful because they really show that enthusiasm.

Make sure that really comes through in everything you do.

Thanks again to Greg Rafferty for the interview.

You can find his TDS posts here: Greg Rafferty – Towards Data Science (towardsdatascience.com).

Haebichan Jung – Medium (medium.com): Data Scientist @ Recurly | Editorial Associate @ TDS.

Additional thanks to Ludovic Benistant and YK Sugi for their project review, guidance, and support.
