How to Build a Data Science Portfolio

Focus your resume on independent projects, like capstone projects, independent research, thesis work, or Kaggle competitions.

These can substitute for work experience if you don’t have any to put on your resume.

Avoid putting irrelevant experience on your resume.

If you want to hear data science managers go over portfolios and resumes, here are links to Kaggle’s CareerCon 2018 (video, resumes reviewed).

Importance of Social Media

This is very similar to the Importance of a Portfolio section, just divided into subsections.

Having a GitHub page, a Kaggle profile, a Stack Overflow account, etc. can provide support for your resume.

Having online profiles filled out can be a good signal for hiring managers.

As David Robinson phrases it:

Generally, when I’m evaluating a candidate, I’m excited to see what they’ve shared publicly, even if it’s not polished or finished.

And sharing anything is almost always better than sharing nothing.

The reason data scientists like seeing public work is, as Will Stanton said:

Data scientists use these tools to share their own work and find answers to questions.

If you use these tools, then you are signaling to data scientists that you are one of them, even if you haven’t ever worked as a data scientist.

A lot of data science is about communication and presenting data, so it is good to have these online profiles.

Besides the fact that these platforms help provide valuable experience, they can also help you get noticed and lead people to your resume.

People can and do find your resume online through various sources (LinkedIn, GitHub, Twitter, Kaggle, Medium, Stack Overflow, Tableau Public, Quora, YouTube, etc.).

You will even find that different types of social media feed into each other.

GitHub

GitHub profiles of Jennifer Bryan and Yuan (Terry) Tang

A GitHub profile is a powerful signal that you are a competent data scientist.

In the projects section of a resume, people often leave links to their GitHub where the code is stored for their projects.

You can also have writeups and markdown there.

GitHub lets people see what you have built and how you have built it.

At some companies, hiring managers look at an applicant’s GitHub.

It is another way to show employers you aren’t a false positive.

If you take the time to develop your GitHub profile, you can be better evaluated than others.

It is worth mentioning that you need to have some sort of README.md with a description of your project, as a lot of data science is about communicating results.

Make sure the README.md file clearly describes what your project is, what it does, and how to run your code.
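As a rough illustration of that README advice, here is a minimal Python sketch that checks whether a README has a heading for each of the three things mentioned above. The section names and the `missing_sections` helper are hypothetical choices made for this example, not a standard; adapt them to whatever headings your own projects use.

```python
import re

# Hypothetical section names (an assumption for this sketch). The advice above
# only says the README should cover what the project is, what it does, and
# how to run it.
REQUIRED_SECTIONS = ("overview", "what it does", "how to run")


def missing_sections(readme_text, required=REQUIRED_SECTIONS):
    """Return the required section names that have no matching Markdown heading."""
    # Collect the text of every Markdown heading (lines starting with 1-6 '#').
    headings = [
        match.group(1).strip().lower()
        for match in re.finditer(r"^#{1,6}[ \t]+(.+?)[ \t]*$", readme_text, flags=re.MULTILINE)
    ]
    # A section counts as present if its name appears inside any heading.
    return [name for name in required if not any(name in h for h in headings)]


sample = """# Taxi Fare Model

## Overview
Predicts fares.

## How to run
pip install -r requirements.txt
"""

print(missing_sections(sample))
```

A check like this only confirms that the headings exist; whether the description is actually clear still needs a human reader.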

Kaggle

Participating in Kaggle competitions, creating a kernel, and contributing to discussions are ways to show some competency as a data scientist.

It is important to emphasize that Kaggle is not like an industry project, as Colleen Farrelly mentions in this Quora question.

Kaggle competitions take care of coming up with a task, acquiring data for you, and cleaning it into some usable form.

What it does is give you practice analyzing data and coming up with a model.

Reshama Shaikh has a post, To Kaggle Or Not, where she talks about the value of Kaggle competitions.

From her post:

It is true, doing one Kaggle competition does not qualify someone to be a data scientist.

Neither does taking one class or attending one conference tutorial or analyzing one dataset or reading one book in data science.

Working on competition(s) adds to your experience and augments your portfolio.

It is a complement to your other projects, not the sole litmus test of one’s data science skillset.

It is also true that there is a good reason why Kaggle Grandmasters continue to participate in Kaggle competitions.

LinkedIn

Unlike a resume, which is confined by length, a LinkedIn profile allows you to describe your projects and work experience in more depth.

Udacity has a guide on making a good LinkedIn profile.

An important part of LinkedIn is its search tool, and to show up in it, you must have relevant keywords in your profile.

Recruiters often search for people on LinkedIn.

LinkedIn allows you to see which companies have searched for you and who has viewed your profile.

You can check where your searchers work and how many times people have viewed your profile.

Besides companies finding you and sending you messages about your availability, LinkedIn also has many features like Ask for a Referral.

Jason Goodman, in his article Advice on Applying to Data Science Jobs, uses LinkedIn to indirectly ask for referrals:

I never, never, never applied to any companies without an introduction to someone who worked at the company…once I was interested in a company, I would use LinkedIn to find a first- or second- degree connection at the company.

I would write to that connection, asking to talk to them about their experience at the company and, if possible, whether they’d be able to connect me to someone on the Data Science team.

Whenever I could, I did in-person meetings (coffee or lunch) instead of phone calls.

As an aside, Trey Causey recently wrote a great post on how to ask for just these kinds of meetings.

I would never ask for a job directly, but they would usually ask for my resume and offer to submit me as an internal referral, or put me in touch with a hiring manager.

If they didn’t seem comfortable doing so, I’d just thank them for their time and move on.

Notice that he doesn’t ask for a referral right away.

While common job advice when applying to a company is to get a referral, it is VERY IMPORTANT to note that you still need a portfolio, experience, or some sort of proof you can do a job.

Jason even mentions the importance of a portfolio in that and other articles he has written.

Aman Dalmia learned something similar by Interviewing at Multiple AI Companies and Startups.

Networking is NOT messaging people to place a referral for you.

When I was starting off, I did this mistake way too often until I stumbled upon this excellent article by Mark Meloon, where he talks about the importance of building a real connection with people by offering our help first.

One other point he had is that LinkedIn is great for getting your content/portfolio out.

Another important step in networking is to get your content out.

For example, if you’re good at something, blog about it and share that blog on Facebook and LinkedIn.

Not only does this help others, it helps you as well.

Medium and/or Other Blogging Platforms

Having some form of blog can be highly beneficial.

A lot of data science is about communication and presenting data.

Blogging is a way of practicing this and showing you can do this.

Writing about a project or a data science topic allows you to share with the community as well as encourages you to write out your work process and thoughts.

This is a useful skill when interviewing.

As David Robinson said:

A blog is your chance to practice the relevant skills.

Data cleaning: One of the benefits of working with a variety of datasets is that you learn to take data “as it comes”, whether it’s in the form of a supplementary file from a journal article or a movie script.

Statistics: Working with unfamiliar data lets you put statistical methods into practice, and writing posts that communicate and teach concepts helps build your own understanding.

Machine learning: There’s a big difference between having used a predictive algorithm once and having used it on a variety of problems, while understanding why you’d choose one over another.

Visualization: Having an audience for your graphs encourages you to start polishing them and building your personal style.

Communication: You gain experience writing and get practice structuring a data-driven argument. This is probably the most relevant skill that blogging develops since it’s hard to practice elsewhere, and it’s an essential part of any data science career.

By writing a blog, you can practice communicating findings to others.

It also is another form of advertising yourself.

Blogs about Using Scrapy to Build your Own Dataset and, ironically, Python Environment Management with Conda have taught me a lot and have gotten me many opportunities I would not normally have gotten.

One of the major benefits I have found is that people critique my projects and suggest improvements (through the comments section of the blog), so interviewers aren’t the first ones pointing out these same flaws.

The more obvious benefit is that by making a blog you tend to read a lot more data science/machine learning blog posts and hence learn more.

As for what platform to blog on, I recommend using Medium.

Manali Shinde, in her blog post How to Construct a Data Science Portfolio from Scratch, had a really good point on why she chose Medium for her blog:

I thought of creating my own website on a platform such as WordPress or Squarespace.

While those platforms are amazing to host your own portfolio, I wanted a place where I would get some visibility, and a pretty good tagging system to reach greater audiences.

Luckily Medium, as we know, has those options (and it’s also free).

If you don’t know what to write about, I suggest you look at David Robinson’s advice.


Twitter

Being active on Twitter is a great way to identify and interact with people in your field.

You can also promote your blog on Twitter so that your portfolio can be that much more visible.

There are so many opportunities to interact with people on Twitter.

One of them, as Reshama Shaikh said in her famous blog post “How Do I Get My First Data Science Job?”, was:

David Robinson generously offers to retweet your first data science post.

With 20K+ followers, that’s an offer that can’t be refused.

Twitter can be used for more than self-promotion.

Data Science Renee has a post, “How to use Twitter to Learn Data Science (or Anything)”, that is quite insightful about using Twitter to learn skills.

One other takeaway from her article was how much her Twitter presence helped her network and get opportunities.

I have been asked to be interviewed on podcasts and blogs (some of those should be coming up soon), offered contract work, and offered free admission to a conference I unfortunately couldn’t go to, but was excited to be considered for.

“Famous” people in the industry are now coming to me to work with them in some way.

Tableau Public

Not every data science job uses Tableau or other BI tools.

However, if you are applying to jobs where these tools are used, it is important to note that there are websites where you can put dashboards for public consumption.

For example, if you say you are learning or know Tableau, put a couple of dashboards on Tableau Public.

While a lot of companies might be okay with you learning Tableau on the job, having public evidence of your Tableau skill can help.

If you want to see good examples of Tableau Public profiles, please see Orysya Stus’ and Brit Cava’s profiles.

Conclusion

Remember, a portfolio is a process.

Keep on improving.

A strong resume has long been the primary tool for job seekers to relay their skills to potential employers.

These days, there is more than one way to show off your skills and get a job.

A portfolio of public evidence is a way to get opportunities that you normally wouldn’t get.

It is important to emphasize that a portfolio is an iterative process.

As your knowledge grows, your portfolio should be updated over time.

Never stop learning or growing.

Even this blog post will be updated with feedback and with increasing knowledge.

If you want interview advice or guides, check out Brandon Rohrer’s advice on how to survive a data science interview, Sadat’s interview guide, or Springboard’s advice.

If you have any questions or thoughts on the tutorial, feel free to reach out in the comments below or through Twitter.

