Why is it important to have a data science portfolio, and what purpose does it serve?

During a recent webinar, data scientist, instructor, and Springboard mentor David Yakobovitch tackled these questions and shared actionable tips that will help anyone looking for a data science job show off their skills the right way.

Portfolios are “extremely critical to have,” David said, “because when you’re in the interview it shows your real-world experience, so you can explain to an employer from A to Z the entire data science workflow.”

Here are highlights from David’s Q&A.

What should be included in a data science portfolio?

If a recruiter or someone who wants to hire you is going to look at your link, it’s important that what you share be able to stand on its own, without you there to explain it.

In a competitive job market, it’s important that if you send the link, someone can look at it, read through the whole problem, and understand the business case. So I think the one area where people can often add more is that business case: writing up those paragraphs.

In data science, you can write that in Markdown. After every single line of code, write a thoughtful interpretation: what’s occurring here, what’s the meaning of the analysis, how does it relate to your client or customer? I encourage all my students to do that.

It’s especially helpful when you work on a project for a couple of months, then revisit it and you’re like, “What was this?” Documenting all of that really pays off, because as you’re going through the job search you want this information to be top of mind.

How many projects should be in a data science portfolio?

For the Data Science Career Track, we have two capstones that students work on, so I like to say a minimum of two projects in your portfolio. Often when I work with students and they’ve finished the capstones and they’re starting the job search, I say, “Why not start a third project?” That could mean using data sets from popular sites such as Kaggle, working on a passion project you’re interested in, or partnering with a non-profit.

When you’re doing these interviews, you want to have multiple projects you can talk about. If you’re just talking about one project for a 30- to 60-minute interview, it doesn’t give you enough material. So that’s why it’s great to have two or three, because you could talk about the whole workflow—and ideally, these projects work on different components of data science.

How much time should you spend on a portfolio project?

I’ve had students who’ve accelerated through projects in two to three weeks and some who’ve done really deep dives over a couple of months, so it does depend. The unifying theme of any project is working through the entire data science workflow: discovering a problem you’d like to solve, identifying a data set, working through the data files, going through statistics and hypotheses, showcasing visualizations of what the data means, applying the algorithms that are relevant to that problem, explaining the metrics, and presenting the business case.

I think this exact structure, which we go through in depth in the program, is applicable to any problem, applicable to any challenge. You could showcase it in a PowerPoint, or you could turn it into an app or a web app if you want to go that extra mile. But following all those steps is great because it helps a recruiter understand your mental models and your methodologies, and understand that you’re very structured and thoughtful.

What’s the best order?

The order really depends on what message you’d like to convey and the strength of what you’ve accomplished. I believe that you should show your projects in order of complexity, most complex first. So if you have three capstones, the one that has the most advanced machine learning or the biggest scale you could show first, and then the others could follow.

If you’re going to show your projects on LinkedIn or on your resume, you don’t need to show many paragraphs. You could put one to two sentences and the link—you know, “This is the supervised machine learning model on credit default risk and here’s the GitHub link.” It could be that simple.

What’s the optimal format?

I think GitHub is the best resource to showcase this. You can set up your own repositories, with full support for Markdown, which lets you format your reports. And you can also share all your code there: your Jupyter notebooks, your R files.

In addition, as part of the portfolio, it’s not only important to show the code and the Markdown files; if you can also make the business case presentable to an audience, that’s really helpful.

Two things I encourage all students to do: the first is to create a PowerPoint of sorts. You have 10 to 20 slides that summarize it, so anyone can look at that PowerPoint or PDF and see what you’ve accomplished and the business case surrounding that project.

And second, to go the extra mile, create a YouTube video of you doing the presentation. When you finish capstones through the Data Science Career Track at Springboard, you’ll be presenting to your mentor and getting live feedback on what you did well. I always encourage students to go the extra mile and then self-record a YouTube video, slightly longer, where you walk through your code and your presentation, and then save it. You can keep it as an unlisted video, and if a recruiter ever says, “Hey, I want to see an example of your portfolio or a project you’ve done,” not only can you point them to GitHub, but you can also show them your personality and your presentation skills by sharing that YouTube link.

What’s important to remember while presenting your data science portfolio?

I think when you’re presenting a data science portfolio to a recruiter, whether that’s in person or on the phone or video or however, start at the high level. If you did a project on credit defaults for different users at a company, don’t suddenly dive in and tell everything about the project.

Start with the high level: “One of my capstone projects was to predict credit defaults for the state of Minnesota and the results of these efforts saved 5 million dollars for the Minnesota government, or could save that money, resulting in consumers being better with their spending. I’d love to talk in depth about this; are there certain parts of a data science workflow that would be interesting for you for this interview? Would you like me to walk you through all the steps?”

For more from David, including tips on developing a solid business case, resume suggestions, recommendations for mastering GitHub, and thoughts on starting out in data science, watch the full webinar.

Interested in starting or growing your career with the support of a mentor like David? Check out our Data Science Career Track—you’ll learn the skills and get the personalized guidance you need to land the job you want.