Six years after the Harvard Business Review dubbed it “the sexiest job of the 21st century,” data scientist remains one of the most sought-after roles in business. These openings often prove tricky to fill: on average, data scientist and data analyst roles take five days longer to close than the market average, which pushes employers to pay premium salaries for qualified professionals. That’s partly why Glassdoor named data scientist the best job in America for the third consecutive year in 2018. The data scientist job outlook is strong. If you’re one of the growing number of professionals interested in switching to this exciting field, read on to learn how to become a data scientist.
This is an excerpt of Springboard’s free guide to data science jobs, which has been updated for 2019.
What Is Data Science?
DJ Patil, who built the first data science team at LinkedIn before becoming the first chief data scientist of the United States in 2015, coined the modern version of the term “data scientist” with Jeff Hammerbacher (Facebook’s early data science lead) in 2008.
Patil has put it this way:
“A data scientist is that unique blend of skills that can both unlock the insights of data and tell a fantastic story via the data.”
A decade after it was first used, the term remains contested. There is some debate among practitioners and academics about what “data science” means and whether it’s different from the data analytics and statistics that companies have long prioritized.
One of the most substantive differences, however, is the amount of data processed now as opposed to a decade ago. In 2020, the world will generate 50 times more data than in 2011—on average, Google now processes more than 70,000 searches every second.
With that in mind, data science can be considered an interdisciplinary response to the explosion of data: it takes established data analytics approaches and uses machines to augment and scale them to larger data sets.
What Is a Data Scientist?
So, what does a typical data scientist look like? Patil posits that “the dominant trait among data scientists is an intense curiosity—a desire to go beneath the surface of a problem, find the questions at its heart, and distill them into a very clear set of hypotheses that can be tested.”
Notice that there is no mention here of a strict definition of data science, nor of a profile that a data scientist must fit.
Think about this: Baseball players used to be judged by how good scouts thought they looked, not how many times they got on base—that was until 2002, when the Oakland A’s won an all-time league record 20 games in a row with one of the lowest-paid rosters in the league. And elections used to swing from party to party with little semblance of predictive accuracy—that was until Nate Silver correctly predicted every electoral vote in the 2012 elections. (2016 was… a little more complicated.)
Data, and a systematic approach to uncover truths about the world around us, have changed the world.
“More than anything, what data scientists do is make discoveries while swimming in data. It’s their preferred method of navigating the world around them,” concludes Patil.
To do data science, you have to be able to find and process large data sets. You’ll often need to understand and use programming, math, and technical communication skills.
Most importantly, when it comes to data science qualifications, you need to have a sense of intellectual curiosity to understand the world through data, and not be deterred easily by obstacles.
How to Become a Data Scientist
Becoming a data scientist is difficult. It should come as no surprise that, traditionally, data scientist requirements include advanced education. Research suggests that most data scientists are equipped with an advanced degree in mathematics and statistics (32 percent), computer science (19 percent), or engineering (16 percent). However, because demand far outpaces supply, companies often hire individuals without a graduate degree (and sometimes without any degree).
Glassdoor recommends the following data scientist qualifications:
- Master’s or Ph.D. in statistics, mathematics, or computer science
- Experience using statistical computer languages such as R, Python, SQL, etc.
- Experience in statistical and data mining techniques, including generalized linear model/regression, random forest, boosting, trees, text mining, social network analysis
- Knowledge of machine learning techniques such as clustering, decision tree learning, and artificial neural networks
- Knowledge of advanced statistical techniques and concepts, including regression, properties of distributions, and statistical tests
- Experience using web services: Redshift, S3, Spark, DigitalOcean, etc.
- Experience analyzing data from third-party providers, including Google Analytics, Site Catalyst, Coremetrics, AdWords, Crimson Hexagon, Facebook Insights, etc.
- Experience with distributed data/computing tools: Map/Reduce, Hadoop, Hive, Spark, Gurobi, MySQL, etc.
- Experience visualizing/presenting data for stakeholders using: Periscope, Business Objects, D3, ggplot, etc.
In addition to understanding data, a data scientist must be comfortable presenting their findings to company stakeholders. Finding someone skilled in mathematics and coding who is also adept at presenting and explaining their discoveries in layman’s terms isn’t an easy task, which is why “data scientist” is such a lucrative position.
Data Science Skills Needed
You’ll need an overall analytical mindset to do well in data science. A lot of data science involves solving problems. You’ll have to be adept at framing those problems and methodically applying logic to solve them.
When data gets large, it often gets unwieldy. You’ll need to use mathematics to process and structure the data you’re dealing with. Exactly how much and what kind depends on the specifics of your role. But it’s safe to say the typical data scientist will have familiarity with statistics, linear algebra, and calculus.
You need to know statistics to play with data. Statistics allows you to slice and dice through data, extracting the insights you need to make reasonable conclusions. Above all, statistics is what lets you generalize from a small sample to the larger population it was drawn from, which is the foundation of most data science work.
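As a rough illustration of inferring from a sample to a population, the sketch below estimates a population mean from a small sample and attaches an approximate 95 percent confidence interval. The order values are made up for the example, and the function name is hypothetical. It uses the normal approximation (z = 1.96) for simplicity; with a sample this small, a t-distribution would usually be the better choice.

```python
import math
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Return (mean, low, high) for an approximate 95% confidence interval.

    Uses the normal approximation; this is a simplification for small samples.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error: sample standard deviation divided by sqrt(n).
    stderr = statistics.stdev(sample) / math.sqrt(n)
    margin = z * stderr
    return mean, mean - margin, mean + margin


# Hypothetical data: order values (in dollars) from 10 sampled customers.
orders = [42.0, 38.5, 51.0, 47.25, 39.0, 44.5, 49.0, 41.0, 45.5, 43.25]

mean, low, high = mean_confidence_interval(orders)
print(f"sample mean = {mean:.2f}, approx. 95% CI = [{low:.2f}, {high:.2f}]")
```

The interval, not the single sample mean, is what you would report when making a claim about all customers.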
The process of turning numbers into insights is what it’s all about. In the business world, a data analyst will focus on exploring large sets of data and connecting that data with actions that can drive business impact.
Finishing your data analysis is only half the battle. To drive impact, you will have to convince others to believe and adopt your insights. Human beings are visual creatures. It’s typically much easier for us to process information by examining a (thoughtfully created) chart or graph than by poring over a spreadsheet.
Put simply, an algorithm is a well-defined set of steps to solve a specific problem. Data scientists use algorithms to make computers follow a certain set of rules or patterns. Understanding how to use machines to do your work is essential to processing and analyzing data sets too large for the human mind to process.
Machine learning is the set of algorithms used to make predictions based on a set of known information. It is what allows Amazon to recommend products based on your purchase history without any direct human intervention. To deal with massive data sets, you’ll need machines to extend your thinking, and these algorithms use machine power to unearth insights for you.
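To make "predictions based on known information" concrete, here is a minimal sketch of one of the simplest learning algorithms, ordinary least squares with a single input. The ad-spend and sales figures are invented for illustration, and the function names are hypothetical. It is not how a recommendation system works, only an example of fitting a model to known data and then predicting an unseen case.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: ad spend (thousands of dollars) vs. units sold.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [12.0, 19.0, 29.0, 37.0, 45.0]

model = fit_line(ad_spend, units_sold)
forecast = predict(model, 6.0)  # a spend level not seen during fitting
print(f"predicted units sold at spend 6.0: {forecast:.1f}")
```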
Deep learning typically refers to the set of machine learning algorithms that extend a basic neural network to much higher levels of complexity, making the resulting models capable of learning from much larger data sets and performing many more operations than standard models. Data usually gets this large in image processing and signal processing.
Natural Language Processing
Natural language processing (NLP) uses techniques from computer science, linguistics, and machine learning to process human language, typically in the form of unstructured text.
Common applications of NLP include: text classification (e.g., is this news article fake or real?), sentiment analysis (e.g., how much do customers like my product?) and topic modeling (e.g., what are some common themes people are talking about?).
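As a toy illustration of sentiment analysis, the sketch below scores text by counting words from a small hand-written lexicon. The word lists and reviews are hypothetical, and real systems typically learn word weights from labeled data rather than using a fixed list, so treat this as a sketch of the idea only.

```python
import re

# Hypothetical, tiny sentiment lexicon chosen for this example.
POSITIVE = {"love", "great", "excellent", "good", "happy"}
NEGATIVE = {"hate", "bad", "terrible", "poor", "broken"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Return (# positive words) minus (# negative words)."""
    tokens = tokenize(text)
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)


reviews = [
    "I love this product, the battery life is great",
    "Terrible support and the charger arrived broken",
    "It works",
]

scores = [sentiment_score(r) for r in reviews]
for review, score in zip(reviews, scores):
    print(score, review)
```

A positive score suggests a favorable review, a negative score an unfavorable one, and zero means the lexicon found no signal either way.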
Data means little without its context. Most companies depend on their data scientists not just to mine data sets, but also to communicate their results to various stakeholders and present recommendations that can be acted upon.
Communication is an underrated skill that can make or break a project.
The best data scientists not only have the ability to work with large, complex data sets, but also understand intricacies of the business or organization they work for.
Having general business knowledge allows them to ask the right questions, and come up with insightful solutions and recommendations that are actually feasible given any constraints that the business might impose.
As a data scientist, you should have deep knowledge of the company you work for and also understand the larger industry within which it operates for your insights to make sense. Data from a biology study can have a drastically different context than data gleaned from a well-designed psychology study. You should know enough to cut through industry jargon.
Roles Within Data Science
While some small companies can call upon a jack-of-all-trades data scientist, more often a data science team will rely on different team members for accomplishing different tasks. Let’s look at some broad categories of roles that often get lumped under the umbrella term “data science.”
One definition of a data scientist is someone who knows more programming than a statistician, and more statistics than a software engineer. Data scientists fine-tune the statistical and mathematical models that are applied to data. This could involve applying theoretical knowledge of statistics and algorithms to find the best way to solve a data problem. For instance, a data scientist might use historical data to build a model that predicts the number of credit card defaults in the following month.
A data scientist will be able to run data science projects from beginning to end. They can identify a business problem, store and clean large amounts of data, explore data sets to identify insights, build predictive models, and weave a story around the findings.
Within the broad category of data scientists, you might encounter statisticians who emphasize statistical approaches to data and data managers who focus on running data science teams.
Data scientists are the bridge between programming and implementation of data science, the theory of data science, and the business implications of data.
Their average base pay is just below $140,000 USD, according to Glassdoor.
Data engineers are software engineers who handle large amounts of data, and often lay the groundwork and plumbing for data scientists to do their jobs effectively. They are responsible for managing database systems, scaling the data architecture to multiple servers, and writing complex queries to sift through the data. They might also clean up data sets and implement complex requests that come from data scientists, e.g., they take the predictive model from the data scientist and implement it into production-ready code.
Data engineers, in addition to knowing a breadth of programming languages (e.g., Ruby or Python), will usually know some Hadoop-based technologies (e.g., MapReduce, Hive, and Pig) and database technologies like MySQL, Cassandra, and MongoDB.
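For a sense of what querying a database looks like in practice, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The table, columns, and rows are hypothetical, and a production system would more likely use one of the databases named above, but the SQL aggregation pattern is similar.

```python
import sqlite3

# In-memory database so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")

# Hypothetical order records.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("alice", 30.0),
        ("bob", 12.5),
        ("alice", 20.0),
        ("carol", 7.5),
        ("bob", 40.0),
    ],
)

# Aggregate per customer: number of orders and total spend, biggest first.
rows = conn.execute(
    "SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
conn.close()

for customer, n_orders, total in rows:
    print(customer, n_orders, total)
```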
Within the broad category of data engineers, you’ll find data architects who focus on structuring the technology that manages data models and database administrators who focus on managing data storage solutions.
Their average base pay is higher than data scientists’, based on Glassdoor data: $151,000 USD.
Data / Business Analysts
Data analysts sift through data and provide reports and visualizations to explain what insights the data is hiding. When somebody helps people from across the company understand specific queries with charts, they are filling the data analyst (or business analyst) role. In some ways, you can think of them as junior data scientists, or the first step on the way to a traditional data science job.
Business analysts are adjacent to data analysts, and are more concerned with the business implications of the data and the actions that should result. Should the company invest more in project X or project Y? Business analysts will leverage the work of data science teams to communicate an answer.
Their average base pay is around $84,000 USD, according to Glassdoor, partly because many roles are filled by entry-level graduates with limited work experience.
Machine Learning Engineer
Machine learning engineers are highly sought after and command an annual median salary of $115,000, according to Glassdoor (note that this is a much narrower job function than the previous titles). They’re mostly responsible for building, deploying, and managing machine learning projects.
Most machine learning roles will require the use of Python or C/C++ (though Python is often preferred). A background in the theory behind machine learning algorithms and an understanding of how they can be efficiently implemented in terms of both space and time are critical.
The easiest path to a career as a machine learning engineer, though by no means the only one, is to start off with a software engineering background and then gain the statistics and machine learning knowledge needed to take on the role. Some also begin as academics more involved with machine learning theory who then develop their software engineering skills.
How to Get a Job in Data Science
Build Your Portfolio
No matter the field, it’s important to make a great first impression. But this is particularly important when you’re transitioning into a new career. It all starts with your portfolio and your resume.
Many data scientists have their own websites, which serve as both a repository of their work and a blog. This allows them to demonstrate their experience and the value they create in the data science community.
In order for your portfolio to have the same effect, it must share the following traits:
- Your portfolio should highlight your best projects. Focusing on a few memorable projects is generally better than showing a large number of them.
- It must be well-designed and tell a captivating story of who you are beyond your work.
- You should build value for your visitors by highlighting any impact you’ve had through your work. Maybe you built a tool that’s useful for a general audience? Perhaps you have a tutorial? Showcase them here.
- It should be easy to find your contact information.
Find a Mentor
One of the highest-value networking activities you can pursue is finding a mentor who can guide you as you pursue a data science career. Somebody who has been in a hiring position can tell you exactly what companies are looking for and how to prepare for interviews. They can also introduce you to other people in the data science community, or in the best of cases, even end up hiring you!
What some people don’t understand is that mentorship is a two-way street, and you can always create value for your mentor in different ways, whether it’s sharing your story, or giving them some perspective on problems they see. Mentorship is a special category of a relationship where you can build value for yourself in a professional context—but never forget the golden rule of relationships: you get what you give.
Go to Conferences
At data science conferences, you will get to hear from and build connections with established data scientists, and even unearth hidden job opportunities. With a bit of searching, you can find great data science events in your area.
Here are a few to consider:
The Strata Data Conference, created in 2012, is the largest data conference series in the world. Speakers come from academia and private industry. The themes tend to be oriented around cutting-edge data science trends in action. Practical workshops are provided if you want to learn the technology behind data science, and there are plenty of networking events.
KDD (Knowledge Discovery and Data Mining) is a large interdisciplinary conference bringing together researchers and practitioners from data science, data mining, knowledge discovery, large-scale data analytics, and big data. It’s also an organization that seeks to lead discussion and teaching of the science behind data science. Membership and attendance at these conferences offer an awesome way to contribute to growing trends in data science.
NeurIPS, or Neural Information Processing Systems (previously known as NIPS), is a largely academic data science conference focused on evaluating cutting-edge science papers in the field. Attending will give you a sneak preview of what will shape data science in the future.
The International Conference on Machine Learning (ICML) is supported by the International Machine Learning Society and brings together some of the best minds in machine learning to present research and discuss new ideas. It was first held in 1980.
We’ve listed the major conferences where the data science community assembles, but there are many smaller meetups that serve to connect the local data science community, including newcomers trying to figure out how to become a data scientist.
The San Francisco Bay Area tends to have the most data meetups, though there is usually one in every major city in the U.S. You can look up data science meetups near you on Meetup.com. Some of the largest data science meetups, with more than 4,000 members, are SF Data Mining, Data Science DC, Data Science London, and the Bay Area R User Group.
Most data science meetups are organized by influencers in the local data science community. If you really want to make a splash, you should consider volunteering at a data science event.
Most events follow the same format, with an invited speaker who gives a talk and then a networking period where everybody connects with each other. The general data science meetups will often have an industry talk where somebody will delve into a real-world data science problem and how it was solved. Specialized data science meetups, such as Python or R groups, will often focus on technical tutorials that teach a specific tool or skill.
Other Ways to Network
We live in a digital world, so you shouldn’t feel confined to offline networking! Some of the best data scientists are on Twitter, and you can discover influencers worth following (and potentially connecting with) through podcasts and other outlets.
Talking Machines includes interviews with prominent data scientists. Partially Derivative has been described as “‘Car Talk’ for the data community.” The O’Reilly Data Show is the equivalent of a graduate seminar delivered in podcast form.
You’ll also find online blogs, newsletters, and communities such as O’Reilly and KDnuggets that will help you connect with data scientists online.
Job Boards
- Kaggle offers a job board for data scientists.
- You can find a list of open data scientist jobs at Indeed and Glassdoor.
- Datajobs is a listings site for data science.
- Data Science Central has a frequently updated jobs section.
As previously mentioned, you can also find job opportunities through networking and through finding a mentor. We continue to emphasize that the best job positions are often found by talking to people within the data science community.
You’ll also be able to find opportunities for employment in startup forums. Hacker News has a job board that is exclusive to Y Combinator startups. Y Combinator is arguably the most prestigious startup accelerator in the world. AngelList is a database for startups looking to get funding and it also has a robust jobs section.
Ace the Data Science Interview
An entire book could be written on the data science interview—in fact, we have one of those! But here’s a condensed guide to help prepare you to nail the various parts of the interview process.
The Phone Screen: Your point of entry typically will be the human resources department. Sometimes there will be basic technical questions to screen out unqualified candidates, but most of the time, this screen involves establishing the beginnings of a culture fit and making sure that you have the communication skills to come off well in a subsequent interview with the hiring manager.
The Take-Home Assignment: After the phone screen, companies often send a prepared assignment to candidates, with some time pressure applied to screen out people who may be technically weak or who may not be committed to the recruitment process. Common example assignments include: a deep analysis on a specific data set provided for you, cleaning a data set with significant errors, or working with a specific problem relevant to the business (e.g., building a job recommendation system for applicants based on data from job descriptions).
The Call With the Hiring Manager: This will likely be the final evaluation before a company invites you to an on-site interview. The call typically is split into three components: mathematics/statistics, coding, and communication/culture.
On-Site Interview – With the Hiring Manager: When you finally meet face to face, the hiring manager will evaluate you from both a technical and nontechnical perspective. They’re looking to ascertain if you’re a culture fit, and they may test your technical chops by having you whiteboard different scenarios. In general, hiring managers appreciate when you demonstrate: passion for the company and data science in general, an ability to get along well with everybody, a strong willingness to learn and a demonstrated ability to do so rapidly, a strong record of previous projects and the ability to relate them to the impact they drove, and strong analytical ability.
On-Site Interview – Technical Challenge: Prepare to be challenged on your technical skills in one form or another, especially for roles that lean more toward data engineering. You’ll often find that this is similar to a software engineering interview, where you will be asked to whiteboard and write down how you’d implement certain algorithms or solve problems.
On-Site Interview – With an Executive: If you pass the bar for your hiring manager, you’ll likely have a final interview with a senior executive. In a startup, this will often be the founder. Normally, only candidates who have passed the technical assessment will get here, so now you need to emphasize how you can drive impact with your knowledge of the business itself and the problems it faces.
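One of the take-home assignments mentioned above is cleaning a data set with significant errors. The sketch below shows what a first pass might look like using only Python's standard library. The data, column names, and cleaning rules are all hypothetical; a real assignment would come with its own data and would often be done with a library such as pandas.

```python
import csv
import io

# Hypothetical messy input: inconsistent casing, stray whitespace,
# a near-duplicate row, and missing or invalid values.
RAW = """name,age,city
Ana,34,Lisbon
ana ,34,lisbon
Ben,,Oslo
Cara,abc,Rome
Dev,29,
"""


def clean_rows(text):
    """Normalize text fields, parse ages, and drop duplicate rows."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        name = row["name"].strip().title()
        city = row["city"].strip().title() or None  # empty string -> None
        try:
            age = int(row["age"])
        except ValueError:
            age = None  # missing or non-numeric age
        key = (name, age, city)
        if key in seen:
            continue  # skip rows that are identical after normalization
        seen.add(key)
        cleaned.append({"name": name, "age": age, "city": city})
    return cleaned


cleaned = clean_rows(RAW)
for record in cleaned:
    print(record)
```

In an actual submission, it helps to explain each cleaning decision, such as why an invalid age is kept as missing rather than dropped, since reviewers are usually judging your reasoning as much as your code.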
Resources to Get You Started
Data Science Roles
This article from Hacker Noon, written by Google’s chief decision intelligence engineer, runs through the different data science roles from a unique perspective.
This Springboard blog post also provides a deep dive into the different career paths.
This infographic on data science salaries has updated information on the largest U.S. markets.
Skills and Tools
This Quora post is a broad overview of many of the essential skills you need to become a data scientist, and resources to go about learning them.
Our mini data science dictionary has 30 common data terms explained in basic English.
The following introduction to Python will get you set up on the basics.
This blog will help you with all of the latest news in Excel data visualization.
This interactive tutorial to R will help you grasp the fundamentals.
W3Schools has an excellent interactive tutorial on SQL that will get you started on how to select parts of a database for further analysis.
This Springboard blog post lists many of your options for finding high-quality public data sets.
The 10 most influential data mining algorithms can be quite complex (and there are many others), but this blog post explains them in plain English.
This repository on machine learning offers a solid definition and working examples you can get started on right away. If you’re more of a visual learner, this introduction to ML concepts will fill the gap for you.
Flowing Data is a blog that focuses on data communication and the design of appealing data visualizations.
Data Science Interviews
Here is a list of data science interview questions and how to prepare for them.
Building a Data Science Portfolio
Check out this webinar for data science portfolio guidance from a Springboard mentor.
Hopefully, this post was helpful to those wondering how to become a data scientist. If you want more guidance, consider Springboard’s Data Science Career Track, a self-paced, mentor-led bootcamp with a job guarantee. It comes with weekly one-on-one mentorship from your own data science expert and consistent career coaching to help you find your dream job. Find out more!