This is an excerpt of our free comprehensive guide to data science jobs.
At the beginning of 2016, Glassdoor, one of the top careers websites in the world, released a report on the best jobs to pursue. Each job is ranked on a composite score of median reported salary, job openings, and career opportunities. At the top of the list for 2016 was a relatively new profession: data scientist.
Data Science Definition
Data is proliferating so quickly that a job title that barely existed a decade ago now names one of the most sought-after professions. It’s been hard to nail down an exact definition of data science. We’ve certainly tried.
This new world economy needs a new approach to skills education. At Springboard, we’re building an educational experience that empowers our students to thrive in this new digital world. Through our online workshops, we have prepared thousands of people for careers in data science, with 1-on-1 mentorship from industry experts.
As part of our mission to make high-quality education accessible for all and to help people advance their careers, we’ve crafted this blog post. Our goal is to bring you insight from our network of industry experts and demystify data science careers. Maybe we’ll even inspire some of you to pursue a career in this fascinating field.
GiveDirectly is a not-for-profit that shouldn’t work. The organization has built its success on giving unconditional cash transfers to the poorest people in the world. Charities aren’t supposed to give their recipients unlimited leeway: they’re supposed to provide only certain goods for certain needs.
GiveDirectly is designed to break all the rules — and it’s working.
The organization’s mandate is to transform international giving by attacking extreme poverty at its roots. People who are helped by GiveDirectly decide how to help themselves. This has led to one of the lowest percentages of money spent on administration, and stunning results. Recipients are well on their way to doubling their assets. Their rate of hunger is almost halved. They earn 34% more.
It’s hard to overstate how difficult GiveDirectly’s mission is. The regions they work in are often neglected and forgotten. They not only have to provide for the very poorest, they have to find them.
Since census data is sparse or unreliable at a village level, GiveDirectly would often have to send somebody to manually scour each village for signs of obvious poverty.
One of the signs GiveDirectly representatives look for is the presence of metal on home roofing rather than the more plentiful thatch. People who can afford metal roofs typically buy them. At around $564 (USD), roughly a third of the region’s GDP per capita of about $1,700, a metal roof is a significant capital investment and a good marker of the difference between extreme and relative poverty.
But sending people to each village could take several trips at a crushing expense, creating overheads for an organization looking to operate leanly.
Data science to the rescue
Liaising with GiveDirectly, a pair of industry experts from IBM and Enigma set out to see if data science could help.
Using satellite images provided by Google, they trained computers to distinguish villages with metal-roofed houses from those with thatch. That let them determine which villages needed the most help without sending a single person to the area.
This required mining satellite imagery and making sense of massive amounts of data, something that would have been impossible a decade ago. It also required implementing machine learning algorithms, a cutting-edge technology at the time, to train computers to recognize patterns.
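The post doesn’t say which algorithm or image features the team actually used. As a rough sketch of what “training computers to recognize patterns” can mean, here is a toy nearest-centroid classifier in Python. Everything in it is an assumption made for illustration: the two features (brightness and blueness of a roof patch), the sample values, and the function names are all invented, not taken from the GiveDirectly project or real imagery.

```python
from math import dist
from statistics import mean

# Hypothetical training data: (brightness, blueness) for labeled roof patches,
# both scaled to [0, 1]. These numbers are invented for illustration only.
TRAINING = {
    "metal": [(0.82, 0.55), (0.78, 0.60), (0.90, 0.52)],
    "thatch": [(0.35, 0.20), (0.40, 0.25), (0.30, 0.22)],
}


def fit_centroids(samples_by_label):
    """Average each label's feature vectors into a single centroid."""
    return {
        label: tuple(mean(axis) for axis in zip(*samples))
        for label, samples in samples_by_label.items()
    }


def classify(features, centroids):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))


def metal_share(roof_features, centroids):
    """Fraction of a village's roofs that are classified as metal."""
    labels = [classify(f, centroids) for f in roof_features]
    return labels.count("metal") / len(labels)


centroids = fit_centroids(TRAINING)

# A made-up village with four roofs: one bright and bluish, three dull.
village = [(0.85, 0.58), (0.33, 0.21), (0.38, 0.24), (0.31, 0.19)]
print(classify(village[0], centroids))
print(metal_share(village, centroids))
```

In this toy setup, a village with a low share of metal roofs would be flagged for a closer look. A real pipeline would have to extract features from actual satellite tiles and would likely use a far more capable model.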
These data scientists were able to pinpoint where GiveDirectly should operate, saving the organization hundreds of man-hours and allowing it to do what it does best: solving extreme poverty.
GiveDirectly is just one example of how organizations win by using data to their advantage. It shows what data science looks like in action.
Around the world, organizations are creating more data every day, yet most are struggling to benefit from it. According to McKinsey, the US alone will face a shortage of 150,000+ data analysts and an additional 1.5 million data-savvy managers.
According to LinkedIn, Statistical Analysis & Data Mining was the hottest skill category for catching recruiters’ attention in 2014. Glassdoor ranked Data Scientist as the #1 job to pursue in 2016. Harvard Business Review even called data scientist the sexiest job of the 21st century.
GiveDirectly was able to save thousands of dollars and put its money where its mission is, thanks to a small team of data scientists. Within the mass of data the world generates every day, similar insights are hidden away. Each may have the potential to transform entire industries, or to improve millions of lives.
Salaries have followed the impact that data science drives. With a national average salary of $118k (rising to $126k in Silicon Valley), the definition of data science comes into sharper focus: a lucrative career path where you can solve hard problems and drive social impact.
Since you’re reading this post, you’re likely curious about a career in data science, and you’ve probably heard some of these facts and figures. You want to know what data science means for you. You likely know that data science is a career where you can do good while doing well. You’re ready to dig beneath the surface, see real-life examples of data science, and get real-life advice from practitioners in the field.
That’s exactly why we wrote this post: to bring the definition of data science to life for thousands of data-curious, savvy young professionals. We hope that after reading it, you have a solid understanding of what data science is and know what it takes to land and navigate your first data science job. We also want to leave you with a checklist of actionable advice that will help you throughout your data science career.
What is data science?
DJ Patil, the current Chief Data Scientist of the United States and previously the Head of Data Products at LinkedIn, is credited, together with Jeff Hammerbacher, with coining the term “data scientist.”
A decade after it was first used, the term remains contested. There is a lot of debate among practitioners and academics about what data science means, and whether it’s different at all from the data analytics that companies have always done.
One of the most substantive differences is the amount of data you have to process now compared with a decade ago. By 2020, the world is projected to generate 50 times more data than it did in 2011. Data science can be considered an interdisciplinary response to this explosion of data: it takes established data analytics approaches and uses machines to augment them and scale them to larger datasets.
DJ posits that, “the dominant trait among data scientists is an intense curiosity—a desire to go beneath the surface of a problem, find the questions at its heart, and distill them into a very clear set of hypotheses that can be tested.” There is no mention here of a strict definition of data science, nor of a profile that must fit it.
Baseball players used to be judged by how good scouts thought they looked, not by how many times they got on base. That was until the Oakland A’s won a league-record 20 games in a row with one of the lowest-paid rosters in the league. Elections used to swing from party to party with little semblance of predictive accuracy. That was until Nate Silver correctly predicted the winner in all 50 states in the 2012 presidential election.
Data, and a systematic approach to uncovering truths about the world around us, have changed the world.
“More than anything, what data scientists do is make discoveries while swimming in data. It’s their preferred method of navigating the world around them,” concludes Patil.
To do data science, you have to be able to find and process large datasets. You’ll often need to understand and use programming, math, and technical communication skills. You’ll need to be a unicorn who can combine a lot of different skill sets.
Most importantly, you need to have a sense of intellectual curiosity to understand the world through data, and not be deterred easily by obstacles.
You might not think you know anything about data science, but if you’ve ever looked for a Wikipedia table to settle a debate with one of your friends, you were doing a little bit of data science.