Data Science in Healthcare: How It Improves Care

7 minute read | March 12, 2019
Written by: Leah Davidson

The U.S. healthcare industry is ripe for disruption. A McKinsey report shows that healthcare costs now represent almost 18 percent of GDP, a whopping $600 billion more than the expected benchmark for a nation of the United States’ size and wealth. And a Ponemon Institute survey revealed that the healthcare industry stores 30 percent of the world’s data.

With primary sources, electronic medical records (EMRs), clinical trials, genetic information, billing, wearable data, care management databases, scientific articles, social media, and internet research, the healthcare industry has no shortage of data available. Since 72 percent of people look up health information online and more patients use tools like Zocdoc to communicate with medical professionals and book appointments, it’s easier than ever before to manage patient data in one centralized location.

“Quantified health” is a relatively new movement that integrates data from consumer wearables (pedometers, Fitbits, Muse headbands, etc.), blood pressure cuffs, glucometers, and scales directly into EMRs through smartphone platforms such as Apple’s HealthKit, Google Fit, and Samsung Health. By tracking changes in behavior and vital signs, it can pick up on warning signs faster.

According to LinkedIn’s U.S. Emerging Jobs Report, the data science field has grown by 350 percent since 2012, yet only 35,000 candidates have the necessary skills to fill job openings. Data science can be used for either analysis (pattern identification, hypothesis testing, risk assessment) or prediction (machine learning models that estimate the likelihood of a future event based on known variables).

With only 3 percent of U.S.-based data scientists working in the healthcare/hospital industry, the need for more trained data experts is growing quickly. As in any industry, data professionals working in healthcare should be familiar with statistics, machine learning, and data visualization.

Here are some use cases showing how data science is revolutionizing healthcare.

Drug Discovery

It costs up to $2.6 billion and takes 12 years to bring a drug to market. Big data allows scientists to simulate how a drug reacts with body proteins and with different types of cells and conditions, so that a candidate drug has a much higher likelihood of gaining Food and Drug Administration approval and of working for diverse patients (e.g., people with certain mutation profiles).

Mark Ramsey, chief data officer at GSK, shared how large pharmaceutical companies are using clinical trial data and partnerships with biobanks to expedite the drug discovery process. Ramsey said, “We’re really pushing to see how far we can advance use of AI and computer simulation in the drug discovery process with the goal being to take the process to maybe less than two years.”

He went on: “That’s one of the benefits of GSK being a large pharmaceutical company because we have hundreds and hundreds and thousands of clinical trials… If you look at the clinical trial data one of the things that’s extremely important is to make sure the diversity of our clinical trials match the population diversity. We can better understand how to design the trial to be effective and efficient and also match the diversity.”

Startups are also raising significant amounts of venture capital to expedite the drug discovery and testing process. BenevolentAI is a unicorn based in London that has raised $115 million to start over 20 drug programs and create “a bioscience machine brain, purpose-built to discover new medicines and cures for disease.” Its first clinical trial this year in Europe and the U.S. will address excessive daytime sleepiness in Parkinson’s disease.

Disease Prevention

The best way to transform healthcare is to recognize risks and recommend prevention plans before those risks become major issues. Through wearables and other tracking devices that take into account historical patterns and genetic information, it’s possible to recognize a problem before it gets out of hand.
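To make the idea of spotting a problem from historical patterns concrete, here is a minimal sketch of how a tracking device might compare each new reading against the wearer’s own recent baseline. The function name, the seven-reading window, the z-score threshold, and the sample heart rates are all assumptions made up for illustration; none of the companies mentioned in this article has published this as its method.

```python
from statistics import mean, stdev


def flag_anomalies(readings, baseline_size=7, threshold=3.0):
    """Return the indices of readings that deviate sharply from the
    wearer's own recent baseline (a simple rolling z-score check)."""
    flagged = []
    for i in range(baseline_size, len(readings)):
        # Baseline = the readings immediately before the current one.
        baseline = readings[i - baseline_size:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # A perfectly flat baseline gives no scale to compare against.
            continue
        z_score = (readings[i] - mu) / sigma
        if abs(z_score) > threshold:
            flagged.append(i)
    return flagged


# Hypothetical daily resting heart rates (beats per minute).
resting_hr = [62, 64, 63, 61, 65, 63, 62, 64, 63, 88, 64]
print(flag_anomalies(resting_hr))  # [9]
```

Only the jump to 88 beats per minute is flagged. A real system would likely combine several signals and account for context such as exercise or sleep before alerting anyone.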

Omada Health is a digital therapeutics company that uses smart devices to create personalized behavior plans and online coaching to help prevent chronic health conditions, such as diabetes, hypertension, and high cholesterol.

Propeller Health created a GPS-enabled tracker for inhaler usage. It combines data on at-risk individuals with environmental data from the Centers for Disease Control and Prevention to propose interventions for asthma sufferers.

On the mental health side, the young Canadian startup Awake Labs uses wearables to track data from children with autism, alerting parents before a meltdown occurs.

Diagnosis

The National Academies of Sciences, Engineering, and Medicine estimates that around 12 million Americans receive misdiagnoses, which can sometimes have life-threatening repercussions. A BBC article notes that diagnostic errors cause an estimated 40,000 to 80,000 deaths annually.

One of the most effective uses of data science in healthcare is medical imaging. Computers can learn to interpret MRIs, X-rays, mammograms, and other types of images, identify patterns in the data, and detect tumors, artery stenosis, organ anomalies, and more.

Stanford University researchers have also developed data-driven models that diagnose irregular heart rhythms from ECGs more quickly than a cardiologist and that distinguish between images of benign skin marks and malignant lesions.

Iquity, a large-scale predictive analytics healthcare platform, conducted a pilot study by analyzing four million data points from 20 million New York residents. Testing with a combination of misdiagnosed and correctly diagnosed multiple sclerosis patients, Iquity predicted the onset of the disease with 90 percent accuracy eight months before it could be detected with traditional tools, like magnetic resonance imaging and spinal taps.

Even online searches can help with diagnostic accuracy. Microsoft researchers analyzed the search logs of 6.4 million Bing users to identify those whose queries suggested that they had been diagnosed with pancreatic cancer. Looking back at those users’ earlier queries for keywords such as blood clots and weight loss, the researchers found that they could use search topics to predict a future pancreatic cancer diagnosis.


Treatment

With more data on individual patient characteristics, it is now possible to deliver more precise prescriptions and personalized care. With initiatives like the National Institutes of Health’s 1000 Genomes Project, an open-source study of regions of the genome associated with common diseases like coronary heart disease and diabetes, scientists are learning more about the complexity of human genes and discovering that, often, one size does not fit all when it comes to medication and treatments. Data science, with the support of data scientists and machine learning engineers, is also helping the emerging field of gene therapy, which inserts genetic material into cells, rather than relying on traditional drugs, to compensate for abnormal genes.

Emory University and the Aflac Cancer Center are partnering with NextBio to study medulloblastoma, a malignant brain tumor typically affecting children. Although radiation therapy was previously the only form of treatment for this type of cancer, NextBio can examine clinical and genomic data to find a patient’s specific biomarkers and customize treatment. Mount Sinai researchers also used biomarker models and cancer genomic data to identify types of bladder cancer that were resistant to chemotherapy and thus would need other treatment methods.

Post-Care Monitoring

After any type of surgery or treatment, there is a risk of complications and recurring pain, which can be difficult to manage once the patient leaves the hospital. Remote in-home monitoring helps doctors stay in touch with patients in real time while freeing up limited and costly hospital resources.

Intel’s Cloudera software helps hospitals predict the chances that a patient will be readmitted in the next 30 days, based on EMR data and the socioeconomic status of the hospital’s location.
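As a rough illustration of what a readmission prediction can look like under the hood, the sketch below turns a handful of patient features into a probability with a logistic function. This is not the software described above: the feature names, the weights, and the intercept are hypothetical values chosen for the example, whereas a real model would learn them from historical EMR data.

```python
import math

# Hypothetical weights for illustration only. A real model would learn
# these from historical EMR records rather than have them hard-coded.
WEIGHTS = {
    "prior_admissions": 0.45,
    "length_of_stay_days": 0.08,
    "num_chronic_conditions": 0.30,
    "low_income_area": 0.50,  # 1 if the hospital serves a low-income area, else 0
}
INTERCEPT = -3.0


def readmission_risk(patient):
    """Estimate a 30-day readmission probability with a logistic model."""
    score = INTERCEPT + sum(WEIGHTS[name] * patient[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-score))


low_risk_patient = {
    "prior_admissions": 0,
    "length_of_stay_days": 2,
    "num_chronic_conditions": 0,
    "low_income_area": 0,
}
high_risk_patient = {
    "prior_admissions": 3,
    "length_of_stay_days": 10,
    "num_chronic_conditions": 4,
    "low_income_area": 1,
}

for label, patient in [("low", low_risk_patient), ("high", high_risk_patient)]:
    print(f"{label}-risk example: {readmission_risk(patient):.0%}")
```

A care team could use a score like this to decide which patients get a follow-up call first, with the cutoff left as a clinical judgment rather than something the model decides.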

SeamlessMD’s multimodal platform for post-operative care enabled the Saint Peter’s Healthcare System in New Jersey to reduce its average post-surgery length of stay by one day, saving an average of over $1,500 per patient. Patients checked in daily on their apps to input data on pain levels, allowing the care team to track progress over time and receive intelligent alerts on potential problems.

Hospital Operations

Hospitals are cost-sensitive and face complex operational problems, such as how many staff to assign at certain hours to maximize efficiency, how to ensure enough hospital beds are available to meet patient demand, and how to enhance utilization in the operating room. Predictive analytics can optimize scheduling and even go so far as to tell hospital staff which beds should be cleaned first and which patients may face challenges during the discharge process.
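The staffing question can be sketched in a few lines. The example below averages historical arrivals for each hour of the day and converts that average into a headcount. The log format, the `patients_per_staff` ratio, and the sample numbers are assumptions for illustration; a production scheduling system would use far richer forecasts.

```python
import math
from collections import defaultdict


def staff_needed_by_hour(arrival_log, patients_per_staff=4):
    """Estimate staff needed per hour from past arrivals.

    arrival_log is a list of (day, hour, arrivals) tuples. The estimate is
    the average arrivals for that hour divided by the number of patients
    one staff member can handle, rounded up.
    """
    arrivals_by_hour = defaultdict(list)
    for _day, hour, arrivals in arrival_log:
        arrivals_by_hour[hour].append(arrivals)

    return {
        hour: math.ceil(sum(counts) / len(counts) / patients_per_staff)
        for hour, counts in sorted(arrivals_by_hour.items())
    }


# Hypothetical emergency room arrivals at 8 a.m. and 8 p.m. on two days.
log = [("Mon", 8, 10), ("Tue", 8, 14), ("Mon", 20, 5), ("Tue", 20, 3)]
print(staff_needed_by_hour(log))  # {8: 3, 20: 1}
```

Rounding up errs on the side of overstaffing slightly, which is usually the safer direction in a clinical setting.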

Analytics software can streamline emergency room operations, ensuring that each admitted patient goes through the most efficient order of operations. Emory University Hospital used data science to predict the demand for different types of lab tests, cutting wait time by 75 percent.

Furthermore, business intelligence can streamline billing, identify patients who are at risk of late payments or financial difficulties, and coordinate with financial, collections, and insurance departments. The Centers for Medicare & Medicaid Services saved $210.7 million by applying big data analytics to fraud prevention.

What’s Next for Data Science in Healthcare

Now is the right time for a data-driven healthcare industry, and many players are participating in this change, including large biotech and pharmaceutical companies, payers and providers, hospitals, university research centers, and venture-backed startups. Data science can save lives by predicting the probability that patients will suffer from certain diseases, providing AI-powered medical advice to underserved communities in rural and remote areas, customizing therapies for different patient profiles, and finding cures for cancer, AIDS, Ebola, and other life-threatening diseases.

As in any industry, there are concerns about the use of data science in healthcare. From a logistical standpoint, data often lives in disparate states, hospitals, and administrative units, and it is challenging to integrate it into one cohesive system. Many patients are also concerned about the protection and privacy of their healthcare information, especially as companies like Google face lawsuits for using sensitive health information in ad targeting. And although data science could help ease the shortage of doctors in many countries, some worry about outsourcing the important doctor-patient relationship to computer algorithms and machines.

About Leah Davidson

A graduate of the Wharton School of Business, Leah is a social entrepreneur and strategist working at fast-growing technology companies. Her work focuses on innovative, technology-driven solutions to climate change, education, and economic development.