The healthcare sector is swimming in data. Up to 30% of the world's stored data now comes from the healthcare industry, thanks in large part to the widespread adoption of wearable technology, digital health coaches, and virtual pharmacies, all of which have generated individualized patient data on an unprecedented scale. That's in addition to more traditional, "low-tech" data sources such as electronic medical records (EMRs), clinical trials, and care management databases, which predate modern data analysis. The pandemic added even more: in the last week of March 2020, as COVID-19 cases rose sharply and lockdowns spread across numerous countries, telehealth visits increased 154% compared with the same period in 2019, according to the CDC.

With so much data available, it's surprising that predictive analytics has made only modest inroads into improving healthcare access, personalized patient care, and drug development. Just 3% of U.S.-based data scientists work in the healthcare industry, a sign that this specialized field, which combines knowledge of health information with data analytics, is short on practitioners.

Springboard's Rise 2020 virtual conference, held in October 2020, featured a panel discussion with three data scientists who work in the healthcare industry, each tackling a different aspect of patient care.

Increasing access to healthcare

Racial disparities in healthcare became even more glaring during the COVID-19 pandemic, with studies showing that minorities were four times more likely than non-Hispanic whites to be hospitalized with COVID-19. Spora Health, a telemedicine provider, is working to provide "culture-conscious" primary care for people of color. Each user receives a personalized health blueprint based on their own data. Lifestyle factors such as diet, smoking habits, and stress levels are used to calculate a person's risk of developing chronic conditions like diabetes, hypertension, depression, and anxiety. The app then synthesizes this data into health assessments and check-ins that help users track their condition over time.

“We create these questionnaires to correlate folks’ responses to the probability of having a certain disease,” Waco Holve, a data scientist at Spora Health, explained during the panel discussion at Rise 2020. “Because we’re a culture-based healthcare company, we focus on diseases that disproportionately impact the African-American population.” 

A single disease can have more than 7,500 contributing factors, says Holve. The key to an effective health risk assessment is being able to infer certain patient behaviors, such as bad habits people won't admit to. One example is determining whether someone smokes without directly asking, "Do you smoke cigarettes?"
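To make the idea of "correlating responses to the probability of having a certain disease" concrete, here is a minimal sketch of one common technique for it: logistic regression, fit with plain gradient descent, that estimates the probability someone smokes from indirect questionnaire answers. Everything here is an assumption for illustration. The three questions, the toy training rows, and the function names `train_logistic` and `predict_proba` are hypothetical and are not Spora Health's questions, data, or method.

```python
import math

# Toy training data, invented for illustration only (not Spora Health's data or questions).
# Each row holds one respondent's answers to three hypothetical indirect questions:
#   [frequent cough (0/1), lives with a smoker (0/1), self-reported stress (0.0-1.0)]
X = [
    [1, 1, 0.8], [1, 0, 0.6], [1, 1, 0.4], [0, 1, 0.7], [1, 0, 0.9], [0, 0, 0.4],  # smokers
    [0, 0, 0.2], [0, 1, 0.3], [1, 0, 0.1], [0, 0, 0.5], [0, 0, 0.3], [1, 1, 0.6],  # non-smokers
]
y = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # 1 = smoker (known for the training set)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Estimated probability that a respondent with answers x is a smoker."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights with batch gradient descent on log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


w, b = train_logistic(X, y)
p_high = predict_proba(w, b, [1, 1, 0.8])  # coughs, lives with a smoker, high stress
p_low = predict_proba(w, b, [0, 0, 0.2])   # none of the indirect signals
print(p_high, p_low)
```

The respondent showing all three indirect signals gets a noticeably higher estimated probability than one showing none, even though nobody was asked about smoking directly. A real risk model would use far more features, validated survey items, and a library implementation such as scikit-learn's `LogisticRegression`; the hand-rolled version is only meant to show the mechanics.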

Holve, who spent several years as a hedge fund trader, drew on that experience when building a semi-automated data mining process to develop health risk assessments. Hedge fund analysts use databases to sort, compare, and compile market data, and Holve brought the same mindset to health data. "I approached it from the stance of how I would build a trading model for diabetes that could model terabytes of data."

Tracking the COVID-19 pandemic

While the U.S. never adopted a national contact tracing strategy, state governments, universities, and school districts mounted efforts of their own. Absent a national effort, the next-best alternative is to track the rate of transmission in order to predict where coronavirus hotspots will emerge. Rt.Live is a coronavirus tracking website founded by Instagram co-founders Kevin Systrom and Mike Krieger. Thomas Vladeck, a marketing analytics expert who spoke at Rise, joined the project as a volunteer earlier this year. "I literally just cold-emailed him [Kevin Systrom]," he said. "I wrote up my analysis, how I would approach the model and we eventually started collaborating pretty intensely over Zoom for the next few months to get a new version up."

The site tracks the spread of COVID-19 state by state by estimating each state's effective reproduction number, known as Rt. In epidemiology, the basic reproduction number, R0, is the average number of secondary infections produced by a single infected person in a fully susceptible population; Rt is the same measure at a given time t, after immunity and interventions such as social distancing have taken effect. "If the value is above one, it means [the disease] is spreading," explained Vladeck. "If it's below one then it's on its way out."
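To illustrate what "above one" and "below one" mean in practice, here is a minimal sketch that turns a series of daily new-case counts into rough Rt estimates. It assumes a simple SIR-style relation, Rt ≈ 1 + r/γ, where r is the daily exponential growth rate of new cases and γ is one over an assumed 7-day serial interval. This is a deliberately simplified point estimate, not a reproduction of Rt.Live's model; the function name `estimate_rt`, its parameters, and the synthetic case series are all hypothetical.

```python
import math


def estimate_rt(daily_cases, serial_interval_days=7.0, window=7):
    """Rough point estimates of Rt from a series of daily new-case counts.

    Assumes the SIR-style relation R_t ~ 1 + r_t / gamma, where r_t is the
    daily exponential growth rate of new cases and gamma = 1 / serial interval.
    """
    gamma = 1.0 / serial_interval_days
    # Trailing moving average to damp day-of-week reporting noise.
    smoothed = [
        sum(daily_cases[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(daily_cases))
    ]
    rts = []
    for prev, curr in zip(smoothed, smoothed[1:]):
        if prev <= 0 or curr <= 0:
            rts.append(float("nan"))  # growth rate is undefined when counts are zero
            continue
        growth_rate = math.log(curr / prev)  # r_t: daily exponential growth rate
        rts.append(1.0 + growth_rate / gamma)
    return rts


# Synthetic series (not real data): new cases growing or shrinking about 5% per day.
growing = [round(100 * 1.05 ** d) for d in range(30)]
shrinking = [round(100 * 0.95 ** d) for d in range(30)]
print(estimate_rt(growing)[-1])    # above 1: the disease is spreading
print(estimate_rt(shrinking)[-1])  # below 1: on its way out
```

The same 5% daily change maps to an Rt well above or well below one depending on its direction, which is exactly the threshold Vladeck describes. Production trackers typically go further, for example modeling reporting delays and producing uncertainty intervals rather than a single number.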

The site pulls data from The COVID Tracking Project and lets users filter results by whether states enacted shelter-in-place orders. It was originally launched to give state governments science-backed data to help them decide when to reopen.

The team would field occasional phone calls from state governments asking for data models to predict the outcomes of specific policy decisions, and worked closely with Florida’s governor and surgeon general ahead of the state’s reopening in May. “It was really validating to see that the tracker was being used by real decision-makers and it impacted how they were thinking about the problem and what they were doing,” Vladeck recalled. “Hopefully, Rt.Live isn’t around for much longer. I hope it gets obviated by a vaccine or social distancing in the next six months.” 

Predicting patient outcomes

Data collected from clinical trials makes it possible to predict how patients will respond to particular drug treatments. This underpins precision medicine, an approach to disease treatment and prevention that takes into account individual variability in each person's genes, environment, and lifestyle.

Chinmay Shukla, a senior data scientist at PathAI who has a Ph.D. from Harvard Medical School, trains deep learning models to quantify pathology images generated from slides containing disease specimens. He then applies statistical models to these measurements to draw insights about the diseases and predict the effects of particular drugs. "Anytime a pharmaceutical company does a clinical trial, that's when I get involved," Shukla explained. He helps pharmaceutical companies make sense of their data during drug trials, especially for cancer treatments.
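As a hedged illustration of the "statistical models" step, suppose an image model has already produced one score per patient, such as a hypothetical slide-level feature. One simple question a trial analyst might ask is whether patients with high scores responded to the drug at a different rate than patients with low scores. The sketch below answers it with a standard two-proportion z-test. The score, the threshold, the patient records, and the helper names `split_by_score` and `two_proportion_z_test` are all invented for illustration and do not describe PathAI's actual methods.

```python
import math


def split_by_score(patients, threshold):
    """Split (score, responded) records at a threshold; return (responders, n) per group."""
    high = [responded for score, responded in patients if score >= threshold]
    low = [responded for score, responded in patients if score < threshold]
    return (sum(high), len(high)), (sum(low), len(low))


def two_proportion_z_test(resp_a, n_a, resp_b, n_b):
    """Two-sided z-test for a difference between two response rates."""
    p_a, p_b = resp_a / n_a, resp_b / n_b
    pooled = (resp_a + resp_b) / (n_a + n_b)
    if pooled in (0.0, 1.0):
        raise ValueError("test is undefined when all or no patients responded")
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))  # standard normal CDF at |z|
    return p_a - p_b, z, 2 * (1 - phi)


# Hypothetical trial arm: (image-derived score, responded to treatment?). Invented numbers.
patients = ([(0.8, True)] * 30 + [(0.8, False)] * 20
            + [(0.2, True)] * 15 + [(0.2, False)] * 35)
(high_resp, n_high), (low_resp, n_low) = split_by_score(patients, threshold=0.5)
diff, z, p_value = two_proportion_z_test(high_resp, n_high, low_resp, n_low)
print(f"response rate difference={diff:.2f}, z={z:.2f}, p={p_value:.4f}")
```

In this toy arm, 60% of high-score patients responded versus 30% of low-score patients, a difference the test flags as unlikely to be chance alone. Real trial analyses would typically adjust for covariates and multiple comparisons, and would avoid choosing the threshold after looking at the outcomes.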

Shukla has loved math since he was a young boy. Growing up in India in the 1990s, he says, any child who showed an affinity for numbers was encouraged to become an engineer. So he did.

“Then I realized I could apply these quantitative skills to help advance human health, so I got really interested in biotechnology and drug development because I saw there was a huge unmet need.”

For more Rise 2020 coverage, check out posts on how data science can be leveraged for social good and tips on transforming your career in a post-pandemic world.