Podcasts, Twitter, and Newsletters: Rounding out your data science education

Thanks to the internet, aspiring data scientists can easily learn key concepts from elite universities. For instance, students who complete our Data Analysis learning path will take MIT’s algorithms course and Stanford’s machine learning course. They’ll listen to the same lectures, complete the same assignments, and ultimately learn the same concepts.

The experience, of course, isn’t exactly the same. Most online courses don’t give direct access to professors and don’t provide the unstructured learning that plays a major role in university life. Students studying statistics or computer science at a university have their core coursework complemented by optional seminars, interactions with their professors, and the occasional late-night inebriated conversation with peers. Through these casual interactions, they gain exposure to an eclectic mix of topics ranging from the history and culture of the field to the newest techniques being developed by researchers. The result is a holistic understanding that guides decisions about future coursework and careers.

Although it’s difficult to replace in-person interactions, we’ve found ample online resources that provide the same type of learning by osmosis. The podcasts we’ve listed below will keep you up to date on the topics being discussed by top data scientists, ranging from cutting-edge techniques to allegations of cheating in major contests. If you follow the Twitter accounts we’ve listed, you’ll get a glimpse of what those data scientists are thinking about day to day, while the newsletters we recommend are the same ones that they’re reading.

Podcasts:

1. Talking Machines

No matter how much or how little you know about data science, you should listen to Talking Machines. Each episode starts with an overview of a powerful concept like Markov Chain Monte Carlo or collaborative filtering. The majority of each episode is then devoted to interviewing prominent data scientists like Andrew Ng or Kevin Murphy. The podcast is hosted by Harvard professor Ryan Adams and journalist Katherine Gorman. Ryan in particular has a remarkable ability to explain complicated ideas in simple terms while still keeping the podcast interesting for advanced listeners. Listen here.

2. Partially Derivative

To continue the university metaphor, Partially Derivative is a late-night conversation with slightly inebriated friends. The comparison is particularly apt, as the hosts start each episode by telling the audience what beers they’re drinking. The show describes itself as “the show about data, data science, drinking, and awesomeness,” and there’s plenty of laughter as well. The podcast started as an overview of recent data-related articles, but the hosts have recently added interviews with guests. They focus on cool applications of data science, like predicting which Game of Thrones characters will die or understanding what makes Indian food taste good. Despite the light-hearted banter, the hosts are serious data scientists and give real insight into the articles they cover. Listen here.

3. Linear Digressions

Udacity’s data science podcast is structured as a conversation between hosts Katie Malone and Ben Jaffe. The episodes, which tend to be brief and entertaining, cover a variety of topics ranging from neural nets to careers in data science. Overall, this is the most accessible podcast we’ve encountered. Listen here.

4. The O’Reilly Data Show

If we were to compare our podcasts to experiences at a university, the O’Reilly Data Show is equivalent to sitting in on a graduate-level seminar. The target audience is practicing data scientists and machine learning researchers rather than aspiring data scientists, and episodes have titles like “The tensor renaissance in data science” or “Coming full circle with Bigtable and HBase.” There’s a lot to learn in each episode, but newcomers to data science may have to put in some effort to understand what’s being discussed. Listen here.


Twitter:

Twitter is roughly the equivalent of eavesdropping in the faculty lounge. You might not understand everything, but you get to hear what data scientists are thinking about day to day and you’ll learn a lot along the way. We’ve focused on selecting accounts that are active and relevant, which means that we’ve excluded some important data scientists whose Twitter accounts are inactive.

5. Data Science Thought Leaders

  • DJ Patil (@dpatil) and Jeff Hammerbacher (@hackingdata) coined the term “data scientist” back in 2007. Jeff is now at Cloudera while DJ works as the US’s Chief Data Scientist.
  • Nathan Yau (@flowingdata) wrote “Rise of the Data Scientist” in 2009. His Twitter feed is filled with interesting data visualizations.
  • Drew Conway (@drewconway) developed the famous Data Science Venn Diagram in 2010.
  • Hilary Mason (@hmason) and Chris Wiggins (@chrishwiggins) wrote “A taxonomy of data science” in 2010. Hilary is now the founder of Fast Forward Labs (and has more than 60k Twitter followers) while Chris is the Chief Data Scientist for the New York Times.
  • Kirk Borne (@kirkdborne) is the Principal Data Scientist at Booz Allen and provides a steady stream of interesting links.
  • Gregory Piatetsky (@kdnuggets) is the president of KDnuggets. He’s a prolific tweeter, with an average of nearly 10 tweets a day. Everything he tweets is directly related to data science.
  • Ben Lorica (@bigdata) is the chief data scientist for O’Reilly Media.

6. R and Python Experts

  • Wes McKinney (@wesmckinn) is the creator of Pandas and is currently working on Ibis.
  • Andreas Mueller (@t3kcit), Gael Varoquaux (@gaelvaroquaux), and Olivier Grisel (@ogrisel) are major contributors to scikit-learn. All of them frequently tweet updates on their respective projects as well as links to videos of conference talks.
  • If you’re interested in R, David Smith (@revodavid) and Garrett Grolemund (@statgarrett) both tweet lots of tips about getting the most out of the language.

7. Industry Data Scientists

  • Mike Olson (@mikeolson) co-founded Cloudera and is now their chief strategy officer. Josh Wills (@josh_wills) is Cloudera’s senior director of data science.
  • Kaggle co-founders Anthony Goldbloom (@antgoldbloom) and Ben Hamner (@benhamner) are both worth following.
  • Pete Skomoroch (@peteskomoroch) was a principal data scientist at LinkedIn. Monica Rogati (@mrogati) was a data scientist at LinkedIn and the VP of data at Jawbone.

8. Other Twitter Accounts

  • Data Science Renee (@BecomingDataSci) has used her account to chronicle her journey to becoming a data scientist.
  • Ryan Adams (@ryan_p_adams) is the host of Talking Machines.
  • You can also find the hosts of Partially Derivative, Chris Albon (@chrisalbon) and Jonathon Morgan (@jonathonmorgan).
  • Reddit’s /r/dataisbeautiful has its own account (@dataisbeautiful) that tweets popular data visualizations.
  • Finally, Twitter tweets visualizations of its own data (@TwitterData).


Newsletters:

9. Data Elixir

Data Elixir is a beautifully curated newsletter. Each edition of the weekly newsletter typically has two or three items in each of several categories: news articles, tools and techniques, resources, jobs, and data visualization. The newsletter’s content and website both feel very clean and uncluttered; the articles have clearly been carefully selected. The curator, Lon Riesberg, has a knack for finding good articles that haven’t been covered as well in other places. Subscribe here.

10. Data Science Weekly

Data Science Weekly has a similar feel to Data Elixir with selected items grouped by category. Like Data Elixir, it contains a mix of news articles and more technical resources and is well curated. It also provides a good mix of articles from around the internet, rather than just the best articles from a single site. Subscribe here.

11. KDnuggets

KDnuggets has a twice-weekly newsletter that summarizes some of the best content from the high volume of articles published on its website. It’s less aggressively curated than Data Elixir or Data Science Weekly, but it’s still a must-read. Subscribe here.

12. O’Reilly Data Newsletter

Like O’Reilly’s podcast, the O’Reilly Data Newsletter is oriented towards professionals and big data practitioners. It’s a bit more accessible than the podcast, though, in part because readers can pick and choose which articles are most interesting to them. The newsletter typically includes 10-12 articles each week, and it’s arguably the most influential newsletter in the data science community. Subscribe here.