Python is one of the two languages that have become the lingua franca of data science (the other being R). In this module, you'll learn to program in Python and start using an ecosystem of powerful Python-based tools for doing data science and building an online portfolio.
- IPython Notebook
- Git and GitHub
Estimated Time: 10+ Hours
Data scientists in industry are estimated to spend most of their time on data wrangling, i.e., cleaning raw data and getting it into a format amenable to analysis, usually with the help of semi-automated tools. In this module, you'll learn the most common Python tools and workflows that make this normally onerous task a snap.
- Deep Dive into Pandas for Data Wrangling
- Data in Files: Work with a variety of sources, from unstructured/semi-structured text files (.txt) to delimited/structured/nested formats such as Excel, CSV, JSON, and XML.
- Data in Databases: Get an overview of relational and NoSQL databases and practice data manipulation with SQL.
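As a taste of the pandas workflow covered here, the sketch below shows a few typical wrangling steps on a small, fabricated dataset (all values are invented for illustration): dropping duplicates, normalizing text, and coercing a messy string column to numbers.

```python
import pandas as pd

# A small, messy dataset built inline (values fabricated for illustration)
raw = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Bob"],
    "age":  ["34", "n/a", "29", "n/a"],
    "city": ["NYC ", " boston", "NYC ", " boston"],
})

# Typical wrangling steps: drop exact duplicates, strip and normalize
# text, and coerce the age column to numeric (invalid entries become NaN)
clean = (
    raw.drop_duplicates()
       .assign(
           city=lambda df: df["city"].str.strip().str.title(),
           age=lambda df: pd.to_numeric(df["age"], errors="coerce"),
       )
)

print(clean)
```

The `errors="coerce"` option is the usual way to surface unparseable values as missing data rather than crashing, so they can be inspected or imputed later.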
Estimated Time: 17+ Hours
If there's one thing that most data scientists would have loved to know before they entered the field, it's that data science is not just about the math, the algorithms, and the analysis; it's also about telling a good story. In real life, data scientists don't work in a vacuum: there's always a client, internal or external, waiting on the results of their work.
A data story is a powerful way to present insights to your clients, combining visualizations and text into a narrative. But storytelling is an art and requires creativity. This section aims to get your creative juices flowing by suggesting interesting questions you can ask of your dataset, along with a few plotting techniques you can use to reveal insights.
You’ll practice the concepts learned by creating a data story.
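One simple storytelling technique is a single annotated trend line: a plot that points the reader directly at the insight. A minimal sketch with matplotlib, using fabricated sales numbers purely for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headlessly
import matplotlib.pyplot as plt

# Hypothetical monthly figures, fabricated for illustration
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [120, 135, 128, 160, 172, 190]

fig, ax = plt.subplots()
x = list(range(len(months)))
ax.plot(x, sales, marker="o")
ax.set_xticks(x)
ax.set_xticklabels(months)

# An annotation ties the visual to the narrative
ax.annotate("promotion launched", xy=(3, 160), xytext=(0.5, 180),
            arrowprops=dict(arrowstyle="->"))
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
ax.set_title("Sales accelerated after the April promotion")
fig.savefig("story.png")
```

The title states the takeaway rather than describing the axes, and the annotation marks the event that explains the change, so the chart carries the story on its own.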
Estimated Time: 10+ Hours
Statistics is the mathematical foundation of data science. Within statistics, inferential statistics is a set of techniques that helps us identify significant trends and characteristics of a data set. Not only are these techniques useful for exploring the data and telling a good story, they also open the way for deeper analysis and actual predictive modeling. In this module, we cover several important inferential statistics techniques in detail.
- Theory and application of inferential statistics
- Parameter estimation
- Hypothesis testing
- Statistical significance
- Correlation and regression
- A/B Testing
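To illustrate how hypothesis testing and A/B testing connect, here is a minimal sketch using `scipy.stats`: two groups of simulated measurements (all numbers generated, not real data) compared with a two-sample t-test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated A/B test: a metric for variant A and variant B (fabricated data;
# B is drawn from a distribution with a slightly higher mean)
group_a = rng.normal(loc=10.0, scale=2.0, size=200)
group_b = rng.normal(loc=10.6, scale=2.0, size=200)

# Null hypothesis: the two groups have the same mean
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

if p_value < 0.05:
    print("Reject the null: the difference is statistically significant.")
else:
    print("Fail to reject the null at the 5% level.")
```

The p-value answers a specific question, namely how surprising this difference would be if the variants were truly identical; it does not measure the size or business value of the effect, which is why effect sizes and confidence intervals are covered alongside significance testing.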
Estimated Time: 13+ Hours
Machine Learning combines aspects of computer science and statistics to extract useful insights and predictions from data. Machine Learning is what lets us make useful predictions and recommendations, or automatically find groups and categories in complex data sets. In this module, we'll cover the major kinds of machine learning algorithms (supervised and unsupervised), with several techniques within each of them. You'll learn when these algorithms are useful, the assumptions they incorporate, the tradeoffs they involve and the various metrics you can use to evaluate how well your algorithm performs.
- Supervised and unsupervised learning
- Fundamentals: Regression, Naive Bayes, SVM, Decision trees, Clustering
- Advanced: Recommender systems, Anomaly detection, Time series analysis
- Validation and evaluation of machine learning
- Feature engineering
- Best practices for applying machine learning
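The core supervised-learning loop above (fit on training data, evaluate on held-out data) can be sketched in a few lines with scikit-learn, using its built-in iris dataset and a decision tree as a representative classifier:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load a small built-in dataset and hold out a test set
# so the evaluation is honest (never score on training data)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# A shallow decision tree; max_depth limits overfitting
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

acc = accuracy_score(y_test, model.predict(X_test))
print(f"Held-out accuracy: {acc:.2f}")
```

The same fit/predict/score pattern applies across the algorithms listed above; what changes are the model class, its assumptions, and the evaluation metric appropriate to the problem.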
Estimated Time: 53+ Hours
Have you seen the stunning interactive visualizations on news websites such as The New York Times or FiveThirtyEight, and wondered how they are created? These advanced interactive visualizations not only look great and showcase your skills; they are also excellent tools for exploring complex, high-dimensional data sets.
Estimated Time: 5 Hours
You now know how to work with data sets that easily fit in the memory of your laptop. But what happens when that's not the case? A data scientist often has to know how to scale these analyses and algorithms to really huge data sets. This is where "Big Data" technologies like Hadoop and Spark come in. Hadoop is an open-source implementation of MapReduce, one of the first major algorithmic innovations in big data, and arguably the algorithm that allowed Google to become the behemoth it is today. Spark is Hadoop's younger cousin: a technology that addresses some glaring flaws and inefficiencies in Hadoop and allows many complex machine learning and other analytical techniques to be implemented efficiently at scale.
- Intro to Big Data
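The MapReduce idea itself is simple enough to demonstrate in plain Python. The sketch below runs the classic word-count example through the three phases (map, shuffle, reduce) in a single process; real frameworks distribute exactly these steps across many machines. The sample documents are made up for illustration.

```python
from collections import defaultdict
from itertools import chain

documents = [
    "big data is not just big",
    "spark builds on ideas from hadoop",
    "map then shuffle then reduce",
]

# Map phase: emit a (word, 1) pair for every word in every document
mapped = chain.from_iterable(
    ((word, 1) for word in doc.split()) for doc in documents
)

# Shuffle phase: group all emitted values by their key
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce phase: combine each key's values into a single result
counts = {word: sum(vals) for word, vals in groups.items()}
print(counts["big"])  # "big" appears twice in the first document
```

Because the map step touches each record independently and the reduce step only sees one key's values at a time, both phases parallelize naturally; that independence is what lets Hadoop and Spark scale the same logic to terabytes.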
Estimated Time: 10 Hours
In this program, you'll complete two Capstone Projects for your portfolio. You'll work on the first project as you go through the main part of the curriculum, and on the second project as you're focused on your job search.
The Capstone Project is a key part of our curriculum that every student must complete. The project is designed to give you the experience of working in a realistic data science scenario. Working with your mentor, you'll pick a data set and a problem of interest. From start to finish, your project will be targeted at a specific client (real or imaginary). Using the data science techniques you've learned, you'll not only come up with a reasonable solution to the problem but also learn to present it to that client as a compelling story.
Estimated Time: 50 Hours
We provide career material at strategic points in the curriculum as well as via calls with our expert career coach. We'll help you create a tailored job search strategy based on your background and goals, teach you how to evaluate companies and roles, show you how to effectively land and ace interviews, and coach you on negotiating salary.
- Anatomy of a tech company
- The job search strategies that top candidates use
- How to build your network and effectively use it to land interviews
- Create a high-quality resume, LinkedIn profile, and cover letter
- Interview coaching and practice, including mock interviews for both technical and non-technical topics
- Negotiation success tips