Data scientist training on campus
You have seen the headlines – data scientist is the “sexiest job” of the 21st century. In an economy driven by data, businesses are hunting for talented data scientists who can turn overwhelming amounts of data into actionable insights. Not surprisingly, many college students are gearing their education towards a career in data science. This article will help those currently on campus maximize their return on investment by getting data scientist training while in university.
Unfortunately, most universities do not offer major programs in data science or degrees that are explicitly marked for data scientist training. So how do you prepare for a career in data science? There is no standard roadmap to follow, and there are many career paths into the field. Any two data scientists could come from different academic backgrounds, use different coding languages, and solve entirely different problems. Data scientist and top-ranked big data and machine learning influencer Kirk Borne studied astrophysics during his time as an undergraduate, leading to a career as a scientist at NASA before settling at Booz Allen Hamilton. In contrast, U.S. Chief Data Scientist DJ Patil went to community college before completing his bachelor’s degree in math and working with titans in the tech industry such as eBay and LinkedIn.
Here’s a quick outline of the necessary steps:
- Pick a major that will translate to a data science career – Statistics, computer science, physical sciences, and social sciences are all good choices. The major doesn’t have to be specifically math-focused and data-focused, but you have to be able to apply data to the field you study.
- Take relevant courses – Data scientists need strong skills in statistics and programming. Regardless of your choice of major, focus on the topics listed here and learn to code. That will create a solid foundation for your data scientist training.
- Research with a professor – This is extremely important to do in college. Reach out to a professor. Show that you are interested in learning and you are capable of creating publishable academic literature.
- Create a personal project – Learning isn’t confined to the classroom. The best learning comes from applying what you learn in the classroom to a topic you actually care about, whether that is building your own app, writing a blog, or doing exploratory data analysis on your own.
- Prepare for graduate school – This is a catch-all for a lot of smaller tasks. Aim to earn good grades and turn positive professor relationships into glowing letters of recommendation. Always remember that your education doesn’t end with a bachelor’s degree.
Pick a relevant major
Whether you’re applying to schools, undeclared, or considering switching your major, you want to make sure that you choose a major that gives you the right skills to become a data scientist.
Choose a STEM major. Social sciences, physical sciences, math, engineering, or statistics – that is it. Why choose these majors? Because they will directly teach you the skills needed to become a data scientist, and within each field of study there are opportunities to apply data science skills within that field for research.
Which STEM major is best for somebody looking to do data scientist training? There is no easy answer. The best questions to ask yourself are, “Which major am I most interested in? Which major piques my curiosity?”
The educational background of data scientists. Source: Burtch Works Study, 2015
Data scientists come from a variety of STEM majors – chemistry, psychology, economics, mathematics, computer science. This is because data science can be applied to solve problems across many disciplines. Data science is the application of analytical skills, scientific method, and computational skill to solve problems across professions.
The reverse is also true: English, History, Education, or any non-STEM majors are unlikely to help with data scientist training.
Pick the right classes
Your major is what gets put on your LinkedIn profile and your bachelor’s degree, but it’s really the skills you learn that’ll break you into a data science career. Your classes will provide a great environment to learn and practice these skills in the vein of data scientist training, and should be approached as such – not just as a required hurdle that must be cleared to get the degree. Data scientist training involves mastering a set of unique skills. Unfortunately, course catalogs are huge, and there are hundreds of different classes to take, even within a major such as computer science or statistics, so choosing the right courses can be confusing.
One quick tip: don’t get lured into taking too many classes that are irrelevant to your interests or career goals. That “mushrooms, mold, and society” class I took my freshman year was neat, and I know more about fungi than most people do – but that was a waste of 5 units. I would have been much better off taking an Introduction to Java class that teaches skills relevant to the career I’m pursuing.
For a data scientist, there are two main subjects you want to master and take classes on during your undergrad: statistics and computer science.
These two fields are fundamental to data science. You cannot succeed as a data scientist without training in these two fields.
You most likely won’t have a formal statistics education going into university, so don’t shrug off introductory statistics classes. These will be extremely important to building the foundation for your data scientist training.
Here is a list of topics in statistics you should cover once you have the opportunity to choose from upper-level course offerings, so you can round out your data scientist training. Keep in mind that course names will vary by university, so read the course descriptions and make sure they include the following concepts:
- Bayesian Statistical Inference
- Data & Web Technologies for Data Analysis
- Fundamentals of Statistical Data Science
- Statistical Computing
- Analysis of Categorical Data
- Applied Time Series Analysis
- Multivariate Data Analysis
If you struggle with statistics and are worried about taking upper-level courses, don’t fear. There is a bounty of resources online to help you. When you are stuck or confused about a topic, consult Khan Academy courses. Springboard’s free data analysis learning path is a great resource as well. This blog post on how Bayes’ theorem and logic interact will help you with the basics of probability.
As for the computer science courses, you have more freedom of choice. The classes you take will depend on what skills you are missing, what skills you want to improve, and what projects you want to create.
If you have no experience coding, now is the time to choose a language to learn. I recommend Python or R; check out this article to understand why, and for help choosing between the two. Learning your first language is always the hardest, but there is no better way to learn than with the support of somebody who can mentor you. The following will be a good starting place for computer science students if you have no coding experience.
If you have no experience coding:
- Introduction to Programming
- Programming and Problem-Solving
- Computer Organization and Machine-Dependent Programming
- Data Structures and Programming
For those of you who already know how to code, you have two options: continue taking courses on your programming language of choice in an attempt to achieve mastery, or begin training yourself in different languages. There is no right or wrong, but for a data scientist in training, it is critical to have one language that you feel especially comfortable using. Andrew Flowers, data analyst and journalist from the popular data-centric blog FiveThirtyEight, recommends that data science students focus on learning one language thoroughly – and hopefully that is a language that is flexible, like Python. Essentially, it is better to code in one language well than to code in five languages poorly.
Once you are proficient as a coder, take courses on the following:
- Algorithm Design and Analysis
- Scientific Computation
- Probability and Statistical Modeling for Computer Science
- Software Engineering
- Database Systems
- Introduction to Artificial Intelligence
- Machine Learning
- Image Processing and Analysis
- Computer Vision
Research with a professor
One of the best opportunities afforded to undergraduate students at a research university is the opportunity to work alongside professors. There are numerous tangible benefits from completing lab research: higher-level learning, publishing and presenting a paper, receiving a letter of recommendation, bolstering your professional network, and developing your skills as a data scientist and communicator.
In a peer review by the Association of American Colleges and Universities (AACU), it was found that only 18% of undergraduate researchers believe they learned more in courses than in their labs.
Let’s make one thing clear: finding work in a lab requires diligence. Professors are welcoming to curious and hardworking students who show a gift for the subject they are studying. With thousands of undergraduates on campus during the school year, you must set yourself apart from the other students and show a professor you’re worthy of significant time investment.
Here’s how you get a research position:
- Show genuine interest in the focus of the research – Academic research often focuses on answering a very narrowly defined question and it features a lot of repetition. If you are not excited by the process, the professor, or the question you are solving, you shouldn’t be working in that specific lab. Use your university’s undergraduate research center to find professors and labs with open opportunities. Look through publications from a professor to see if the subject is something you find interesting or not. Take the time to find out what research the professors who are teaching your classes do.
- Get good grades – Half of college is showing up; the other half is doing more than what’s expected of you. Prioritize your grades. Once you do this, doors will open up for you. Professors don’t want to take on a student who is averaging a 2.0, but if you are pulling a 3.0 or higher you will be able to show professors that you care about learning and you work hard.
- Talk with professors and TAs, and go to office hours – Professors enjoy talking with students who have done their work and are curious about topics beyond the scope of the class. If you have learned the material and show up to class, you have no reason to fear going to talk with your professors – they are there to interact with you. Office hours are a great opportunity to do this! So is your local coffee house. Schedule a time to meet, and have fun with it! Building this connection with an expert is fundamental to your data scientist training.
Through a research position, you will gain an invaluable network of peers and professors who can recommend you to employers and help you choose a graduate school. You will need at least two letters of recommendation from professors or a professional reference to go to graduate school or to get a data science internship during your college career, and working in a research position will help you get great letters.
An example of an (inter)stellar research opportunity for undergraduate data scientists would be the ASSURE program at the University of California, Berkeley. ASSURE (Advancing Space Sciences through Undergraduate Research) actively seeks undergraduates to perform space science research over a 10-week summer program. The abstract below describes how students created a suitable alternative database to store SETI@home data – an experience that would be invaluable to a data scientist in training.
Erick Quintanila, Jeff Cobb, and Eric Korpela researched and proposed alternative database structures for the SETI@home data, Summer 2016.
Create a personal project
Budgeting your time is difficult during your undergrad – you have the freedom to dictate the pace of your day between courses and extracurricular activities. A lot of your time will be budgeted to courses, eating, sleeping, friends, Greek life, and staying healthy. If you balance everything right, you will have some free time – and the best way to use that free time is to start a passion project to cement your data scientist training.
What does a passion project look like for a data scientist? Anything where data is manipulated to arrive at conclusions. The best problems and questions are the ones you care about. If you’re a huge fantasy football geek, like me, you could try to build the best predictive model for fantasy football possible. If you like going to concerts, you could create a Slackbot that automatically scours the internet and alerts you when shows from your favorite artists are in your area, and when tickets are on sale.
An engineer and friend of mine recently created a Chrome extension for ESPN Fantasy Football leagues that shows hybrid standings on the ESPN Standings page. The hybrid system keeps the standard head-to-head (H2H) win and also gives a weekly “win” if your score for that week was in the top half of the league – mitigating the luck factor in fantasy football.
Screenshot from the extension overview page of Fantasy Football Hybrid Standings app, a personal project of my computer-science engineering friend.
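The hybrid standings rule described above is a nice example of a small, well-defined problem you can code up yourself. Here is a minimal Python sketch of the idea. It is not the extension's actual code: the function name `hybrid_wins`, the input layout, and the sample teams are all hypothetical, and two details are assumptions of mine, namely that a tied head-to-head matchup awards no win and that "top half" means fewer than half of the league's teams scored strictly more than you that week.

```python
def hybrid_wins(weekly_scores, weekly_matchups):
    """Count hybrid wins per team (hypothetical helper, for illustration).

    weekly_scores:   list of dicts, one per week, mapping team -> points
    weekly_matchups: list of lists, one per week, of (team_a, team_b) pairs
    """
    wins = {}
    for scores, matchups in zip(weekly_scores, weekly_matchups):
        # Make sure every team appears in the result, even with zero wins.
        for team in scores:
            wins.setdefault(team, 0)

        # Standard head-to-head win (assumption: a tie awards no win).
        for team_a, team_b in matchups:
            if scores[team_a] > scores[team_b]:
                wins[team_a] += 1
            elif scores[team_b] > scores[team_a]:
                wins[team_b] += 1

        # Extra weekly "win" for scoring in the top half of the league
        # (assumption: fewer than half the teams scored strictly more).
        half = len(scores) // 2
        for team, points in scores.items():
            teams_ahead = sum(1 for other in scores.values() if other > points)
            if teams_ahead < half:
                wins[team] += 1
    return wins


# One made-up week in a four-team league.
scores = [{"A": 120, "B": 110, "C": 90, "D": 80}]
matchups = [[("A", "B"), ("C", "D")]]
print(hybrid_wins(scores, matchups))
```

In this made-up week, team B loses its matchup to the league's top scorer but still earns a win for finishing in the top half, while team C wins its matchup but earns nothing extra for a below-average score. That is the luck-mitigating effect the hybrid system is after.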
Here are some tips so you can build your own personal project:
Don’t be afraid of failure. Taking on a project from scratch is a daunting task, but don’t worry about failure. Failure teaches success, and along the process of building a project and failing to complete it you will have learned an incredible amount. As an undergraduate you are young, and have nothing to lose. Be realistic and relentlessly optimistic.
Learning from doing (and at times failing) will be more applicable than what you learn from most classes. You will learn how to find tools to create your project, see what other projects have already occupied the space you are trying to fill, and test your creativity and perseverance. You’ll also develop interpersonal skills – you will be forced to network with people who know more about the subject you are working on, and to practice communicating your plan and vision for your project. These are all invaluable business skills that will benefit you later in your life. Plus, if you publish a project or GitHub repository, you can point to it on your resume and LinkedIn profile, which employers love to see. It highlights your independence and entrepreneurial mindset.
Check out some of the projects completed by student data scientists at Springboard for inspiration and motivation, and consider joining a mentored workshop if that’s what it takes to get you to a project. Again, be patient with yourself – you will most likely not have a fraction of the skills needed to make a finished project the moment you conceptualize it, but by gradually building your skills and applying them to your project idea, you’ll cement your data scientist training.
Prepare for graduate school
For the majority of data scientists, education is not finished after completing a bachelor’s degree. Most data scientists have a PhD or master’s degree, and have completed training through bootcamps or online workshops to hone a specific skill, such as using Hadoop for Big Data Querying or machine learning (like our Data Science Intensive workshop). In its data science salary report, Burtch Works determined that 44% of data scientists have a master’s degree and 46% have a PhD (Check out our report on data scientist salaries for more information).
It does not have to be a Data Science master’s program. As I said before, many successful and famous data scientists received their PhDs in fields such as astrophysics or mathematics, and then the skills learned through their graduate research allowed for a natural transition to a career in data science.
You need three things to get into a good master’s or PhD program: good grades, recommendations, and a subject that you are passionate about. Without any one of these, it’ll be difficult to get into a good graduate school.
I also recommend that you get work experience after your bachelor’s degree, for several reasons. First, it will help you have money to pay for your schooling, and in some cases, employers will pay for your graduate school if they want to keep you on for upper-level positions. It will also make you appreciate going back to school a lot more after working in the ‘real world’.
Some tips when it comes to preparing for graduate school as an undergraduate:
- Get good grades: According to About Education, if you’re applying to a master’s program, most schools have a GPA cutoff of 3.0 – 3.3. For doctoral programs, that cutoff is 3.3 – 3.5.
- Prepare for the GMAT/GRE: For a data scientist, I recommend taking the GMAT. It is more analytics-based than the GRE, and will perhaps be easier for a student with a mathematical background.
- Talk to your professors about letters of recommendation: Take care of this at the end of all your research positions and tell your professor that these are for graduate school applications. Even professors you do not do research with may write you a recommendation if you have a good working relationship with them and you ask politely.
- Search for graduate programs with labs that you are interested in: Be thorough with your search. The most important aspect of the school you choose will be the lab you work in and the people you surround yourself with. Make sure you really consider who you want to work with and why.
Check out this website for information about data science master’s programs across the country. It has a thorough list where you can compare 23 of the best programs available. If you don’t think that you want to pursue a master’s in data science, that is completely fine – it is not a requirement that you sign up for a data science program as long as you are applying big data and data science specific skills to solving problems within your field!
To wrap it up
College is filled with opportunities for data scientist training, and it is up to you to take advantage of them while you are on campus. Get good grades, meet professors, research, and become the data scientist you want to be. It takes patience and focus, but the investment of time into pursuing a career as a data scientist will pay off well in the long run.
I hope that now, you’ll want to get back to work, and become the data science maestro you deserve to be.