# 15 Free Data Sets for Your Next Project or Portfolio

8 minute read | June 29, 2022

Written by:
Sakshi Gupta

If you’re early in your career as a data scientist, you might want to consider taking on some personal projects. There are two reasons why.

Firstly, it’s a way for you to test yourself. You’ve probably spent many months working through data science theory and studying different approaches to analyzing data. But how do you know you’ve actually gained useful real-world skills? You can do that by choosing a problem that seems interesting to you and unleashing your newfound analytical skills to solve it.

Another important reason you should build projects is so that you have something to put in your data science portfolio. Recruiters prefer to look at candidates’ portfolios instead of reading long statements of purpose or lists of classes or top data science bootcamps they’ve taken. A portfolio evinces that you’ve got practical skills and the ability to take projects from conception to completion.

Now, if you want to work on a data science project, you can’t do that without data. If you’re wondering where to source it, we’ve got you covered.

But before we get into those resources, let’s take a look at what a data set is.

## What Is a Data Set?

You’ve probably figured out partially what a data set is from what it’s called. It is, of course, a set of data points. But along with that, it’s also important to remember a few other characteristics that a data set must exhibit.

A data set is always formed of related data. Let’s say you have a data set about a housing subsidy program. In that case, the data would include data points relating to the prices of houses over time, the demographics of the buyers, the areas where these programs are run, and so on. All of these data points are related and therefore would constitute a data set.

Secondly, the data in a data set is always discrete. Each record is independent and can take the form of only a finite value.

Data sets are most commonly stored in a tabular format. Every column in the table corresponds to a specific category of information. Each row is a record that holds one value for every column.

For example, assume that you have a database on stock prices during a certain period. Some of the columns that you would have in this data set are company name, company stock price, change in stock price year on year, and so on. As you can see, the values that would be entered in this table would be related, discrete, and structured.
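To make the column and row structure concrete, here is a minimal sketch in Python that loads a tiny stock price table. The company names, column names (`company`, `price_usd`, `yoy_change_pct`), and values are made up for illustration and are not real market data.

```python
import csv
import io

# Made-up sample rows for illustration only; not real market data.
SAMPLE_CSV = """company,price_usd,yoy_change_pct
Acme Corp,120.50,4.2
Globex,87.10,-1.3
Initech,45.00,0.0
"""


def load_table(text):
    """Parse CSV text into a list of row dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_table(SAMPLE_CSV)

# The header row names the columns, one per category of information.
columns = list(rows[0].keys())

# Reading one column means taking the same field from every record.
prices = [float(row["price_usd"]) for row in rows]

print(columns)
print(len(rows), "records")
print(prices)
```

Each dictionary in `rows` is one record, and each key is one column, which mirrors the related, structured layout described above.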


## Free Data Sets To Analyze

Now that we know what a data set is, we can move on to some of the best public data sets out there. These data sets have been sourced from government agencies, private companies, and public institutions. All of the data available in them is structured, so you don’t have to worry about cleaning data.

### Free General Data Sets

#### Kaggle

Kaggle is a community that has been built specifically for data scientists and machine learning engineers. The goal is to have a place where members can work on data problems together and access data sets so they can regularly practice data analysis.

Kaggle has something to offer for data scientists across levels, whether that’s a simple data set for students or something advanced for a data scientist looking to work on their artificial intelligence chops. The platform is also known for hosting regular competitions where you can go up against other data scientists to solve real-world problems posted by companies.

#### Google Cloud

The Google Cloud marketplace comes with a website that offers data sets that have been sourced from various Google products. So if you want an excellent data set from services like Google Trends, Google Patents Research, and Community Mobility Reports, this is where you can find it.

Google also offers a collection of repositories from commercial and public data sets. You can conduct your analyses on Google Cloud or download the data sets and use your own tools for the job.

#### GitHub

You probably think of GitHub as a version control tool, but did you know that it also hosts a wide variety of data sets that you can use for your personal projects? These are all available for free, and you can quickly port the data into your project when you need it.

Let’s take this glacier mass balance data set, for example. This fantastic data set provides information on the mass of reference glaciers across the world. You can use this and similar data sets to conduct analyses on a wide range of topics.

### Free Government Data Sets

#### Data.gov

Data.gov is where all of the American government’s public data sets live. You can access all kinds of data that is a matter of public record in the country. The main categories of data available are agriculture, climate, energy, local government, maritime, ocean, and older adult health.

Along with giving access to this collection of repositories for free, the website also has various resources for data scientists. You can use it to learn more about data analysis tools, data management frameworks, and case studies of projects taken up by data scientists who work in government.

#### NYC Open Data

What Data.gov does at the federal level, NYC Open Data does for New York City. This website is a collection of repositories that offer data sourced from various public institutions that govern the city.

The main categories of data available here are business, city government, education, environment, and health. You can also browse data sets compiled by different agencies, such as the Financial Information Services Agency (FISA) or the Mayor’s Office of Climate Policy and Programs (CPP).

#### Data.europa.eu

This is the official portal for all of the public data that is offered by the European Union. The scope of the available data is broken down into national data, European data, and international data. You can find a detailed data set for just about any aspect of European life here, covering economic indicators, law enforcement agencies, health care institutions, and more.

### Free Health Data Sets

#### Healthdata.gov

Healthdata.gov is a repository of freely available healthcare data from the US government. It is managed by the U.S. Department of Health and Human Services Office.

This website is a treasure trove for anyone interested in healthcare data. You can find public data sets on everything ranging from cancer incidence to COVID-19 prevalence and impact. Working on these data sets can be especially helpful if you plan on getting a data science job in healthcare.

#### Healthcare.gov

This is a federal website managed by the U.S. Centers for Medicare & Medicaid Services. The data sets available on this website are specifically geared towards medical and dental plans for groups and individuals. There’s also an API with clear documentation in case you want to source your data directly into a web application.

#### Health Statistics & Data

The Berkeley Library Health Statistics and Data website provides free access to a large variety of data sets. That includes data sets that are both nationwide statistics and specific to California state.

### Free Environment Data Sets

#### US Climate Data

The National Centers for Environmental Information offers its climate data for free through these public data sets. The goal of the undertaking is to make global climate data available for analysis and study.

The public data sets available on this website constitute a cross-section of data across months, seasons, and years. You can get information on things like temperature, wind, precipitation, and other climate data here. The site also offers specialized tools that you can use to access this climate data.

#### Global Climate Data

If you want to do a data science project on climate data, then this website offers just about every kind of data set that you could possibly need. This website by Tutiempo Network contains public data sets with climate data for every country on the planet. Some of this data goes back to the first half of the 20th century.

The data on this website is sourced from over 9,000 weather stations. It is easy to break the available data sets down by continent or country if you want to focus your analyses on one particular region.

#### US Weather History

FiveThirtyEight, the website known for its data journalism stories, used this US Weather History data repository to produce its 2015 story “What 12 Months of Record-Setting Temperatures Looks Like Across the US.” Analyzing this data set is a good way to understand how data science connects with storytelling. You can use the story as inspiration to work on your data visualization skills.

### Free Economic Data Sets

#### World Bank Open Data

This is a public website with data offered by the World Bank. Due to the nature of this institution, you know that you’re going to get access to economic data from across every continent on the planet.

Each data page lets you download data in bulk as a CSV file or in other file formats. There is also an API that you can use to pull this data into your own tools for analysis or display.
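Once you have a CSV file, a common first step is reshaping it for analysis. The sketch below assumes a wide layout with one row per country and one column per year; that layout, the column names, and the placeholder values are assumptions for illustration, not real World Bank figures, so check the header of the file you actually download.

```python
import csv
import io

# Hypothetical wide-format sample: one row per country, one column per year.
# Column names and values are placeholders, not real World Bank figures.
SAMPLE_CSV = """Country Name,Country Code,2019,2020,2021
Exampleland,EXL,1.5,,2.0
Samplestan,SMP,3.0,3.5,4.0
"""


def wide_to_long(text):
    """Turn year columns into (country_code, year, value) records, skipping blanks."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            # Treat purely numeric column names as years; skip empty cells.
            if column.isdigit() and value != "":
                records.append((row["Country Code"], int(column), float(value)))
    return records


records = wide_to_long(SAMPLE_CSV)
for record in records:
    print(record)
```

A long list of records like this is usually easier to filter, group, and plot than a table with dozens of year columns.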

#### US Employment and Labor Data

This is a website with public data that pertains to employment levels and labor information for the United States. You can access data that covers things like inflation and prices, workplace injuries, productivity, employment benefits, etc.

#### Business and Economics Data Sets

These are business-related data sets that are made available by the Carnegie Mellon library. You can peruse data that pertains to all kinds of national and international economic information. Some examples include economic data from the Federal Reserve, data from the International Labor Organization, and data from the World DataBank.

## Data Set FAQs

### How Do I Know if a Free Data Set Is Complete?

You can make sure that the data you source is complete by choosing reliable sources for your data sets. Always go with data that has been made available by governments, reputable private companies, and public institutions.
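Beyond choosing a reliable source, one rough way to check a file yourself is to count the missing values in each column. The sketch below is a minimal example under stated assumptions: the sample rows are made up, and the set of strings treated as "missing" (`MISSING_MARKERS`) is a guess that you would adjust for each data set.

```python
import csv
import io

# Made-up sample with deliberate gaps, for illustration only.
SAMPLE_CSV = """station,year,avg_temp_c
A,2020,14.1
B,2020,
C,,15.3
D,2021,NA
"""

# Assumption: these strings mark a missing value. Real data sets vary.
MISSING_MARKERS = {"", "na", "n/a", "null", "none"}


def missing_counts(text):
    """Return the row count and the number of missing values per column."""
    reader = csv.DictReader(io.StringIO(text))
    counts = {name: 0 for name in reader.fieldnames}
    total = 0
    for row in reader:
        total += 1
        for name in reader.fieldnames:
            value = row[name]
            # DictReader fills short rows with None, so treat that as missing too.
            if value is None or value.strip().lower() in MISSING_MARKERS:
                counts[name] += 1
    return total, counts


total_rows, counts = missing_counts(SAMPLE_CSV)
for name, count in counts.items():
    print(f"{name}: {count} of {total_rows} missing")
```

A column with many gaps is not necessarily unusable, but it is a signal to read the data set's documentation before building an analysis on it.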

### Can You Make Your Own Data Set?

Yes, you can build your own data set by sourcing data from various sources like social media sites, online directories, and so on. This is a great idea when looking to stand out in a data science bootcamp!

Companies are no longer just collecting data. They’re seeking to use it to outpace competitors, especially with the rise of AI and advanced analytics techniques. Between organizations and these techniques are the data scientists – the experts who crunch numbers and translate them into actionable strategies. The future, it seems, belongs to those who can decipher the story hidden within the data, making the role of data scientists more important than ever.

In this article, we’ll look at 10 careers in data science, analyzing the roles and responsibilities and how best to land each job. Whether you’re more drawn to the creative side or interested in the strategic planning side of data architecture, there’s a niche for you.

## Is Data Science A Good Career?

Yes. Besides being a field that comes with competitive salaries, the demand for data scientists continues to increase as they have an enormous impact on their organizations. It’s an interdisciplinary field that keeps the work varied and interesting.

## 10 Data Science Careers To Consider

Whether you want to change careers or land your first job in the field, here are 10 of the most lucrative data science careers to consider.

### Data Scientist

Data scientists represent the foundation of the data science department. At the core of their role is the ability to analyze and interpret complex digital data, such as usage statistics, sales figures, logistics, or market research – all depending on the field they operate in.

They combine their computer science, statistics, and mathematics expertise to process and model data, then interpret the outcomes to create actionable plans for companies.

#### General Requirements

A data scientist’s career starts with a solid mathematical foundation, whether it’s interpreting the results of an A/B test or optimizing a marketing campaign. Data scientists should have programming expertise (primarily in Python and R) and strong data manipulation skills.

Although a university degree is not always required, data scientists typically complement their on-the-job experience with data science courses and certifications that demonstrate their expertise and willingness to learn.

#### Average Salary

The average salary of a data scientist in the US is \$156,363 per year.

### Data Analyst

A data analyst explores the nitty-gritty of data to uncover patterns, trends, and insights that are not always immediately apparent. They collect, process, and perform statistical analysis on large datasets and translate numbers and data to inform business decisions.

A typical day in their life can involve using tools like Excel or SQL and more advanced reporting tools like Power BI or Tableau to create dashboards and reports or visualize data for stakeholders. With that in mind, they have a unique skill set that allows them to act as a bridge between an organization’s technical and business sides.

#### General Requirements

To become a data analyst, you should have basic programming skills and proficiency in several data analysis tools. A lot of data analysts turn to specialized courses or data science bootcamps to acquire these skills.

For example, Coursera offers courses like Google’s Data Analytics Professional Certificate or IBM’s Data Analyst Professional Certificate, which are well-regarded in the industry. A bachelor’s degree in fields like computer science, statistics, or economics is standard, but many data analysts also come from diverse backgrounds like business, finance, or even social sciences.

#### Average Salary

The average base salary of a data analyst is \$76,892 per year.

### Business Analyst

Business analysts often have an essential role in an organization, driving change and improvement. That’s because their main role is to understand business challenges and needs and translate them into solutions through data analysis, process improvement, or resource allocation.

A typical day as a business analyst involves conducting market analysis, assessing business processes, or developing strategies to address areas of improvement. They use a variety of tools and methodologies, like SWOT analysis, to evaluate business models and their integration with technology.

#### General Requirements

Business analysts often have related degrees, such as bachelor’s degrees in Business Administration, Computer Science, or IT. Some roles might require or favor a master’s degree, especially in more complex industries or corporate environments.

Employers also value a business analyst’s knowledge of project management principles like Agile or Scrum and the ability to think critically and make well-informed decisions.

#### Average Salary

A business analyst can earn an average of \$84,435 per year.

### Database Administrator

The role of a database administrator (DBA) is multifaceted. Their responsibilities include managing an organization’s database servers and application tools.

A DBA manages, backs up, and secures the data, making sure the database is available to all the necessary users and is performing correctly. They are also responsible for setting up user accounts and regulating access to the database. DBAs need to stay updated with the latest trends in database management and seek ways to improve database performance and capacity. As such, they collaborate closely with IT and database programmers.

#### General Requirements

Becoming a database administrator typically requires a solid educational foundation, such as a bachelor’s degree in a field related to data science. Nonetheless, it’s not all about the degree because real-world skills matter a lot. Aspiring database administrators should learn database languages, with SQL being the key player. They should also get their hands dirty with popular database systems like Oracle and Microsoft SQL Server.

#### Average Salary

Database administrators earn an average salary of \$77,391 annually.

### Data Engineer

Successful data engineers construct and maintain the infrastructure that allows the data to flow seamlessly. Besides understanding data ecosystems on the day-to-day, they build and oversee the pipelines that gather data from various sources so as to make data more accessible for those who need to analyze it (e.g., data analysts).

#### General Requirements

Data engineering is a role that demands not just technical expertise in tools like SQL, Python, and Hadoop but also a creative problem-solving approach to tackle the complex challenges of managing massive amounts of data efficiently.

Usually, employers look for credentials like university degrees or advanced data science courses and bootcamps.

#### Average Salary

Data engineers earn a whopping average salary of \$125,180 per year.

### Database Architect

A database architect’s main responsibility involves designing the entire blueprint of a data management system, much like an architect who sketches the plan for a building. They lay down the groundwork for an efficient and scalable data infrastructure.

Their day-to-day work is a fascinating mix of big-picture thinking and intricate detail management. They decide how different business systems store, consume, integrate, and manage data.

#### General Requirements

If you’re aiming to excel as a database architect but don’t necessarily want to pursue a degree, you could start honing your technical skills. Become proficient in database systems like MySQL or Oracle, and learn data modeling tools like ERwin. Don’t forget programming languages – SQL, Python, or Java.

If you want to take it one step further, pursue a credential like the Certified Data Management Professional (CDMP) or the Data Science Bootcamp by Springboard.

#### Average Salary

Data architecture is a very lucrative career. A database architect can earn an average of \$165,383 per year.

### Machine Learning Engineer

A machine learning engineer experiments with various machine learning models and algorithms, fine-tuning them for specific tasks like image recognition, natural language processing, or predictive analytics. Machine learning engineers also collaborate closely with data scientists and analysts to understand the requirements and limitations of data and translate these insights into solutions.

#### General Requirements

As a rule of thumb, machine learning engineers must be proficient in programming languages like Python or Java, and be familiar with machine learning frameworks like TensorFlow or PyTorch. To successfully pursue this career, you can either pursue a degree or enroll in courses and follow a self-study approach.

#### Average Salary

Depending heavily on the company’s size, machine learning engineers can earn between \$125K and \$187K per year, making this one of the highest-paying AI careers.

### Quantitative Analyst

Quantitative analysts are essential for financial institutions, where they apply mathematical and statistical methods to analyze financial markets and assess risks. They are the brains behind complex models that predict market trends, evaluate investment strategies, and assist in making informed financial decisions.

They often deal with derivatives pricing, algorithmic trading, and risk management strategies, requiring a deep understanding of both finance and mathematics.

#### General Requirements

This data science role demands strong analytical skills, proficiency in mathematics and statistics, and a good grasp of financial theory. It always helps if you come from a finance-related background.

#### Average Salary

A quantitative analyst earns an average of \$173,307 per year.

### Data Mining Specialist

A data mining specialist uses their statistics and machine learning expertise to reveal patterns and insights that can solve problems. They sift through huge amounts of data, applying algorithms and data mining techniques to identify correlations and anomalies. Data mining specialists also help organizations predict future trends and behaviors.

#### General Requirements

If you want to land a career in data mining, you should possess a degree or have a solid background in computer science, statistics, or a related field.

#### Average Salary

Data mining specialists earn \$109,023 per year.

### Data Visualization Engineer

Data visualization engineers specialize in transforming data into visually appealing graphical representations, much like a data storyteller. A big part of their day involves working with data analysts and business teams to understand the data’s context.

#### General Requirements

Data visualization engineers need a strong foundation in data analysis and proficiency in programming languages often used in data visualization, such as JavaScript, Python, or R. Some expertise in design principles is a valuable addition to that experience, as it helps them create effective visualizations.

#### Average Salary

The average annual pay of a data visualization engineer is \$103,031.

## Resources To Find Data Science Jobs

The key to finding a good data science job is knowing where to look without procrastinating. To make sure you leverage the right platforms, read on.

### Job Boards

When hunting for data science jobs, both niche job boards and general ones can be treasure troves of opportunity.

Niche boards are created specifically for data science and related fields, offering listings that cut through the noise of broader job markets. Meanwhile, general job boards can have hidden gems and opportunities.

### Online Communities

Spend time on platforms like Slack, Discord, GitHub, or IndieHackers, as they are a space to share knowledge, collaborate on projects, and find job openings posted by community members.

Don’t forget about socials like LinkedIn or Twitter. The LinkedIn Jobs section, in particular, is a useful resource, offering a wide range of opportunities and the ability to directly reach out to hiring managers or apply for positions. Just make sure not to apply through the “Easy Apply” options, as you’ll be competing with thousands of applicants who bring nothing unique to the table.

## FAQs about Data Science Careers

### Do I Need A Degree For Data Science?

A degree is not a set-in-stone requirement to become a data scientist. It’s true that many data scientists hold a bachelor’s or master’s degree, but these just provide foundational knowledge. It’s up to you to pursue further education through courses or bootcamps or work on projects that enhance your expertise. What matters most is your ability to demonstrate proficiency in data science concepts and tools.

### Does Data Science Need Coding?

Yes. Coding is essential for data manipulation and analysis, especially knowledge of programming languages like Python and R.

### Is Data Science A Lot Of Math?

It depends on the career you want to pursue. Data science involves quite a lot of math, particularly in areas like statistics, probability, and linear algebra.

### What Skills Do You Need To Land an Entry-Level Data Science Position?

To land an entry-level job in data science, you should be proficient in several areas. As mentioned above, knowledge of programming languages is essential, and you should also have a good understanding of statistical analysis and machine learning. Soft skills are equally valuable, so make sure you’re acing problem-solving, critical thinking, and effective communication.
