{"id":9931,"date":"2020-11-02T13:01:47","date_gmt":"2020-11-02T21:01:47","guid":{"rendered":"https:\/\/www.springboard.com\/?p=9931"},"modified":"2023-10-10T00:43:12","modified_gmt":"2023-10-10T07:43:12","slug":"beginners-guide-wrangling-data","status":"publish","type":"post","link":"https:\/\/www.springboard.com\/blog\/data-science\/beginners-guide-wrangling-data\/","title":{"rendered":"How to Wrangle Your First Data Set: A Beginner&#8217;s Guide"},"content":{"rendered":"\n<p><span style=\"font-weight: 400;\">Aspiring data specialists should always be on the lookout to get their hands dirty exploring different publicly available data sets. However, finding one to use for practicing a certain skill or tool can be confusing. <\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Knowing what to look for depending on which skill you want to practice is an integral first step that will set you up for success. This post will break down what to look for, what types of data sets are out there, and what makes one type of data set different from another when you practice your data science skills.<\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">4 Methods for Data Analysis: A Quick Overview<\/span><\/h3>\n\n\n\n<p><span style=\"font-weight: 400;\">Before we dive into the different types of datasets that are out there, it\u2019s important to define a few methods of data analysis that you\u2019ll come across in your day-to-day responsibilities. Different types of data sets provide different challenges to focus on. Here are a few to get you started.<\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\n<h4><b>Data Analysis<\/b><\/h4>\n<\/li>\n<\/ul>\n\n\n\n<p><span style=\"font-weight: 400;\">First and foremost, data analysis in and of itself is the usage of logical reasoning and statistical analysis with collected data in order to guide decision making or extract helpful conclusions. 
For successful and efficient data analysis, the best datasets available online will be organized, thorough, and diverse. This creates the opportunity for confident answers and interesting findings. More on this below. <\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\n<h4><b>Data Cleaning <\/b><\/h4>\n<\/li>\n<\/ul>\n\n\n\n<p><span style=\"font-weight: 400;\">Data cleaning is a process done <\/span><i>before<\/i><span style=\"font-weight: 400;\"> the analysis begins, and is an integral part of maintaining dataset integrity and keeping the analysis concise and focused. The process requires identifying irrelevant and duplicate data and understanding <\/span><span style=\"font-weight: 400;\">how to replace, improve, or delete<\/span><span style=\"font-weight: 400;\"> these records. When practicing data cleaning, look for information-rich datasets that give you many choices about which data to keep and which to discard.<\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\n<h4><b>Data Visualization<\/b><\/h4>\n<\/li>\n<\/ul>\n\n\n\n<p><span style=\"font-weight: 400;\">Equally important to the data itself is the analyst\u2019s ability to communicate findings. After all, these conclusions are what influence business decisions. When launching a new product or service, conclusions from data drive strategies around UI\/UX Design, pricing models, and growth marketing spend. Data visualization is an important part of making these kinds of choices and involves creating visual assets that can <\/span><a href=\"https:\/\/www.tableau.com\/learn\/articles\/data-visualization\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">represent any patterns or trends<\/span><\/a><span style=\"font-weight: 400;\"> such as charts, graphs, or maps. 
<\/span>Whether you&#8217;re a data analyst or a <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/what-does-a-data-scientist-do\/\" data-type=\"URL\" data-id=\"https:\/\/www.springboard.com\/blog\/data-science\/what-does-a-data-scientist-do\/\">data scientist<\/a>, having a strong understanding of data visualization is essential in making your goals visible to your team.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\n<h4><b>Machine Learning Analysis<\/b><\/h4>\n<\/li>\n<\/ul>\n\n\n\n<p><span style=\"font-weight: 400;\">Machine learning (a subset of <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/data-science-definition\/\">data science<\/a>)<\/span> <span style=\"font-weight: 400;\">is a key concept to working with data so that systems can gain the ability to improve themselves and learn in real-time. Datasets that are well-equipped for ML analysis will always have a large number of data points: this is because you\u2019ll need to make up a training data set to train your algorithm as well as a test set to evaluate the success. The <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/free-public-data-sets-data-science-project\/\" target=\"_blank\" rel=\"noreferrer noopener\">dataset<\/a> you choose should be carefully curated and diverse to ensure unique findings and the opportunity to extend the system\u2019s knowledge. 
The most successful ML data projects should be dynamic and long-term, as well as frequently updated.<\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">How to Find the Perfect Data Set in 5 Steps<\/span><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\n<h4><strong>Step 1: Choose your focus<\/strong><\/h4>\n<\/li>\n<\/ul>\n\n\n\n<p><span style=\"font-weight: 400;\">Before seeking out your next dataset, be sure to have <\/span><a href=\"https:\/\/datahero.com\/blog\/2013\/11\/20\/5-beginners-steps-to-investigating-your-dataset\/\" target=\"_blank\" rel=\"noreferrer noopener\"><span style=\"font-weight: 400;\">your \u201c<\/span><\/a><span style=\"font-weight: 400;\">why<\/span><a href=\"https:\/\/datahero.com\/blog\/2013\/11\/20\/5-beginners-steps-to-investigating-your-dataset\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">\u201d<\/span><\/a><span style=\"font-weight: 400;\"> top of mind. Think about the questions your team will be asking and the goals that you\u2019ll set, such as: <\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span style=\"font-weight: 400;\">Are you trying to figure out what time of day customers are most likely to make a conversion? Are you analyzing a record of daily active users on your site? 
<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">Are you exploring the engagement trends of your team\u2019s app? <\/span><\/li>\n<\/ul>\n\n\n\n<p><span style=\"font-weight: 400;\">The goal of <\/span><span style=\"font-weight: 400;\">data analysis<\/span><span style=\"font-weight: 400;\"> is to pull out useful information from data and use it in decision making. Keep that goal at the center of your project to stay motivated.<\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\n<h4><strong>Step 2: Ensure you have the appropriate amount of data<\/strong><\/h4>\n<\/li>\n<\/ul>\n\n\n\n<p><span style=\"font-weight: 400;\">Whatever set you work with should be rich enough to leave room for thorough data analysis, which means systematically applying statistical techniques to condense data and extract answers from it. Aim for at least a few thousand rows and roughly 20 to 25 columns. On the other hand, your data set should never be too busy. If you find yourself bogged down with unnecessary information, consider cleaning your data before beginning your analysis. <\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\n<h4><strong>Step 3: Work with clean data<\/strong><\/h4>\n<\/li>\n<\/ul>\n\n\n\n<p><span style=\"font-weight: 400;\">Data cleaning involves fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset. In some cases, data cleaning will involve combing through your data to spot any outliers that don\u2019t belong. 
You can practice data cleaning using software that uses algorithms or lookup tables to pinpoint any discrepancies<\/span> <span style=\"font-weight: 400;\">to correct issues, dedupe data, and prepare it for analysis.<\/span><\/p>\n\n\n\n<p><strong>Best data cleaning tools<\/strong><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">If you\u2019re looking to transform data from one format into another, <\/span><a href=\"http:\/\/openrefine.org\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">OpenRefine<\/span><\/a><span style=\"font-weight: 400;\"> is an open-source data cleaning tool that can accommodate a few hundred thousand rows of data and give you access to a host of editing tools. Another popular tool used in data cleaning is <\/span><a href=\"https:\/\/dataladder.com\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Data Ladder<\/span><\/a><span style=\"font-weight: 400;\">. <\/span><span style=\"font-weight: 400;\">R<\/span><span style=\"font-weight: 400;\">ated the fastest and most accurate solution on the market, this tool is helpful in <\/span><a href=\"https:\/\/em360tech.com\/data_management\/tech-features-featuredtech-news\/top-10-data-cleansing-solutions\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">standardizing and preparing your information<\/span><\/a><span style=\"font-weight: 400;\"> for other analytics strategies. Another benefit to Data Ladder is its ability to integrate with many other connectors you may be using in your business such as SAP, Salesforce, and more.<\/span><span style=\"font-weight: 400;\"> <\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\n<h4><strong>Step 4: Look for a diverse range of variables<\/strong><\/h4>\n<\/li>\n<\/ul>\n\n\n\n<p><span style=\"font-weight: 400;\">Your dataset should have a mix of both continuous and categorical variables. Categorical variables may be divided into groups such as race, sex, and educational level. 
Continuous variables are measured rather than counted and can take any value within a range. Examples include age, weight, and temperature. If you start with a dataset that has a few columns that appear to be neither categorical nor continuous, data cleaning is a necessary next step. Too wide a range of variables invites overgeneralized conclusions. <\/span><\/p>\n\n\n\n<p><b>Categorical Data Analysis <\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Analysis of categorical data generally involves the use of data tables. For example, suppose a <\/span><a href=\"http:\/\/www.stat.yale.edu\/Courses\/1997-98\/101\/catdat.htm\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">survey was conducted of a group of 20 individuals<\/span><\/a><span style=\"font-weight: 400;\">, who were asked to identify their hair and eye color. A two-way table could represent their responses, with one row for each hair color and one column for each eye color. The row and column totals count the individuals in each category without regard to the other variable. Conclusions within categorical data are often represented by percentages; for example, if 4 of the 20 individuals had red hair, then 20% of the respondents are redheads. <\/span><\/p>\n\n\n\n<p><b>Continuous Data Analysis <\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">With continuous data, <\/span><a href=\"https:\/\/blog.minitab.com\/blog\/understanding-statistics\/why-is-continuous-data-better-than-categorical-or-discrete-data\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">strong insights can be made with far fewer data points<\/span><\/a><span style=\"font-weight: 400;\">, which can make it less expensive to gather, and it often offers higher sensitivity, meaning conclusions land closer to the target. 
A key advantage of continuous data is that <\/span><a href=\"https:\/\/blog.minitab.com\/blog\/understanding-statistics\/understanding-qualitative-quantitative-attribute-discrete-and-continuous-data-types\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">it can be divided into finer and finer levels<\/span><\/a><span style=\"font-weight: 400;\">, allowing for this high sensitivity. For example, you can measure your height on scales of increasing precision, from meters to millimeters and beyond. <\/span><span style=\"font-weight: 400;\">Continuous data can also be used in <\/span><a href=\"https:\/\/blog.minitab.com\/blog\/understanding-statistics\/what-statistical-hypothesis-test-should-i-use\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">hypothesis testing<\/span><\/a><span style=\"font-weight: 400;\">, for example with t-tests that compare sample means. <\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\n<h4><strong>Step 5: Avoid busy data sets<\/strong><\/h4>\n<\/li>\n<\/ul>\n\n\n\n<p><span style=\"font-weight: 400;\">Finding a balance between a sparse dataset and an excessive one comes down to overfitting and focus. If a dataset has too many variables, any reasoning pulled from it will be hard to <\/span><a href=\"https:\/\/www.researchgate.net\/post\/How_much_data_is_too_much_data\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">connect with reality<\/span><\/a><span style=\"font-weight: 400;\">. If the dataset\u2019s range is too wide, correlations will not be specific enough to apply. It all boils down to how well your conclusions hold up. If more data weakens the strength of your argument, examine your assumptions about what is relevant. A major indicator of a busy dataset shows up in pattern recognition. 
Choosing too many factors will lead to poor results, so focus on the quality and reliability of the data to recognize when the set may have too much of it.<\/span><\/p>\n\n\n\n<p><em>Related Read: <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/data-wrangling\/\" target=\"_blank\" rel=\"noreferrer noopener\">Understanding Data Wrangling + How (and When) It\u2019s Used<\/a><\/em><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">3 Different Types of Datasets and Their Benefits<\/span><\/h3>\n\n\n\n<p><span style=\"font-weight: 400;\">Different datasets call for different tools. For example, while graph-based datasets may call for a tool such as Tableau to create charts and visual assets, temporal datasets may require <\/span><a href=\"https:\/\/desktop.arcgis.com\/en\/arcmap\/10.3\/map\/time\/a-quick-tour-of-temporal-data-management-and-visualization.htm\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">geoprocessing tools<\/span><\/a><span style=\"font-weight: 400;\"> to manage and organize time-stamped spatial data. <\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Additionally, you\u2019ll frequently use SQL to query relational databases, sometimes several at once. Having a solid grasp of the language will make it much easier to communicate your findings through intuitive dashboards that can serve as <\/span><span style=\"font-weight: 400;\">an <\/span><a href=\"https:\/\/www.sisense.com\/glossary\/sql-for-data-analysis\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">intermediary between end-users and a more complex data storage system<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Being able to recognize distinguishing factors between datasets will sharpen your approach and lead to more accurate predictions and insights. 
<\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\n<h4><b>Record Datasets<\/b><\/h4>\n<\/li>\n<\/ul>\n\n\n\n<p><span style=\"font-weight: 400;\">Record data has no explicit relationship among records or data fields. Each object has the same set of attributes stored in flat files or relational databases. This type of dataset is the most common, as data mining work typically assumes data is a collection of data objects. Record datasets such as the <\/span><a href=\"https:\/\/www.census.gov\/data.html\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">United States Census Data<\/span><\/a><span style=\"font-weight: 400;\"> or <\/span><a href=\"https:\/\/tech.instacart.com\/3-million-instacart-orders-open-sourced-d40d29ead6f2\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">this dataset from Instacart<\/span><\/a><span style=\"font-weight: 400;\"> are best in terms of accessibility and simplicity: they\u2019re easy to read and well suited to beginners who want practice.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Record datasets also provide a great platform for more advanced techniques like machine learning, which can find patterns in the data and make predictions from them. Since record datasets are so straightforward, they\u2019re great for using AI to sort through patterns and trends and form predictions. Note that other data, such as images, video, and unstructured text, can also be used in machine learning, but it usually needs more preprocessing before a model can learn from it. 
<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">When it comes to ML, it\u2019s important to make sure the dataset you use is high-quality, accurate, and frequently updated, as <\/span><span style=\"font-weight: 400;\">machine learning algorithms function through <\/span><a href=\"https:\/\/machinelearningmastery.com\/data-learning-and-modeling\/#:~:text=A%20machine%20learning%20method%20can%20have%20a%20high%20or%20a,accuracy%20as%20the%20models%20performance.\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">a process called inductive learning<\/span><\/a><span style=\"font-weight: 400;\">. This process builds models from training data in order to form generalizations and predictions that can be applied to your dataset. Knowing how to use ML in data analysis is an extremely important skill today, as predictive analysis can have major real-world impacts, as evidenced in <\/span><a href=\"https:\/\/covid19-projections.com\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">these COVID-19 projections<\/span><\/a><span style=\"font-weight: 400;\">. <\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\n<h4><b>Graph-Based Datasets<\/b><\/h4>\n<\/li>\n<\/ul>\n\n\n\n<p><span style=\"font-weight: 400;\">A graph-based dataset (GBD) uses graph structures and semantic queries to display and sort data. Data object relationships are found in the <\/span><span style=\"font-weight: 400;\"><a href=\"https:\/\/towardsdatascience.com\/types-of-data-sets-in-data-science-data-mining-machine-learning-eb47c80af7a\" target=\"_blank\" rel=\"noopener\">links between objects and link properties<\/a>, such as direction or weight.<\/span> <span style=\"font-weight: 400;\">The structured nature of graph-based datasets also allows for sub-object relationships, which can themselves be represented as graphs. The key benefit of using GBDs is that they lend themselves to an easy transition into data visualization. 
This type of dataset is a step up from a record dataset because it introduces an additional level of organization. <\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Graph-based datasets can be very helpful in communicating data at large and are especially effective for analyzing data that\u2019s continuously being gathered. For example, <\/span><a href=\"https:\/\/www.statista.com\/statistics\/1118047\/2020-presidential-election-latest-polls-us\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">this data<\/span><\/a><span style=\"font-weight: 400;\"> uses graphs to represent the polling averages of the 2020 U.S. election. Similarly, <\/span><a href=\"http:\/\/www.favstats.eu\/post\/demdebates\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">this data set<\/span><\/a><span style=\"font-weight: 400;\"> explores the primary debates to track common word combinations, who got the most applause, and more, all represented through different graphical analyses. <\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\n<h4><b>Ordered Datasets<\/b><\/h4>\n<\/li>\n<\/ul>\n\n\n\n<p><span style=\"font-weight: 400;\">Ordered data sets require a user-specified key for organization. They are kept in a <\/span><a href=\"https:\/\/public.support.unisys.com\/aseries\/docs\/ClearPath-MCP-18.0\/86000213-420\/section-000019403.html\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">physical sequence based on this chosen key<\/span><\/a><span style=\"font-weight: 400;\"> and don\u2019t require using a set. 
Ordered data sets such as <\/span><a href=\"https:\/\/ucr.fbi.gov\/crime-in-the-u.s\/2016\/crime-in-the-u.s.-2016\/topic-pages\/tables\/table-1\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">this FBI public data set<\/span><\/a><span style=\"font-weight: 400;\"> can be split according to the type of data, such as: <\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><b>Temporal data<\/b><span style=\"font-weight: 400;\"> is an extension of record data, where each object has an associated time. For example, it can be used to track crime patterns by time of day. <\/span><\/li>\n\n\n\n<li><b>Sequence data<\/b><span style=\"font-weight: 400;\"> includes a sequence of individual entities. It\u2019s similar to temporal data, but the order is defined by position in a sequence of letters or numbers rather than by time. (A familiar example of sequence data is a gene sequence.)<\/span><\/li>\n\n\n\n<li><b>Time-series datasets<\/b><span style=\"font-weight: 400;\"> blend the first two: each record is a series of measurements taken over time. You can find an example of a public ordered dataset that uses time-series data in the <\/span><a href=\"http:\/\/archive.ics.uci.edu\/ml\/datasets\/Dow+Jones+Index\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">weekly returns of the Dow Jones Index<\/span><\/a><span style=\"font-weight: 400;\">. Another real-world application of time-series data can be found in <\/span><a href=\"https:\/\/www.bloomberg.com\/news\/articles\/2020-04-21\/america-needs-real-time-data-to-get-through-coronavirus-crisis\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">this Bloomberg article<\/span><\/a><span style=\"font-weight: 400;\">, which compares financial statistics with COVID-19 data. The article analyzes GDP and reported cases and how the numbers have fluctuated over time. 
<\/span><\/li>\n\n\n\n<li><b>Spatial data<\/b><span style=\"font-weight: 400;\"> includes objects that have spatial attributes such as locations or areas. The biggest benefit of choosing an ordered dataset is the opportunity for easy-to-find, real-world applications, such as weather patterns in your city. <\/span><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Aspiring data specialists should always be on the lookout to get their hands dirty exploring different publicly available data sets. However, finding one to use for practicing a certain skill or tool can be confusing. 
Knowing what to look for depending on which skill you want to practice is an integral first step that will [&hellip;]<\/p>\n","protected":false},"author":89,"featured_media":9938,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_eb_attr":"","_eb_data_table":"","footnotes":""},"categories":[67],"tags":[],"marketing_tags":[],"class_list":{"0":"post-9931","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-data-science"},"acf":[],"_links":{"self":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts\/9931"}],"collection":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/users\/89"}],"replies":[{"embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/comments?post=9931"}],"version-history":[{"count":4,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts\/9931\/revisions"}],"predecessor-version":[{"id":50328,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts\/9931\/revisions\/50328"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/media\/9938"}],"wp:attachment":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/media?parent=9931"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/categories?post=9931"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/tags?post=9931"},{"taxonomy":"marketing_tags","embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/marketing_tags?post=9931"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}