{"id":2449,"date":"2022-06-29T10:50:53","date_gmt":"2022-06-29T17:50:53","guid":{"rendered":"https:\/\/www.springboard.com\/?p=2449"},"modified":"2023-07-03T20:41:40","modified_gmt":"2023-07-04T03:41:40","slug":"data-wrangling","status":"publish","type":"post","link":"https:\/\/www.springboard.com\/blog\/data-science\/data-wrangling\/","title":{"rendered":"Understanding Data Wrangling + How (and When) It\u2019s Used"},"content":{"rendered":"\n<p>Data wrangling is a process data analysts often use when they begin working with new sets of raw data. You may have heard the term before, or you may have heard it called data munging. In the simplest terms, to wrangle data is to organize and standardize its format so it can be analyzed by data-processing software.<\/p>\n\n\n\n<p>It\u2019s a core part of a data scientist\u2019s day, so learning about data wrangling is crucial if you want to enter a career in data science or data analytics. 
In fact, the Anaconda State of Data Science 2020 report found that data scientists spend around 26% of their time wrangling data.&nbsp;<\/p>\n\n\n\n<p>This guide will fill you in on what data wrangling is, what its benefits are, how it works, and when you should use it\u2014everything you need to start wrangling some data yourself.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What Is Data Wrangling?<\/h2>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-is-an-iterative-data-exploration-and-transformation-process-which-aims-to.png\" alt=\"What is data wrangling\" class=\"wp-image-26176\" width=\"638\" height=\"374\" srcset=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-is-an-iterative-data-exploration-and-transformation-process-which-aims-to.png 850w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-is-an-iterative-data-exploration-and-transformation-process-which-aims-to-380x223.png 380w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-is-an-iterative-data-exploration-and-transformation-process-which-aims-to-380x223.png 420w\" sizes=\"(max-width: 638px) 100vw, 638px\" \/><figcaption class=\"wp-element-caption\">Source &#8211; <a href=\"https:\/\/www.researchgate.net\/figure\/Data-wrangling-is-an-iterative-data-exploration-and-transformation-process-which-aims-to_fig1_319560096\" target=\"_blank\" rel=\"noreferrer noopener\">Research Gate<\/a><\/figcaption><\/figure>\n\n\n\n<p>When <a href=\"https:\/\/www.springboard.com\/blog\/data-analytics\/what-does-data-analyst-do\/\" target=\"_blank\" rel=\"noreferrer noopener\">data analysts<\/a> and <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/what-does-a-data-scientist-do\/\" target=\"_blank\" rel=\"noreferrer noopener\">data scientists<\/a> analyze 
data, they write queries and code that read through the data and return the entries that match their requirements.&nbsp;<\/p>\n\n\n\n<p>However, computers don\u2019t read information the way we do. To a computer, the words Germany, DE, and Deutschland are simply different text strings with no apparent relation to each other. To humans, however, each of these words refers to the same country.&nbsp;<\/p>\n\n\n\n<p>As an example, imagine that your data set contains three entries about Germany, each labeled with a different one of these names. If you query a database for all data on Germany, you would get back only the one entry that exactly matches the text string \u201cGermany.\u201d<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1052\" height=\"515\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/sql-statement.jpg\" alt=\"Data Wrangling - SQL Statement \" class=\"wp-image-26174\" srcset=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/sql-statement.jpg 1052w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/sql-statement-380x186.jpg 380w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/sql-statement-380x186.jpg 420w\" sizes=\"(max-width: 1052px) 100vw, 1052px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1200\" height=\"729\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-querying-for-germany-1200x729.png\" alt=\"Data Wrangling, Querying for Germany\" class=\"wp-image-47122\" srcset=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-querying-for-germany-1200x729.png 1200w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-querying-for-germany-400x243.png 400w, 
https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-querying-for-germany-768x467.png 768w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-querying-for-germany-380x231.png 380w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-querying-for-germany-700x425.png 700w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-querying-for-germany.png 1262w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-querying-for-germany-380x231.png 420w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" \/><\/figure>\n\n\n\n<p>The above two screenshots show that querying for \u2018Germany\u2019 or \u2018Deutschland\u2019 only returns entries matching that text string, rather than all entries matching the country of Germany.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1200\" height=\"837\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sql-statement-1200x837.png\" alt=\"Data Wrangling, SQL Statement\" class=\"wp-image-47123\" srcset=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sql-statement-1200x837.png 1200w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sql-statement-400x279.png 400w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sql-statement-768x535.png 768w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sql-statement-380x265.png 380w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sql-statement-700x488.png 700w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sql-statement.png 1258w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sql-statement-380x265.png 
420w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" \/><\/figure>\n\n\n\n<p>Here, you can see that in order to return all applicable entries, the query would have to include every name format used within the data set. This would be tedious with small data sets and impractical with large ones.<\/p>\n\n\n\n<p>This is one kind of problem data wrangling takes care of. When you wrangle data, you standardize its format so your queries and algorithms can process it consistently and return the information you want.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What Are the Goals?<\/h3>\n\n\n\n<p>The goal of data wrangling is to take raw data that has come from different sources in different formats and organize, clean, and standardize it. This means deleting duplicates, standardizing date formats and abbreviations, and checking for errors so the data can be used for analysis.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What Are the Benefits?<\/h3>\n\n\n\n<p>Data wrangling is a necessary step in the <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/data-science-process\/\" target=\"_blank\" rel=\"noreferrer noopener\">data science process<\/a> because it is what makes reliable analysis possible. 
If your data is non-standardized and fraught with errors and duplicates, any query you run against it will return incomplete or incorrect results, and any analysis built on those results will be invalid.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What Are the Challenges?<\/h3>\n\n\n\n<p>The main challenges with data wrangling are the time it takes and how little of the work can be fully automated. While many Python libraries can help streamline the wrangling process for common problems (e.g., 
standardizing date formats), the reality is that there will always be bespoke problems that require manual correction.&nbsp;<\/p>\n\n\n\n<p>In these cases, a human often has to work out what a non-standard entry is supposed to mean and convert it into a standard form that software can organize.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How Does Data Wrangling Work?<\/h2>\n\n\n\n<p>Next, we\u2019ll go into the details of data wrangling and how it actually achieves the goals and benefits outlined above.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What Tools Should Be Used in Data Wrangling?<\/h3>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1500\" height=\"840\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/tools-for-data-wrangline.png\" alt=\"Tools needed for Data Wrangling\" class=\"wp-image-26178\" srcset=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/tools-for-data-wrangline.png 1500w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/tools-for-data-wrangline-380x213.png 380w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/tools-for-data-wrangline-380x213.png 420w\" sizes=\"(max-width: 1500px) 100vw, 1500px\" \/><figcaption class=\"wp-element-caption\">Source &#8211; <a href=\"https:\/\/www.zoho.com\/dataprep\/what-is-data-wrangling.html\" target=\"_blank\" rel=\"noreferrer noopener\">Zoho<\/a><\/figcaption><\/figure>\n\n\n\n<p>When it comes to tools, data analysts typically write scripts and use scripting libraries to wrangle their data. 
<a href=\"https:\/\/www.springboard.com\/blog\/data-science\/python-guide\/\" target=\"_blank\" rel=\"noreferrer noopener\">Python<\/a> is a popular scripting language for data wrangling, as well as for writing <a href=\"https:\/\/www.springboard.com\/blog\/software-engineering\/data-structures-and-algorithms-in-python\/\" target=\"_blank\" rel=\"noreferrer noopener\">data structures and algorithms<\/a>. It emphasizes readability, and its large community has published <a href=\"https:\/\/pypi.org\" target=\"_blank\" rel=\"noreferrer noopener\">thousands of libraries<\/a> (or \u201cpackages\u201d), many of which are useful for data wrangling.&nbsp;<\/p>\n\n\n\n<p>Packages are collections of pre-written modules that automate work a data analyst would otherwise have to do by hand. For example, without a suitable package, standardizing the country names in a data set would mean manually sifting through every entry and correcting each variant.&nbsp;<\/p>\n\n\n\n<p>A library built for this purpose, by contrast, would already contain the mappings and routines needed for common data transformations. 
For instance, such a library might map the name of Germany in any language (such as \u201cDeutschland\u201d or \u201cAllemagne\u201d) to the single text string \u201cGermany.\u201d<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What Is the Data Wrangling Process?<\/h3>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2048\" height=\"986\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-in-six-steps.png\" alt=\"Data Wrangling - Process \" class=\"wp-image-26177\" srcset=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-in-six-steps.png 2048w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-in-six-steps-380x183.png 380w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-in-six-steps-380x183.png 420w\" sizes=\"(max-width: 2048px) 100vw, 2048px\" \/><\/figure>\n\n\n\n<p>Data wrangling typically follows these steps. In practice the process is often iterative, and analysts loop back to earlier steps as new problems surface.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Discovery<\/h4>\n\n\n\n<p>The first step in wrangling data is establishing what information you want to gain from it and how you intend to use it. These goals determine how data analysts will need to structure and format the data.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Structuring<\/h4>\n\n\n\n<p>Once a clear goal has been set, data analysts take the first steps to transform the raw data into something manageable. This includes standardizing the main entries so the data can be properly organized, for example by standardizing country names and date formats.&nbsp;<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Cleaning<\/h4>\n\n\n\n<p>The process of cleaning data involves removing anything that would impede the data mining process later on. 
Errors, null entries, duplicate entries, and values recorded in the wrong place (for example, a date entered in a name field) are all removed or corrected.&nbsp;<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Enriching<\/h4>\n\n\n\n<p>At this stage, data analysts determine whether adding more data would benefit their analysis. For instance, when analyzing data on European countries, it might help to know whether each country is a member of the European Union. Adding this information as a new column in the database lets analysts filter on EU membership when mining the data, rather than having to list all the EU countries manually.&nbsp;<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Validating<\/h4>\n\n\n\n<p>Data is heavily manipulated during the wrangling process, so the validation step checks the quality of the outcome. Has any core data been accidentally changed? Has the standardization been fully applied, so that querying for a format you intended to remove returns nothing? Have any errors gone unnoticed? Even small errors can skew the final results of the analysis, so the quality check needs to be thorough.&nbsp;<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Publishing<\/h4>\n\n\n\n<p>When the data has reached an acceptable state (standardized, free of errors, sorted, categorized, and ready to be mined for useful information), it is published for use and analysis can begin.&nbsp;<\/p>\n\n\n\n<p>Data analysts then mine and analyze the cleaned-up data according to the goals set during the initial \u201cdiscovery\u201d step and publish their results for their clients to use.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Other Data Management Processes<\/h2>\n\n\n\n<p>Below is a list of other data management processes and how they differ from data wrangling. 
All of these processes are studied in both <a href=\"https:\/\/www.springboard.com\/courses\/data-analytics-career-track\/\" target=\"_blank\" rel=\"noreferrer noopener\">data analytics<\/a> and <a href=\"https:\/\/www.springboard.com\/courses\/data-science-career-track\/\" target=\"_blank\" rel=\"noreferrer noopener\">data science bootcamps<\/a>, so they will be useful no matter which field you\u2019re interested in. The two fields overlap considerably, so if you&#8217;re unsure which one suits you, check out whether a <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/data-science-vs-data-analytics\/\" target=\"_blank\" rel=\"noreferrer noopener\">data analytics or data science career<\/a> is right for you.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Data Wrangling vs. Data Mining<\/h3>\n\n\n\n<p>Data mining is the process of extracting useful patterns and information from data that has already been wrangled. The data is queried and studied so that you can properly understand what it is telling you, because the correct conclusions can\u2019t always be discerned by simply looking at the numbers.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Data Wrangling vs. Data Cleaning<\/h3>\n\n\n\n<p>Data cleaning is one step within data wrangling: the process of removing unwanted entries from your data set, such as duplicate entries, errors, and null (missing) entries. Wrangling also covers structuring, enriching, and validating the data.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Data Wrangling vs. Data Munging<\/h3>\n\n\n\n<p>Data munging is simply another name for data wrangling. Both terms refer to restructuring and cleaning raw, complex data sets.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Data Wrangling vs. Data Analysis<\/h3>\n\n\n\n<p>Analysis is the final step, after the data has been wrangled and mined. 
This is where the information and patterns discerned from the (now clean and queryable) data are studied to determine how they relate to the original objectives of the business users.&nbsp;<\/p>\n\n\n\n<p>You can also conduct <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/exploratory-data-analysis-python\/\" target=\"_blank\" rel=\"noreferrer noopener\">data analysis using Python<\/a>, or analyze the data through machine learning.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">When Should You Use Data Wrangling?<\/h2>\n\n\n\n<p>Data wrangling is needed when you collect data from different sources and must transform it before you can load it into a database and run queries. Here are some examples of when this is necessary.&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Digitizing records. <\/strong>Different people write dates, addresses, and other information in different ways, so once the records are digitized, the data needs to be standardized.&nbsp;<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Optical Character Recognition (OCR). <\/strong>This automated process is used when transferring data from paper by hand would be too expensive. OCR can digitize the data automatically, but its output usually contains recognition errors that need to be wrangled.&nbsp;<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Collecting data from different countries. <\/strong>Different countries use different formats for data entry. For example, Denmark uses a period as the thousands separator and a comma as the decimal mark, so 35.000 means thirty-five thousand rather than thirty-five. Data like this must be standardized before it can all be queried together in one database.&nbsp;<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Scraping information from websites. <\/strong>Information on websites is presented in a format meant for humans to read, not for databases to query. 
When <a href=\"https:\/\/www.springboard.com\/blog\/data-analytics\/web-scraping-basics\/\" target=\"_blank\" rel=\"noreferrer noopener\">scraping websites<\/a> for data, the scraped content needs to be wrangled into a format suitable for databases and querying.&nbsp;<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Data Wrangling Examples<\/h2>\n\n\n\n<p>In this section, we\u2019ll show you a few quick examples of data wrangling in action.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Structuring: standardizing a name format using pandas<\/li>\n<\/ul>\n\n\n\n<p>Let\u2019s use the same data from the earlier screenshots and wrangle the country names, replacing \u201cDeutschland\u201d with \u201cGermany.\u201d We will use Python and the well-known <a href=\"https:\/\/pypi.org\/project\/pandas\/\" target=\"_blank\" rel=\"noreferrer noopener\">pandas package<\/a> to do this.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1022\" height=\"494\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sample.png\" alt=\"Data Wrangling, sample\" class=\"wp-image-47124\" srcset=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sample.png 1022w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sample-400x193.png 400w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sample-768x371.png 768w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sample-380x184.png 380w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sample-700x338.png 700w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sample-380x184.png 420w\" sizes=\"(max-width: 1022px) 100vw, 1022px\" \/><\/figure>\n\n\n\n<p>Here is our sample data in Python, converted into a pandas DataFrame. 
This is the output:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1380\" height=\"379\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sample.jpg\" alt=\"Data Wrangling - Example 2\" class=\"wp-image-26179\" srcset=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sample.jpg 1380w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sample-380x104.jpg 380w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sample-380x104.jpg 420w\" sizes=\"(max-width: 1380px) 100vw, 1380px\" \/><\/figure>\n\n\n\n<p>As you can see, Deutschland is present in a few of the data entries. Time to fix that!&nbsp;<\/p>\n\n\n\n<p>Pandas makes this remarkably simple. You can just add a line before you print it out, remapping \u201cDeutschland\u201d to \u201cGermany.\u201d<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1210\" height=\"379\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-dutch-to-german.jpg\" alt=\"Data Wrangling - Example 3\" class=\"wp-image-26180\" srcset=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-dutch-to-german.jpg 1210w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-dutch-to-german-380x119.jpg 380w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-dutch-to-german-380x119.jpg 420w\" sizes=\"(max-width: 1210px) 100vw, 1210px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1210\" height=\"349\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sample-1.jpg\" alt=\"Data Wrangling - Example 4\" class=\"wp-image-26181\" 
srcset=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sample-1.jpg 1210w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sample-1-380x110.jpg 380w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sample-1-380x110.jpg 420w\" sizes=\"(max-width: 1210px) 100vw, 1210px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cleaning: removing duplicates from a data set using Pandas<\/li>\n<\/ul>\n\n\n\n<p>Let\u2019s look at a smaller set of data that has a problem with duplication.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"837\" height=\"711\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sample-3.jpg\" alt=\"Data Wrangling - Example 5\" class=\"wp-image-26182\" srcset=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sample-3.jpg 837w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sample-3-380x323.jpg 380w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2022\/06\/data-wrangling-sample-3-380x323.jpg 420w\" sizes=\"(max-width: 837px) 100vw, 837px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Data Wrangling FAQs<\/h2>\n\n\n\n<p>Below, we&#8217;ll answer two of the most common questions regarding data wrangling.&nbsp;<\/p>\n\n\n<div id=\"rank-math-faq\" class=\"rank-math-block\">\n<div class=\"rank-math-list \">\n<div id=\"faq-question-1656435434958\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">Is Data Wrangling Worth It?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Yes. In many cases, you can\u2019t use data if you don\u2019t wrangle it first. If you mine and analyze un-wrangled data, the information could be anywhere from slightly to absurdly incorrect. 
Essentially, any parts of the data that weren\u2019t formatted appropriately would be misread or missed in the analysis and could throw off the results.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1656435448217\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">When Shouldn\u2019t You Use Data Wrangling?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Data wrangling is used to manipulate data so business users can work with it. To decide whether you need to wrangle your data, figure out what you want to do with it and whether that is possible with the data in its current state. If it is, you don\u2019t need to wrangle the data.<\/p>\n\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n<p class=\"rm has-background\" style=\"background-color:#efeff6\"><strong>Since you\u2019re here\u2026<br><\/strong>Curious about a career in data science? Experiment with our <a rel=\"noreferrer noopener\" href=\"https:\/\/www.springboard.com\/resources\/guides\/data-science-process\/\" target=\"_blank\">free data science learning path<\/a>, or join our <a rel=\"noreferrer noopener\" href=\"https:\/\/www.springboard.com\/courses\/data-science-career-track\/\" target=\"_blank\">Data Science Bootcamp<\/a>, where you\u2019ll get your tuition back if you don&#8217;t land a job after graduating. We\u2019re confident because our courses work \u2013 check out our <a rel=\"noreferrer noopener\" href=\"https:\/\/www.springboard.com\/success\/\" target=\"_blank\">student success stories<\/a> to get inspired.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Data wrangling is a process data analysts often use when they begin working with new sets of raw data. You may have heard the term before, or you may have heard it called data munging. 
In the simplest terms, to wrangle data is [&hellip;]<\/p>\n","protected":false},"author":123,"featured_media":26173,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_eb_attr":"","_eb_data_table":"","footnotes":""},"categories":[67],"tags":[],"marketing_tags":[],"class_list":{"0":"post-2449","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-data-science"},"acf":[],"_links":{"self":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts\/2449"}],"collection":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/users\/123"}],"replies":[{"embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/comments?post=2449"}],"version-history":[{"count":3,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts\/2449\/revisions"}],"predecessor-version":[{"id":47127,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts\/2449\/revisions\/47127"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/media\/26173"}],"wp:attachment":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/media?parent=2449"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/categories?post=2449"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/tags?post=2449"},{"taxonomy":"marketing_tags","embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/marketing_tags?post=2449"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}