How to build maps in R
World Maps in R and rworldmap
Oftentimes, it can be interesting to see whether trends in your data differ depending on where the data comes from. When you’re examining a dataset, one of the natural points of segmentation is the country. You might, for example, compare economic figures nation by nation, examine crime rates in that context, or look at census bureau statistics for different countries.
A world map is a useful way of quickly understanding the geographical elements of your data before committing to larger and more complicated visualizations or further data analysis. A simple map can help sketch out data points and give them context.
Thankfully, building maps in R is a seamless experience with the right approach. This particular tutorial doesn’t make use of the popular ggplot2 data visualization library directly. Instead, it’ll help you find new packages that can help you define new maps in R.
In this tutorial, I will be using a Met Museum dataset sample in order to visualize which countries the artworks held in the museum’s collections hail from. As always, the code for this tutorial will be stored in a GitHub repo, along with all the data required to execute visualizations here.
A quick note on the colors used in this tutorial: for the sake of simplicity, I will be using the “heat” preset colors that come with the package itself. More advanced users should consider using RColorBrewer to modify and customize the palettes on these world maps.
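As a rough sketch of what a custom palette could look like without any extra packages: base R’s colorRampPalette() builds a vector of colours, and the colourPalette argument of mapCountryData() accepts such a vector in place of a preset name. The colour choices and the names make.palette and met.palette are illustrative assumptions, not part of the tutorial’s code.

```r
# Build a 7-step light-to-dark palette with base R (grDevices).
# The two endpoint colours are arbitrary choices for illustration.
make.palette <- colorRampPalette(c("lightyellow", "darkred"))
met.palette <- make.palette(7)  # 7 is the default numCats in mapCountryData()

print(met.palette)

# Later, pass the vector instead of a preset name, for example:
# mapCountryData(matched, nameColumnToPlot = "value", colourPalette = met.palette)
# With RColorBrewer installed, RColorBrewer::brewer.pal(7, "YlOrRd")
# returns a colour vector that can be used the same way.
```

If the number of colours does not match the number of categories that end up on the map, it is worth checking the legend to confirm the colours were assigned as intended.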
Now make sure that R and the package are running properly by running this line of code:
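A minimal sketch of that check, assuming the package in question is rworldmap (the one used in the rest of this tutorial) and that it may still need to be installed from CRAN:

```r
# Install rworldmap from CRAN only if it is not already available.
if (!requireNamespace("rworldmap", quietly = TRUE)) {
  install.packages("rworldmap", repos = "https://cloud.r-project.org")
}

# Attach the package; an error here means the installation needs attention.
library(rworldmap)
```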
Alright, now let’s bring in the data. I’ve included a 5,000-row sample of the larger Met dataset in the repository.
# read.csv() already returns a data frame, so no extra conversion is needed
met <- read.csv("MetObjects_5k-sample.csv")
Okay, now that the data has been ingested, we can create a frequency table of how often each country name occurs. To do this, use R’s handy table() function. For clarity’s sake, I also added a line of code that renames the columns of the resulting data frame:
countries.met <- as.data.frame(table(met$Country))
colnames(countries.met) <- c("country", "value")
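To make the shape of that frequency table concrete, here is a small sketch on a made-up vector of country names. The values are invented for illustration and are not taken from the Met sample; the blank entry is an assumption about what messy input might look like.

```r
# Toy stand-in for met$Country; these values are made up for illustration.
country <- c("Egypt", "Egypt", "Iran", "", "United States", "Egypt", "Iran")

# table() counts each distinct value; as.data.frame() gives one row per value.
counts <- as.data.frame(table(country), stringsAsFactors = FALSE)
colnames(counts) <- c("country", "value")

# Drop the row for blank names, if any, since it cannot be matched to a map.
counts <- counts[counts$country != "", ]
print(counts)
```

Each remaining row pairs one country name with the number of times it appeared, which is the input the mapping step expects.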
Now, we can finally move on to the excellent “rworldmap” package. The first thing we have to do is match the country names in our frequency table to those used by the world maps in “rworldmap.” Luckily, this functionality is built into the package through the joinCountryData2Map() function. Let’s go ahead and run this function on our data so we can plot it on a world map:
matched <- joinCountryData2Map(countries.met, joinCode="NAME", nameJoinColumn="country")
As you can see, the first parameter is simply our frequency table data frame. The “joinCode” parameter refers to the format of your country identifiers (in this case, plain English country names), and “nameJoinColumn” is the name of the column in our data frame that holds them.
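As a hedged illustration of those parameters, the sketch below joins a tiny made-up data frame twice: once on English names and once on ISO 3166-1 alpha-3 codes via joinCode = "ISO3". The data frame toy, its values, and the deliberately unmatchable “Atlantis” row are assumptions for demonstration; verbose = TRUE asks the function to list the entries it could not match.

```r
library(rworldmap)

# Made-up counts; "Atlantis" is included on purpose because it cannot match.
toy <- data.frame(
  country = c("Egypt", "Iran", "Atlantis"),
  iso3    = c("EGY", "IRN", "XXX"),
  value   = c(30, 12, 1),
  stringsAsFactors = FALSE
)

# Join on English country names; failed names are printed to the console.
by.name <- joinCountryData2Map(toy, joinCode = "NAME",
                               nameJoinColumn = "country", verbose = TRUE)

# Join the same rows on three-letter ISO codes instead.
by.iso3 <- joinCountryData2Map(toy, joinCode = "ISO3",
                               nameJoinColumn = "iso3", verbose = TRUE)
```

Joining on codes tends to be less sensitive to spelling variants than joining on names, so it may be worth considering when a dataset already carries ISO codes.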
After joinCountryData2Map() runs, you should see a summary of how many countries in our data frame were successfully matched to the package’s index of country names. Now, our “matched” object is in the appropriate format for the “rworldmap” package to generate a map. Let’s go ahead and do that:
mapCountryData(matched, nameColumnToPlot="value", mapTitle="Met Collection Country Sample", catMethod = "pretty", colourPalette = "heat")
And here is the resulting global heatmap of our values:
And here we are! This function outputs a frequency visualization of a 5000 row sample of the Met collection database. As we can see, Egypt is woefully overrepresented in our data — most likely due to the large Ancient Egyptian artifact collection that is held at the museum. Out of plain curiosity, I decided to also generate a world map that omits Egypt from the results. Here is the code:
# Drop Egypt by name rather than by row position, then rebuild the map
countries.met.2 <- countries.met[countries.met$country != "Egypt", ]
matched.2 <- joinCountryData2Map(countries.met.2, joinCode="NAME", nameJoinColumn="country")
mapCountryData(matched.2, nameColumnToPlot="value", mapTitle="Met Collection Country Sample", catMethod = "pretty", colourPalette = "heat")
And here is the result:
The United States and Iran are clear frontrunners, with Indonesia coming up in third place, which is an interesting distribution. Overall, I expected the data to be skewed towards Europe and the U.S.; this is very clearly not the case. As you can see, building maps in R can quickly help you spot geographic outliers in your data that you might otherwise have missed.
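A ranking like this can also be read straight off the frequency table before any map is drawn. The sketch below uses a made-up data frame in the same shape as countries.met; the numbers are invented for illustration and are not the actual counts from the Met sample.

```r
# Made-up frequency table in the same shape as countries.met.
counts <- data.frame(
  country = c("Iran", "Egypt", "Indonesia", "United States"),
  value   = c(12, 30, 7, 14),
  stringsAsFactors = FALSE
)

# Sort by count, largest first, to see which countries dominate.
ranked <- counts[order(counts$value, decreasing = TRUE), ]
print(ranked)

# Drop one country by name, then look at the top three that remain.
ranked.no.egypt <- ranked[ranked$country != "Egypt", ]
print(head(ranked.no.egypt, 3))
```

Checking the sorted table alongside the map is a quick way to confirm that the colours are telling the same story as the numbers.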
Lastly, I also want to point out a useful feature of the “rworldmap” package: the ability to zoom in on a particular geographical zone. For instance, what if we wanted to look at Europe and Asia more closely? For that, mapCountryData() has the “mapRegion” parameter, which lets you isolate the region of the map that is most interesting to you. Let’s take a look at Eurasia:
mapCountryData(matched.2, nameColumnToPlot="value", mapTitle="Met Collection in Eurasia", mapRegion="Eurasia", colourPalette="heat", catMethod="pretty")
And here is the result:
As you can see, with one quick change in the code, you can now visualize the distribution of artworks across Eurasia instead of around the entire world.
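The same parameter should work for other regions too. Here is a small sketch, assuming the region names "africa" and "oceania" listed in the package documentation, and using a made-up data frame rather than the Met sample. The maps are written to a temporary PDF so the code also runs where no screen device is available; the object names are hypothetical.

```r
library(rworldmap)

# Made-up counts, joined to the map the same way as in the tutorial.
toy <- data.frame(country = c("Egypt", "Nigeria", "Australia"),
                  value = c(30, 5, 3),
                  stringsAsFactors = FALSE)
toy.map <- joinCountryData2Map(toy, joinCode = "NAME",
                               nameJoinColumn = "country")

# Draw one page per region into a PDF in the session's temp directory.
out.file <- file.path(tempdir(), "met_regions.pdf")
pdf(out.file)
africa.params <- mapCountryData(toy.map, nameColumnToPlot = "value",
                                mapTitle = "Toy data: Africa",
                                mapRegion = "africa",
                                catMethod = "pretty", colourPalette = "heat")
oceania.params <- mapCountryData(toy.map, nameColumnToPlot = "value",
                                 mapTitle = "Toy data: Oceania",
                                 mapRegion = "oceania",
                                 catMethod = "pretty", colourPalette = "heat")
dev.off()
```

Only the mapRegion and mapTitle values change between the two calls; the joined data object is reused as is.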
R is one of the most widely used languages in data science. Using R and “rworldmap” provides an effective and easy way of translating country values in a dataset into a good-looking and customizable geographical visualization. It’s a handy way to create maps in R, and a great way for you to slice and dice through your data geographically.