Drawing Canada Maps in R

2020/1/7 12:00 AM

There are many reasons you may want to draw a custom map with ggplot in R — overlaying data with ggplot is a lot more flexible than in Power BI1. There are many blog posts on how to do this for the US, but resources regarding Canada are scarce. But fear not! With the help from StatCan, some great pointers out there2, and the spirit of exploration, we are ready to share how to draw some Canada maps.

By the end of the post, you will be able to —

  • Use R and ggplot to draw maps for Canada: all of the 10 provinces and the 3 territories
  • Zoom in to draw only select provinces or select areas within a province
  • Apply coordinate transformation so that the maps of the Great White North appear closer to reality
  • Overlay custom data points with coordinates — we are pinning Vancouver, Calgary, Edmonton, Toronto, Ottawa, and Montréal.

Canada Maps drawn in R

You will be using —

  • Shape files from StatCan (.shp, shx, .dbf)
  • R with multiple shape file-related packages listed below
  • SQL connected to SCDB to grab coordinates for the open big box stores

The R packages will be listed here and omitted in code blocks below. The full script and data can be downloaded here.

library(sf) # the base package manipulating shapes
library(rgdal) # geo data abstraction library
library(geojsonio) # geo json input and output
library(spdplyr) # the `dplyr` counterpart for shapes
library(rmapshaper) # the package that allows geo shape transformation
library(magrittr) # data wrangling
library(dplyr)
library(tidyr)
library(ggplot2)

Getting the Data

StatCan website has a comprehensive list of shape file data available. For the purpose of this exercise, we are using ArcGIS format, Census divisions, Cartographic Boundary File. Download the zip file — it should be around 60 MB and slightly larger unzipped.

After unzipping, you will be seeing four files:

  • .shp file is where the data live in
  • .shx is the index for the main data file
  • .dbf describes the shapes in the main data file
  • .prj is the project file for when the shape file was created

Note: You MUST name the .shp, .shx, and .dbf files with the same name (different extensions) and place them in the same folder. Existence of the .prj file is optional.

Transforming and Simplifying the Data

Shape files must be transformed to Geo JSON format to be consumed and utilized by R. And since the shape file is really high-definition, we need to simplify the file so that plotting the data isn’t a painful 20-minute wait.

This code block only needs to be run once — once you have the simplified Geo JSON file stored in disk, you can read data directly from there.

For the R code below, we are assuming the data have been placed in the /data folder in the working directory.

canada_raw = readOGR(dsn = "data", layer = "gcd_000b11a_e", encoding = 'latin1') # 1
canada_raw_json <- geojson_json(canada_raw) # 2
canada_raw_sim <- ms_simplify(canada_raw_json) # 3
geojson_write(canada_raw_sim, file = "data/canada_cd_sim.geojson") # 4

Notes:

  1. We are reading the shape file (files) — note the lack of extension and the latin1 encoding. You shouldn’t be using UTF-8 here for this file downloaded from StatCan.

  2. Converts shape file into Geo JSON format — should take a few minutes.

  3. Simplifying the Geo JSON data, keeping 5% of the most prominent vertices of the geo polygons by default. This should also take a few minutes.

  4. Writing the simplified Geo JSON file back to disk. From here on out, we no longer use the large original shape files.

Plotting the Great White North

Assuming you’ve done the transformation part above, and have a Geo JSON data placed in the /data folder in the working directory, you are ready to draw the maps of Canada.

canada_cd <- st_read("data/canada_cd_sim.geojson", quiet = TRUE) # 1
crs_string = "+proj=lcc +lat_1=49 +lat_2=77 +lon_0=-91.52 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs" # 2

# Define the maps' theme -- remove axes, ticks, borders, legends, etc.
theme_map <- function(base_size=9, base_family="") { # 3
	require(grid)
	theme_bw(base_size=base_size, base_family=base_family) %+replace%
		theme(axis.line=element_blank(),
			  axis.text=element_blank(),
			  axis.ticks=element_blank(),
			  axis.title=element_blank(),
			  panel.background=element_blank(),
			  panel.border=element_blank(),
			  panel.grid=element_blank(),
			  panel.spacing=unit(0, "lines"),
			  plot.background=element_blank(),
			  legend.justification = c(0,0),
			  legend.position = c(0,0)
		)
}
# Define the filling colors for each province; max allowed is 9 but good enough for the 13 provinces + territories
map_colors <- RColorBrewer::brewer.pal(9, "Pastel1") %>% rep(37) # 4

# Plot the maps
ggplot() +
	geom_sf(aes(fill = PRUID), color = "gray60", size = 0.1, data = canada_cd) + # 5
	coord_sf(crs = crs_string) + # 6
	scale_fill_manual(values = map_colors) +
	guides(fill = FALSE) +
	theme_map() +
	theme(panel.grid.major = element_line(color = "white"),
		  legend.key = element_rect(color = "gray40", size = 0.1))

Notes:

  1. We are reading the Geo JSON file from disk.

  2. We are defining the descriptions for Coordinate Reference System (CRS). For Canada, you can take this string and run without asking why, but if you’ve gotta ask, here is a footnote.3

  3. We are creating a ggplot theme that basically removes axes, borders, legends, and use white backgrounds.

  4. We are gonna colour each province and territory with their own colour. But since there are 13 provinces and territories and Color Brewer only supports 9, we are not killing ourselves coming up with distinct colours — 9 colours for 13 areas are good enough.

  5. We pass in the shape object as data source, and fill the polygons based on PRUID, or province unique identifier.

  6. We apply CRS transformation. The transformation is applied after all the shape layers are drawn (in this case, there’s but one layer). Refer to Note 2 to learn more about the CRS transformation and the why behind it.

You should now be able to plot some pretty Canada Maps like this:

Canada Maps

Now let’s add more fun to it by also plotting our big box stores across the country!

Overlaying Custom Data

What fun is it with maps if we can’t plot our data points on them? Let’s do just that.

Getting the coordinates of the cities

We are using hardcoded city coordinates, read as a data frame with read.delim:

conn = textConnection("cma	lat	long
Vancouver	49.2827	-123.1207
Calgary	51.0447	-114.0719
Edmonton	53.5461	-113.4938
Toronto	43.6532	-79.3832
Ottawa	45.4215	-75.6972
Montreal	45.5017	-73.5673")
city_coords = read.delim(conn, stringsAsFactors = F)

By now you should have city_coords as a data frame of city names, latitudes and longitudes.

Conforming the data frame to shape objects

To actually plot the coordinates, the last step is transformation. Remember what we’ve done with the Canada maps? We applied CRS transformation to better represent the geometries. We need to do the same to the cities’ coordinates.

sf_cities = city_coords %>%
	select(long, lat) %>% # 1
	as.matrix() %>% # 2
	st_multipoint(dim = 'XY') %>% # 3
	st_sfc() %>% # 4
	st_set_crs(4269) # 5

A few things to unpack here:

  1. We use dplyr to filter for only longitude and latitude columns of the data frame. We are also using %>% to pipe the operations — to learn more, check out magrittr package.

  2. We convert it to a 6-by-2 matrix, as this is the only data type the next step’s st_multipoint accepts.

  3. We convert the matrix to a multipoint geometry, consisting of 6 points on the X-Y plane.

  4. We convert the geometry to a shape object.

  5. We decorate the geometry with CRS code “4269”, which is standard for North America, and is used in StatCan’s data files.

And now we are finally ready for the fun.

Plotting for real

ggplot() +
	geom_sf(aes(fill = PRUID), color = "gray60", size = 0.1, data = canada_cd) +
	geom_sf(data = sf_cities, color = '#001e73', alpha = 0.5, size = 3) + # 17
	coord_sf(crs = crs_string) +
	scale_fill_manual(values = map_colors) +
	guides(fill = FALSE) +
	theme_map() +
	theme(panel.grid.major = element_line(color = "white"),
		  legend.key = element_rect(color = "gray40", size = 0.1))

Canada Maps drawn in R

Voila! Canada maps right there, all created in R.

What’s Next?

Consider creating your own package to draw the Canada maps that allow you to generate part of Canada easily. For my daytime work, we focus heavily on the major metropolitan areas such as Greater Vancouver, Greater Toronto, and Ottawa-Gatineau. A package that allows you to conjure shapes with one line of code will prove very handy.

  1. We use Microsoft Power BI across the company. 

  2. Canada Maps - R Bloggers was my starting point. The post neatly points which packages to look into, and where to get the data. However, it falls short explaining how to draw your own data that are now part of a shape file. A lot of code in this post is taken from this post with only minor modifications. 

  3. it’s because the official Canada maps are drawn with Lambert Conformal Conic projection (hence the “lcc” in the string), with the central meridian at , and standard parallels at and . The full discussion is on StatCan as well.

    If you do not apply transformation, the maps will look more like this:

    Canada Maps without CRS transformation

    The northern territories will have overstated areas, and look how distorted Nunavut appears! 

So Technical It’s in English
文章列表 归档