Download the vaccine coverage data from PHAC

We begin by downloading the weekly COVID-19 vaccine coverage dataset from the Public Health Agency of Canada (PHAC), keeping only the columns we care about.

These data are reported weekly on Fridays and are current up to the previous Saturday. For our purposes, we will treat them as weekly values on a Monday-to-Sunday cycle.

# load dplyr (provides %>%, filter(), select(), mutate() and left_join())
library(dplyr)

# download vaccine coverage dataset from PHAC
# dir.create("raw", showWarnings = FALSE)
# download.file("https://health-infobase.canada.ca/src/data/covidLive/vaccination-coverage-map.csv", "raw/vaccination-coverage-map.csv")

# load vaccine coverage dataset (downloaded 2022-01-18)
vc <- read.csv("raw/vaccination-coverage-map.csv", stringsAsFactors = FALSE)
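# keep coverage variables
vc <- vc %>%
  filter(pt != "Canada") %>%
  select(date, pt, percent_dose_1, percent_dose_2, percent_dose_3) %>%
  # convert dates from character to Date (assumes ISO "YYYY-MM-DD" strings)
  mutate(date = as.Date(date))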

Inspect the vaccine coverage data from PHAC

Let’s take a look at vaccine coverage by dose in each province/territory…

For the most part, the first and second dose coverage time series look good, with two exceptions:

The third dose time series are lacking: they begin in early December 2021 or later and are completely absent for several provinces. We will replace them with the COVID-19 Canada Open Data Working Group (CCODWG) “additional doses” time series. The only exceptions are Nunavut, which does not currently report third doses except through the Public Health Agency of Canada, and the Northwest Territories, due to issues with separating vaccinations of residents from those of non-residents.
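One way to check where the third dose series start, and where they are missing entirely, is to count the non-missing values per jurisdiction. The sketch below uses a small toy data frame (`vc_toy`, with made-up jurisdictions and values) in place of the real `vc` object; only the column names `date`, `pt` and `percent_dose_3` are taken from the code above.

```r
# Toy stand-in for `vc`; jurisdictions and values are hypothetical, not real PHAC data
vc_toy <- data.frame(
  date = as.Date(c("2021-12-04", "2021-12-11", "2021-12-04", "2021-12-11")),
  pt = c("Province A", "Province A", "Province B", "Province B"),
  percent_dose_3 = c(NA, 5.1, NA, NA),
  stringsAsFactors = FALSE
)

# Number of weeks with a reported third dose value, by jurisdiction
n_reported <- tapply(!is.na(vc_toy$percent_dose_3), vc_toy$pt, sum)

# First date with a reported value (NA if the jurisdiction never reports)
first_reported <- sapply(split(vc_toy, vc_toy$pt), function(d) {
  reported <- d$date[!is.na(d$percent_dose_3)]
  if (length(reported) == 0) NA_character_ else format(min(reported))
})

print(n_reported)
print(first_reported)
```

A jurisdiction with a count of zero has no third dose series at all, while a late first date indicates a series that starts well after third doses began.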

Add the third dose time series from CCODWG

# download additional doses dataset from CCODWG
# download.file("https://raw.githubusercontent.com/ccodwg/Covid19Canada/master/timeseries_prov/vaccine_additionaldoses_timeseries_prov.csv", "raw/vaccine_additionaldoses_timeseries_prov.csv")

# load additional doses dataset (downloaded 2022-01-18)
dose3 <- read.csv("raw/vaccine_additionaldoses_timeseries_prov.csv", stringsAsFactors = FALSE)

Let’s calculate third dose coverage using the CCODWG dataset.

# calculate third dose coverage
dose3 <- dose3 %>%
  # join population values (`pop` is assumed to be a data frame with columns `pt` and `pop`)
  left_join(pop, by = "pt") %>%
  # remove NU and NWT
  filter(!pt %in% c("Nunavut", "Northwest Territories")) %>%
  # remove 0 values (before third doses were reported)
  filter(n_dose_3_new != 0) %>%
  # calculate third dose coverage
  mutate(percent_dose_3_new = round(n_dose_3_new / pop * 100, 2))
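To make the coverage calculation concrete, here is a minimal worked example with the same steps applied to toy data. The data frames `doses_toy` and `pop_toy`, the jurisdiction labels and all numbers are hypothetical; only the column names and the formula follow the code above.

```r
library(dplyr)

# Hypothetical cumulative third dose counts (not real data)
doses_toy <- data.frame(
  pt = c("A", "B"),
  n_dose_3_new = c(0, 12345),
  stringsAsFactors = FALSE
)

# Hypothetical population values (not real data)
pop_toy <- data.frame(
  pt = c("A", "B"),
  pop = c(500000, 1000000),
  stringsAsFactors = FALSE
)

out <- doses_toy %>%
  # attach the population of each jurisdiction
  left_join(pop_toy, by = "pt") %>%
  # drop rows from before third doses were reported
  filter(n_dose_3_new != 0) %>%
  # coverage = doses / population * 100, rounded to two decimals
  mutate(percent_dose_3_new = round(n_dose_3_new / pop * 100, 2))

print(out)
```

Here jurisdiction "A" is dropped because its count is zero, and "B" gets 12345 / 1000000 * 100 = 1.2345, which rounds to 1.23.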

Reporting of third doses differed considerably between provinces and territories. For example, British Columbia and Alberta started reporting much earlier than other jurisdictions. We will begin the time series on October 3, 2021, as a reasonable number of jurisdictions had begun reporting by this date.

To join back to the original vaccine coverage dataset, we will keep the Sunday reported value of each week (since data reported on Sunday correspond to data current up to Saturday, as in the original dataset).

# keep relevant data
dose3 <- dose3 %>%
  # keep Sundays beginning with October 3, 2021
  filter(date %in% seq.Date(from = as.Date("2021-10-03"), to = max(as.Date(vc$date)) + 1, by = "7 days")) %>%
  # subtract one day to join with original dataset
  mutate(date = date - 1) %>%
  # keep relevant variables
  select(date, pt, n_dose_3_new, percent_dose_3_new)

# join datasets
vc <- vc %>%
  left_join(dose3, by = c("date", "pt"))
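The date alignment used above can be checked in isolation. The following sketch builds a short sequence of Sundays starting on October 3, 2021 (the end date is chosen only for illustration) and confirms that subtracting one day yields the Saturdays used to date the PHAC values.

```r
# Weekly Sundays starting October 3, 2021 (end date is illustrative)
sundays <- seq.Date(from = as.Date("2021-10-03"), to = as.Date("2021-10-24"), by = "7 days")

# ISO weekday numbers: 1 = Monday, ..., 7 = Sunday
print(format(sundays, "%u"))

# Shift back one day to match the Saturday dating of the PHAC dataset
saturdays <- sundays - 1
print(format(saturdays, "%u"))
```

Using the numeric `%u` format avoids depending on locale-specific weekday names.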

Let’s see what the old and new third dose datasets look like.

Much better!

Note that our Saskatchewan value differs somewhat from the original PHAC dataset. This may be due to different reporting delays (recall that Saskatchewan only publicly reports once per week) or to the PHAC dataset including both third and fourth doses in the calculation, whereas the province reports these two values separately (the CCODWG dataset only counts third doses).
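Discrepancies of this kind can be quantified by differencing the two third dose columns for a given week. The sketch below is only an illustration: `cmp_toy` and its values are hypothetical, and only the column names `percent_dose_3` and `percent_dose_3_new` come from the joined dataset.

```r
# Hypothetical values for a single week (not real data)
cmp_toy <- data.frame(
  pt = c("Province A", "Province B"),
  percent_dose_3 = c(10.50, 8.20),     # stand-in for the PHAC value
  percent_dose_3_new = c(10.48, 7.10), # stand-in for the CCODWG-derived value
  stringsAsFactors = FALSE
)

# Difference in percentage points (new minus old)
cmp_toy$diff <- round(cmp_toy$percent_dose_3_new - cmp_toy$percent_dose_3, 2)

# Largest absolute discrepancies first
cmp_sorted <- cmp_toy[order(-abs(cmp_toy$diff)), ]
print(cmp_sorted)
```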

Final dataset

And now a plot of the final dataset:

Recall that the vaccine coverage data are dated as being current up to the given Saturday, but we should be okay treating these as weekly coverage values on a Monday to Sunday cycle.
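As a small illustration of this convention, the sketch below maps a Saturday-dated value to a Monday-to-Sunday week. It assumes that each value is assigned to the week ending on the Sunday immediately after the given Saturday; the variable names and the example date are illustrative only.

```r
# A Saturday up to which the coverage data are current (illustrative date)
current_to <- as.Date("2021-10-02")

# Monday-to-Sunday week this value is assigned to (assumed convention)
week_start <- current_to - 5 # the preceding Monday
week_end <- current_to + 1   # the following Sunday

print(c(week_start, week_end))
```

Under this assumption, the value omits only the final day (Sunday) of the week it represents.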