Post on 27-Jul-2020

Steffi LaZerte & Sam Albers

weathercan: An R package for accessing Environment and Climate Change Canada weather data

@steffilazerte steffilazerte steffilazerte.ca PCAG 2017

Historical weather data

Environment and Climate Change Canada

1840 to Present

Hourly, daily, monthly intervals

> 26,000 stations (past and present)

Lots of Data!

Accessing data from ECCC website

Data good but not ready

weathercan: An R package

What's R?

An open-source programming language and software environment

Often used with RStudio IDE

Why use weathercan?

Free

Free and open-source software (FOSS)

Fast and Easy

One line of code to download data from many stations, over many years

Instantly usable

Customizable

Data is trimmed to start and end times

You can specify stations, time intervals, timezones, etc.

Reproducible!

Scripts provide a record of actions

Just note the weathercan version (packageVersion("weathercan"))

Hard to document mouse clicks or website searches
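As a minimal sketch of recording versions alongside a script, the snippet below collects package versions into a named character vector. The helper name record_versions is hypothetical, and the base packages "stats" and "utils" are used only as stand-ins so the example runs anywhere; in a real analysis you would pass "weathercan" and whatever else the script loads.

```r
# Hypothetical helper: return the installed version of each package as text.
record_versions <- function(pkgs) {
  vapply(
    pkgs,
    function(p) as.character(utils::packageVersion(p)),
    character(1)
  )
}

# Stand-in packages; replace with c("weathercan", ...) in a real script.
versions <- record_versions(c("stats", "utils"))

print(versions)
print(R.version.string)  # the R version is worth noting too
```

Saving this output next to the results gives a record that mouse clicks on a website cannot provide.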

Getting started with weathercan

Installing devtools

install.packages("devtools")

Installing weathercan with devtools

devtools::install_github("steffilazerte/weathercan", build_vignettes = TRUE)

Basic usage

Code

library(weathercan)

w <- weather(station_ids = c(50821, 51097), start = "2017-09-01")

Output

## # A tibble: 1,344 x 28
##   station_name station_id   prov   lat    lon                time  hmdx hmdx_flag pressure
## *        <chr>      <dbl> <fctr> <dbl>  <dbl>              <dttm> <dbl>     <chr>    <dbl>
## 1    BRANDON A      50821     MB 49.91 -99.95 2017-09-01 00:00:00    26              96.21
## 2    BRANDON A      50821     MB 49.91 -99.95 2017-09-01 01:00:00    26              96.15
## 3    BRANDON A      50821     MB 49.91 -99.95 2017-09-01 02:00:00    25              96.09
## 4    BRANDON A      50821     MB 49.91 -99.95 2017-09-01 03:00:00    NA              96.07
## 5    BRANDON A      50821     MB 49.91 -99.95 2017-09-01 04:00:00    NA              96.08
## # ... with 1,339 more rows, and 19 more variables

Plotting

library(ggplot2)

ggplot(data = w, aes(x = time, y = temp, colour = station_name)) +
  theme_bw() +
  geom_line() +
  labs(x = "Date", y = "Temperature C", colour = "Station")

And done!

Hmmm...

How do we get station ids?

Searching by station name

stations_search(name = "Brandon", interval = "hour")

## # A tibble: 3 x 10
##     prov station_name station_id climate_id   lat    lon  elev interval start   end
##   <fctr>        <chr>     <fctr>     <fctr> <dbl>  <dbl> <dbl>    <chr> <int> <int>
## 1     MB    BRANDON A       3471    5010480 49.91 -99.95 409.4     hour  1958  2012
## 2     MB    BRANDON A      50821    5010481 49.91 -99.95 409.3     hour  2012  2017
## 3     MB  BRANDON RCS      49909    5010490 49.90 -99.95 409.4     hour  2012  2017
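The start and end columns show which years each station has hourly data for, which is why station 50821 rather than 3471 is the one to use for September 2017. A small sketch of that check follows. It rebuilds the search result by hand as a plain data frame (so it runs without weathercan), and the helper name active_in is hypothetical.

```r
# Search result copied from the output above, as a plain data frame.
stations <- data.frame(
  station_name = c("BRANDON A", "BRANDON A", "BRANDON RCS"),
  station_id   = c(3471, 50821, 49909),
  start        = c(1958L, 2012L, 2012L),
  end          = c(2012L, 2017L, 2017L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep stations whose record covers the given year.
active_in <- function(stations, year) {
  stations[stations$start <= year & stations$end >= year, ]
}

active_2017 <- active_in(stations, 2017)
print(active_2017)
```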

Alternative: Searching by coordinates

Alternatively, search by location: coords = c(latitude, longitude)

Search within 10 km of this location: dist = 10

stations_search(coords = c(49.84847, -99.95009), dist = 10, interval = "hour")

## # A tibble: 3 x 11
##     prov station_name station_id climate_id   lat    lon  elev interval start   end distance
##   <fctr>        <chr>     <fctr>     <fctr> <dbl>  <dbl> <dbl>    <chr> <int> <int>    <dbl>
## 1     MB  BRANDON RCS      49909    5010490 49.90 -99.95 409.4     hour  2012  2017 5.731565
## 2     MB    BRANDON A       3471    5010480 49.91 -99.95 409.4     hour  1958  2012 6.843848
## 3     MB    BRANDON A      50821    5010481 49.91 -99.95 409.3     hour  2012  2017 6.843848
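To get a feel for the distance column, here is a rough sketch of a great-circle (haversine) distance in base R. This is an illustration only: it is not necessarily how weathercan computes distances, the function name haversine_km is hypothetical, and the station coordinates are the rounded values printed above, so the results only approximate the table.

```r
# Hypothetical helper: great-circle distance in km on a sphere of radius 6371 km.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Search location from the slide, and rounded station coordinates from the output.
station_lat <- c(49.90, 49.91, 49.91)
station_lon <- c(-99.95, -99.95, -99.95)

d <- haversine_km(49.84847, -99.95009, station_lat, station_lon)
print(round(d, 2))

# All three stations fall inside the 10 km search radius.
print(d <= 10)
```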

Understanding the data

Flags

## # A tibble: 7 x 6
##   station_id       date mean_min_temp mean_min_temp_flag mean_temp mean_temp_flag
## *      <dbl>     <date>         <dbl>              <chr>     <dbl>          <chr>
## 1       5401 2017-01-01          -7.9                         -4.4
## 2       5401 2017-02-01          -8.7                         -4.3
## 3       5401 2017-03-01          -9.6                         -5.2
## 4       5401 2017-04-01           3.3                          7.9
## 5       5401 2017-05-01           6.7                  E      11.8              E
## 6       5401 2017-06-01          12.3                         17.5
## 7       5401 2017-07-01          14.3                         19.3

vignette("flags", package = "weathercan")
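One way to work with flags, sketched here in base R, is to pull out the flagged rows for inspection before deciding whether to keep them. The data frame below is typed in by hand from three rows of the output above, and it assumes unflagged values carry an empty string (they could also be NA, which the filter handles). The flags vignette is the place to look up what a code such as E means.

```r
# Three rows copied from the monthly output above (flag assumed "" when blank).
monthly <- data.frame(
  date           = as.Date(c("2017-04-01", "2017-05-01", "2017-06-01")),
  mean_temp      = c(7.9, 11.8, 17.5),
  mean_temp_flag = c("", "E", ""),
  stringsAsFactors = FALSE
)

# A value counts as flagged when its flag is neither NA nor an empty string.
is_flagged <- !is.na(monthly$mean_temp_flag) & monthly$mean_temp_flag != ""

flagged <- monthly[is_flagged, ]
print(flagged)

# Optionally blank out flagged values rather than dropping whole rows.
cleaned <- monthly
cleaned$mean_temp[is_flagged] <- NA
print(cleaned)
```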

Units and measurements

## # A tibble: 1,344 x 6
##    station_id                time  temp temp_dew rel_hum wind_dir
##  *      <dbl>              <dttm> <dbl>    <dbl>   <dbl>    <dbl>
##  1      50821 2017-09-01 00:00:00  20.8     17.3      80       18
##  2      50821 2017-09-01 01:00:00  20.8     17.2      80       17
##  3      50821 2017-09-01 02:00:00  20.0     16.9      83       17
##  4      50821 2017-09-01 03:00:00  19.4     16.9      85       16
##  5      50821 2017-09-01 04:00:00  19.2     17.2      88       19
##  6      50821 2017-09-01 05:00:00  18.8     17.8      93       17
##  7      50821 2017-09-01 06:00:00  18.9     17.9      94       16
##  8      50821 2017-09-01 07:00:00  18.3     17.7      96       18
##  9      50821 2017-09-01 08:00:00  19.9     17.8      88       20
## 10      50821 2017-09-01 09:00:00  20.6     18.2      86       23
## # ... with 1,334 more rows

vignette("glossary", package = "weathercan")

Combining with other data

Adding weather data to other data sets

Times don't always line up

Sediment data

## # A tibble: 1,392 x 2
##                  time   amount
##                <dttm>    <dbl>
## 1 2017-09-01 00:05:34 168.3133
## 2 2017-09-01 00:35:34 156.9122
## 3 2017-09-01 01:05:34 175.6169
## 4 2017-09-01 01:35:34 184.5908
## 5 2017-09-01 02:05:34 163.2017
## 6 2017-09-01 02:35:34 169.2177
## 7 2017-09-01 03:05:34 167.8620
## # ... with 1,385 more rows

Brandon Weather data

## # A tibble: 672 x 3
##                  time  temp pressure
##                <dttm> <dbl>    <dbl>
## 1 2017-09-01 00:00:00  20.8    96.21
## 2 2017-09-01 01:00:00  20.8    96.15
## 3 2017-09-01 02:00:00  20.0    96.09
## 4 2017-09-01 03:00:00  19.4    96.07
## 5 2017-09-01 04:00:00  19.2    96.08
## 6 2017-09-01 05:00:00  18.8    96.05
## 7 2017-09-01 06:00:00  18.9    96.04
## # ... with 665 more rows

Interpolating

Linear interpolation where possible

Only a single weather station at a time

w <- weather(station_ids = 50821, start = "2017-09-01")

sediment <- add_weather(data = sediment,
                        weather = w,
                        cols = c("temp", "pressure"))

Sediment data

## # A tibble: 1,392 x 4
##                  time   amount     temp pressure
##                <dttm>    <dbl>    <dbl>    <dbl>
## 1 2017-09-01 00:05:34 168.3133 20.80000 96.20443
## 2 2017-09-01 00:35:34 156.9122 20.80000 96.17443
## 3 2017-09-01 01:05:34 175.6169 20.72578 96.14443
## 4 2017-09-01 01:35:34 184.5908 20.32578 96.11443
## 5 2017-09-01 02:05:34 163.2017 19.94433 96.08814
## 6 2017-09-01 02:35:34 169.2177 19.64433 96.07814
## 7 2017-09-01 03:05:34 167.8620 19.38144 96.07093
## # ... with 1,385 more rows
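The interpolated values above can be checked by hand. The sketch below uses base R's approx() for linear interpolation on the first three hourly weather readings and the first four sediment timestamps. It illustrates the idea only and is not a description of how add_weather() is implemented; the UTC timezone is an assumption made so the example is self-contained.

```r
# First three hourly weather readings from the Brandon output above.
weather_time <- as.POSIXct(
  c("2017-09-01 00:00:00", "2017-09-01 01:00:00", "2017-09-01 02:00:00"),
  tz = "UTC"
)
temp     <- c(20.8, 20.8, 20.0)
pressure <- c(96.21, 96.15, 96.09)

# First four sediment timestamps, which fall between the hourly readings.
sediment_time <- as.POSIXct(
  c("2017-09-01 00:05:34", "2017-09-01 00:35:34",
    "2017-09-01 01:05:34", "2017-09-01 01:35:34"),
  tz = "UTC"
)

# approx() interpolates linearly; times are converted to seconds first.
interp <- function(y) {
  approx(x = as.numeric(weather_time), y = y,
         xout = as.numeric(sediment_time))$y
}

sediment <- data.frame(
  time     = sediment_time,
  temp     = interp(temp),
  pressure = interp(pressure)
)
print(sediment, digits = 7)
```

For example, 01:05:34 is 334 seconds into an hour in which temperature falls from 20.8 to 20.0, so the interpolated value is 20.8 - 0.8 * 334 / 3600, roughly 20.72578, in line with the third row of the sediment output.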

Recap!

1. Load weathercan package

library(weathercan)

2. Find a station

stations_search("Brandon")

3. Download weather

w <- weather(station_ids = 50821, start = "2017-09-01")

4. Add weather data to an existing data set

sediment <- add_weather(data = sediment, weather = w, cols = "temp")

We invite contributions!

Openly developed on GitHub

Contribute what you can (You don't have to be an R programmer!):

Ideas / Feature-requests

Bugs

Bug-fixes

Development

GitHub: http://github.com/steffilazerte/weathercan

Help with weathercan

Tutorials and Reference: http://steffilazerte.github.io/weathercan

This presentation: https://steffilazerte.github.io/Presentations/

Contact Steffi: @steffilazerte steffilazerte steffilazerte.ca

Thanks!

Dr. David J. Hill

Slides created via the R package xaringan, using remark.js, knitr, and R Markdown

weathercan v0.2.3