library(tidyhydat)
library(dplyr)
library(lubridate)
The most common question I get about the tidyhydat package goes something like this:
How do I get realtime data longer than the 30 days available in the datamart via realtime_dd?
Previously the answer was… you can’t. The HYDAT database is a historical database of hydrometric data. Data are validated and entered into HYDAT periodically. It is not updated in realtime. At the same time realtime data is only available for 30 days from the datamart.
Now, however, Environment and Climate Change Canada (ECCC) provides a web service with realtime data for stations that extends back about 18 months. This usually spans the gap between current data and when it gets into HYDAT. And since tidyhydat version 0.6.0 you can access this data in R via the realtime_ws function. This post is a quick introduction to using the web service from tidyhydat.
Let’s load a few packages to help illustrate this.
Using the web service for realtime hydrometric data
The realtime_ws function operates in a similar way to most of the other functions in tidyhydat, particularly the realtime_dd function. You can pass a single station or a vector of stations and the function returns a tibble of data relating to those stations. I am assuming that you know which station you want and know its number. For an introduction to tidyhydat see this vignette. You can also search for stations using the tidyhydat::search_stn_name function.
ws <- realtime_ws(
  station_number = "08MF005"
)
glimpse(ws)
Rows: 18,600
Columns: 10
$ STATION_NUMBER <chr> "08MF005", "08MF005", "08MF005", "08MF005", "08MF005", …
$ Date <dttm> 2023-04-04 00:00:00, 2023-04-04 01:00:00, 2023-04-04 0…
$ Name_En <chr> "Water temperature", "Water temperature", "Water temper…
$ Value <dbl> 5.82, 4.87, 4.94, 4.70, 4.21, 3.97, 3.86, 3.81, 3.66, 3…
$ Unit <chr> "°C", "°C", "°C", "°C", "°C", "°C", "°C", "°C", "°C", "…
$ Grade <chr> "-1", "-1", "-1", "-1", "-1", "-1", "-1", "-1", "-1", "…
$ Symbol <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Approval <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Parameter <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
$ Code <chr> "TW", "TW", "TW", "TW", "TW", "TW", "TW", "TW", "TW", "…
Parameter | Name_En |
---|---|
46 | Water level (primary sensor) |
16 | Water level (secondary sensor, telemetry) |
11 | Water level (secondary sensor) |
52 | Water level (tertiary sensor, telemetry) |
13 | Water level (tertiary sensor) |
3 | Water level (daily mean) |
39 | Water level (hourly mean) |
14 | Elevation, natural lake |
42 | Elevation, lake or reservoir rule curve |
17 | Atmospheric pressure |
18 | Accumulated precipitation |
19 | Incremental precipitation |
47 | Discharge (primary sensor derived) |
7 | Discharge (secondary sensor derived) |
10 | Discharge (tertiary sensor derived) |
6 | Discharge (daily mean) |
40 | Discharge (hourly mean) |
8 | Discharge (sensor) |
50 | Snow depth |
51 | Snow depth, new snowfall |
1 | Air temperature |
5 | Water temperature |
41 | Secondary water temperature |
34 | Wind direction |
35 | Wind speed |
2 | Battery voltage |
20 | Blue-green algae |
21 | Conductance |
26 | Total dissolved solids |
43 | Dissolved nitrate |
22 | Dissolved oxygen |
24 | pH |
25 | Turbidity |
9 | Water velocity |
37 | Water velocity, x |
38 | Water velocity, y |
23 | Oxygen saturation |
49 | Chlorophyll |
28 | Relative humidity |
36 | Cell end |
4 | Internal equipment temperature |
12 | Tank pressure |
Immediately you can see that the data returned differs from the data returned by realtime_dd. In particular notice the Name_En, Parameter and Code columns. These columns are used to identify the parameters we are interested in. It turns out that you can access more than just hydrometric data via the web service (more on that later!). But for now let’s just focus on hydrometric data by supplying 47 to the parameter argument to get discharge. Why did I choose 47? I consulted the param_id internal table, which tells me that 47 is the parameter code for discharge. The table above lists all the other parameters available.
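As a minimal sketch of that lookup, the helper below searches a parameter table by name. The function name find_parameter is hypothetical, and the small lookup table is just a few rows copied from the parameter table in this post so the example runs on its own.

```r
# A few rows copied from the parameter table in this post, so the example
# runs without tidyhydat installed.
lookup <- data.frame(
  Parameter = c(46, 47, 6, 40, 5),
  Name_En = c(
    "Water level (primary sensor)",
    "Discharge (primary sensor derived)",
    "Discharge (daily mean)",
    "Discharge (hourly mean)",
    "Water temperature"
  )
)

# Hypothetical helper: return the rows whose English name matches a pattern.
find_parameter <- function(lookup, pattern) {
  lookup[grepl(pattern, lookup$Name_En, ignore.case = TRUE), , drop = FALSE]
}

discharge_params <- find_parameter(lookup, "discharge")
print(discharge_params)
```

With tidyhydat loaded, passing param_id as the lookup argument should work the same way, since it also has a Name_En column.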
ws_discharge <- realtime_ws(
  station_number = "08MF005",
  parameter = 47
)
So how many months back does this data go?
range(ws_discharge$Date)
[1] "2023-04-04 00:00:00 UTC" "2023-05-04 23:55:00 UTC"
Wait - I told you that this would extend back 18 months. What gives? Well, the default date range for realtime_ws is 30 days back from today. You can change this by supplying start_date and end_date arguments.
ws_discharge <- realtime_ws(
  station_number = "08MF005",
  parameter = 47,
  start_date = Sys.Date() - months(18),
  end_date = Sys.Date()
)
range(ws_discharge$Date)
[1] "2021-11-04 00:00:00 UTC" "2023-05-04 23:55:00 UTC"
Now that’s much better. From here you can make beautiful plots, tables and summaries of that glorious 18 months of data.
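One thing to watch when building start_date this way: subtracting a lubridate months() period returns NA when the resulting day does not exist (18 months before 31 August would be 31 February). A minimal sketch of a safer alternative uses lubridate's %m-% operator, which rolls back to the last valid day of the month. The fixed date below is chosen only to trigger the edge case.

```r
library(lubridate)

today <- as.Date("2023-08-31") # illustrative month-end date

# Plain subtraction lands on the non-existent 31 February 2022 and gives NA.
naive_start <- today - months(18)

# %m-% rolls back to the last real day of that month instead.
safe_start <- today %m-% months(18)

print(naive_start)
print(safe_start)
```

In the 18-month query above, that would mean passing start_date = Sys.Date() %m-% months(18).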
Other Parameters
I did, however, promise that I would mention something about the other parameters available. The long parameter table above lists all the possible parameters. In the water office, you can see (sort of) which parameters are available for a given station. However, that takes a lot of clicking. I currently don’t know of an easy way to determine which parameters are available for a given station other than just by checking. So for that I’d recommend querying a station for a short duration.
other_params <- realtime_ws(
  station_number = "08MF005",
  start_date = Sys.Date() - days(1)
)
param_id[param_id$Parameter %in% unique(other_params$Parameter), ]
# A tibble: 3 × 7
Parameter Code Unit Name_En Name_Fr Description_En Description_Fr
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 46 HG m Water level (prim… Niveau… Height, stage… Hauteur, nive…
2 47 QR m3/s Discharge (primar… Debit … Discharge - f… Débit - écoul…
3 5 TW °C Water temperature Tempér… Temperature, … Température, …
Here we can see that 08MF005, which is the Fraser River at Hope station, also monitors water temperature, which has a parameter code of 5. If we re-query the web service, we can fine-tune our call to return only water temperature.
fraser_temp <- realtime_ws(
  station_number = "08MF005",
  start_date = Sys.Date() - months(18),
  parameter = 5
)
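With a long sub-daily series in hand, a natural next step is a daily summary. The sketch below assumes a tibble with the Date and Value columns shown in the glimpse() output earlier. The three-row series is made up so the example runs without a network call, and daily_mean is a hypothetical helper name.

```r
library(dplyr)

# Made-up stand-in for realtime_ws() output (same Date/Value column names).
temps <- tibble(
  Date = as.POSIXct(
    c("2023-04-04 00:00:00", "2023-04-04 12:00:00", "2023-04-05 00:00:00"),
    tz = "UTC"
  ),
  Value = c(4, 6, 5)
)

# Hypothetical helper: collapse sub-daily readings to one mean per UTC day.
daily_mean <- function(df) {
  df %>%
    mutate(Day = as.Date(Date, tz = "UTC")) %>%
    group_by(Day) %>%
    summarise(Mean_Value = mean(Value, na.rm = TRUE), .groups = "drop")
}

temp_daily <- daily_mean(temps)
print(temp_daily)
```

Calling daily_mean(fraser_temp) would apply the same steps to the water temperature query above.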
Why else might I want to use the web service?
One other reason you might consider using the web service is that it can be much faster and more efficient than the datamart. We can construct one call to request all the data rather than iterating through multiple station csvs to get what we want. To illustrate this we can construct a simple function that benchmarks the two approaches. (Yes, I know that these aren’t returning exactly the same thing, but for these purposes it is good enough.)
compare_realtime <- function(station_number) {
  bench::mark(
    realtime_ws = realtime_ws(
      station_number = station_number,
      parameter = c(46, 47)
    ),
    realtime_dd = realtime_dd(
      station_number = station_number
    ),
    max_iterations = 5,
    check = FALSE
  )
}
Let’s compare the two functions for a single station:
compare_realtime("08MF005")
# A tibble: 2 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 realtime_ws 1.78s 1.78s 0.560 7.83MB 0
2 realtime_dd 1.7s 1.7s 0.589 593.99MB 8.84
Ok so on a single station, the two approaches are similar in speed, though you can see that far more memory is allocated using realtime_dd. Once you add more stations to the mix, it becomes clear that the web service is the faster and more memory-efficient approach.
compare_realtime(c("08MF005", "08JC002", "02LA004"))
# A tibble: 2 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 realtime_ws 2.7s 2.7s 0.370 22.38MB 0
2 realtime_dd 4.66s 4.66s 0.215 1.73GB 6.44
Conclusions
The web service functionality in tidyhydat is still new, so if you notice any funky behaviour please let me know. You can do that by opening an issue in the tidyhydat GitHub repo. This functionality is a nice new way to access Canadian hydrometric data and I am excited to see how people use it.