The return of the web service

R
hydrology
tidyhydat
Author

Sam Albers

Published

May 4, 2023

The most common question I get about the tidyhydat package goes something like this:

How do I get realtime data longer than the 30 days available in the datamart via realtime_dd?

Previously the answer was… you can’t. The HYDAT database is a historical database of hydrometric data. Data are validated and entered into HYDAT periodically, so it is not updated in realtime. At the same time, realtime data are only available for 30 days from the datamart.

Now, however, Environment and Climate Change Canada (ECCC) provides a web service with realtime data for stations that extends back about 18 months. This usually spans the gap between current data and the point when it gets into HYDAT. And since tidyhydat version 0.6.0 you can access this data in R via the realtime_ws function. This post is a quick introduction to using the web service from tidyhydat.

Let’s load a few packages to help illustrate this.

library(tidyhydat)
library(dplyr)
library(lubridate)

Using the web service for realtime hydrometric data

The realtime_ws function operates in a similar way to most of the other functions in tidyhydat, particularly the realtime_dd function. You can pass a single station or a vector of stations and the function returns a tibble of data for those stations. I am assuming that you know which station you want and know its number. For an introduction to tidyhydat see this vignette. You can also search for stations using the tidyhydat::search_stn_name function.

ws <- realtime_ws(
  station_number = "08MF005"
)
glimpse(ws)
Rows: 18,600
Columns: 10
$ STATION_NUMBER <chr> "08MF005", "08MF005", "08MF005", "08MF005", "08MF005", …
$ Date           <dttm> 2023-04-04 00:00:00, 2023-04-04 01:00:00, 2023-04-04 0…
$ Name_En        <chr> "Water temperature", "Water temperature", "Water temper…
$ Value          <dbl> 5.82, 4.87, 4.94, 4.70, 4.21, 3.97, 3.86, 3.81, 3.66, 3…
$ Unit           <chr> "°C", "°C", "°C", "°C", "°C", "°C", "°C", "°C", "°C", "…
$ Grade          <chr> "-1", "-1", "-1", "-1", "-1", "-1", "-1", "-1", "-1", "…
$ Symbol         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Approval       <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Parameter      <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
$ Code           <chr> "TW", "TW", "TW", "TW", "TW", "TW", "TW", "TW", "TW", "…

Parameter  Name_En
46         Water level (primary sensor)
16         Water level (secondary sensor, telemetry)
11         Water level (secondary sensor)
52         Water level (tertiary sensor, telemetry)
13         Water level (tertiary sensor)
3          Water level (daily mean)
39         Water level (hourly mean)
14         Elevation, natural lake
42         Elevation, lake or reservoir rule curve
17         Atmospheric pressure
18         Accumulated precipitation
19         Incremental precipitation
47         Discharge (primary sensor derived)
7          Discharge (secondary sensor derived)
10         Discharge (tertiary sensor derived)
6          Discharge (daily mean)
40         Discharge (hourly mean)
8          Discharge (sensor)
50         Snow depth
51         Snow depth, new snowfall
1          Air temperature
5          Water temperature
41         Secondary water temperature
34         Wind direction
35         Wind speed
2          Battery voltage
20         Blue-green algae
21         Conductance
26         Total dissolved solids
43         Dissolved nitrate
22         Dissolved oxygen
24         pH
25         Turbidity
9          Water velocity
37         Water velocity, x
38         Water velocity, y
23         Oxygen saturation
49         Chlorophyll
28         Relative humidity
36         Cell end
4          Internal equipment temperature
12         Tank pressure

Immediately you can see that the data returned is different from the data returned by realtime_dd. In particular, notice the Name_En, Parameter and Code columns. These columns identify the parameters we are interested in. It turns out that you can access more than just hydrometric data via the web service (more on that later!). But for now let’s just focus on hydrometric data by supplying 47 to the parameter argument to get discharge. Why did I choose 47? I consulted the param_id internal table, which tells me that 47 is the parameter code for discharge. The table above lists all the other parameters available.

ws_discharge <- realtime_ws(
  station_number = "08MF005",
  parameter = 47
)
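
The lookup described above can be sketched roughly as follows. This assumes only that the param_id table that ships with tidyhydat has the Parameter, Code, Unit and Name_En columns shown in the output further down in this post. The object name discharge_params is just for illustration.

```r
library(tidyhydat)
library(dplyr)

# param_id is the parameter lookup table bundled with tidyhydat.
# Keep only the rows whose English name mentions discharge.
discharge_params <- param_id %>%
  filter(grepl("Discharge", Name_En)) %>%
  select(Parameter, Code, Unit, Name_En)

discharge_params
```

Any value in the Parameter column of that result can be passed to the parameter argument of realtime_ws.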

So how many months back does this data go?

range(ws_discharge$Date)
[1] "2023-04-04 00:00:00 UTC" "2023-05-04 23:55:00 UTC"

Wait - I told you that this would extend back 18 months. What gives? Well, the default date range for realtime_ws is 30 days back from today. You can change this by supplying the start_date and end_date arguments.

ws_discharge <- realtime_ws(
  station_number = "08MF005",
  parameter = 47,
  start_date = Sys.Date() - months(18),
  end_date = Sys.Date()
)

range(ws_discharge$Date)
[1] "2021-11-04 00:00:00 UTC" "2023-05-04 23:55:00 UTC"

Now that’s much better. From here you can make beautiful plots, tables and summaries of that glorious 18 months of data.
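
As one example of such a summary, here is a minimal sketch that collapses sub-daily readings into daily means with dplyr and lubridate. To keep it runnable without calling the web service, it uses a made-up stand-in (fake_discharge, a hypothetical name with invented values) that only borrows the STATION_NUMBER, Date and Value column names from the realtime_ws output. With real data you would start from ws_discharge instead.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for ws_discharge: two days of 5-minute readings
# with invented values, using the column names realtime_ws returns.
fake_discharge <- tibble(
  STATION_NUMBER = "08MF005",
  Date = seq(
    ymd_hms("2023-05-01 00:00:00"),
    ymd_hms("2023-05-02 23:55:00"),
    by = "5 min"
  ),
  Value = seq(1000, 1100, length.out = length(Date))
)

# One row per station per calendar day (UTC), with the mean value
# and the number of readings that went into it.
daily_discharge <- fake_discharge %>%
  mutate(Day = as_date(Date)) %>%
  group_by(STATION_NUMBER, Day) %>%
  summarise(
    mean_value = mean(Value),
    n_obs = n(),
    .groups = "drop"
  )

daily_discharge
```

Keeping n_obs alongside the mean makes it easy to spot days with missing readings before trusting the daily value.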

Other Parameters

I did, however, promise that I would mention something about the other parameters available. The long table above lists all the possible parameters. In the water office, you can see (sort of) which parameters are available for a given station. However, that takes a lot of clicking. I currently don’t know of an easy way to determine which parameters are available for a given station other than just by checking. So for that I’d recommend querying a station for a short duration.

other_params <- realtime_ws(
  station_number = "08MF005",
  start_date = Sys.Date() - days(1)
)

param_id[param_id$Parameter %in% unique(other_params$Parameter),]
# A tibble: 3 × 7
  Parameter Code  Unit  Name_En            Name_Fr Description_En Description_Fr
      <dbl> <chr> <chr> <chr>              <chr>   <chr>          <chr>         
1        46 HG    m     Water level (prim… Niveau… Height, stage… Hauteur, nive…
2        47 QR    m3/s  Discharge (primar… Debit … Discharge - f… Débit - écoul…
3         5 TW    °C    Water temperature  Tempér… Temperature, … Température, …

Here we can see that 08MF005, which is the Fraser River at Hope station, also monitors water temperature, which has a parameter code of 5. That means we can fine-tune our call to the web service to return only water temperature.

fraser_temp <- realtime_ws(
  station_number = "08MF005",
  start_date = Sys.Date() - months(18),
  parameter = 5
)
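
One caveat about month arithmetic when building a start_date: with lubridate, subtracting months(18) from a date gives NA when the resulting day of the month does not exist. The small sketch below uses a fixed date, chosen purely for illustration, to show the problem and the %m-% operator, which rolls back to the last valid day of the month instead.

```r
library(lubridate)

# Fixed example date, picked because it sits at the end of a 31-day month
end_of_august <- as.Date("2023-08-31")

# 18 months earlier would be 31 February 2022, which does not exist,
# so plain period subtraction returns NA
naive_start <- end_of_august - months(18)
is.na(naive_start)
# TRUE

# %m-% rolls back to the last real day of that month
safe_start <- end_of_august %m-% months(18)
safe_start
# "2022-02-28"
```

On that basis, start_date = Sys.Date() %m-% months(18) is a safer way to write an 18-month query if it might run near the end of a month.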

Why else might I want to use the web service?

One other reason you might consider using the web service is that it can be much faster and more efficient than the datamart. We can construct one call to request all the data rather than iterating through multiple station CSVs to get what we want. To illustrate this we can construct a simple function that benchmarks the two approaches. (Yes, I know that these aren’t returning exactly the same thing, but for these purposes it is good enough.)

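# Benchmark the web service against the datamart for the same station(s)
compare_realtime <- function(station_number) {
  bench::mark(
    realtime_ws = realtime_ws(
      station_number = station_number,
      parameter = c(46, 47) # water level and discharge (primary sensor)
    ),
    realtime_dd = realtime_dd(
      station_number = station_number
    ),
    max_iterations = 5,
    check = FALSE # the two functions do not return identical results
  )
}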

Let’s compare the two functions for a single station:

compare_realtime("08MF005")
# A tibble: 2 × 6
  expression       min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 realtime_ws    1.78s    1.78s     0.560    7.83MB     0   
2 realtime_dd     1.7s     1.7s     0.589  593.99MB     8.84

Ok, so on a single station, the two approaches are similar in speed, though you can see that far more memory is allocated by realtime_dd. Once you add more stations to the mix, it becomes clear that the web service is the faster and more efficient approach.

compare_realtime(c("08MF005", "08JC002", "02LA004"))
# A tibble: 2 × 6
  expression       min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 realtime_ws     2.7s     2.7s     0.370   22.38MB     0   
2 realtime_dd    4.66s    4.66s     0.215    1.73GB     6.44

Conclusions

The web service functionality in tidyhydat is still new, so if you notice any funky behaviour please let me know. You can do that by opening an issue in the tidyhydat GitHub repo. This functionality is a nice new way to access Canadian hydrometric data and I am excited to see how people may use it.