The tidyhydro package provides a set of commonly used metrics in hydrology (such as NSE, KGE, pBIAS) for use within a tidymodels infrastructure. Originally inspired by the yardstick and hydroGOF packages, this library is mainly written in C++ and provides a very quick estimation of desired goodness-of-fit criteria.
Additionally, the package includes C++ implementations of lesser-known yet powerful metrics and descriptive statistics recommended in the United States Geological Survey (USGS) and the New Zealand National Environmental Monitoring Standards (NEMS) guidelines. Examples include PRESS (Prediction Error Sum of Squares), SFE (Standard Factorial Error), and MSPE (Model Standard Percentage Error), among others. The implementations are based on the equations from Helsel et al. (2020), Rasmussen et al. (2008), Hicks et al. (2020), and others (see the documentation for details).
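To give a feel for one of these statistics, here is a minimal base R sketch of PRESS for an ordinary least-squares model. It uses the standard leverage shortcut and checks it against an explicit leave-one-out loop. The function names `press_sketch` and `press_loo` are hypothetical and are not part of tidyhydro; the package's own C++ implementation and interface may differ.

```r
# PRESS via the leverage shortcut: each residual e_i is inflated by
# 1 / (1 - h_ii), where h_ii is the i-th diagonal of the hat matrix.
press_sketch <- function(model) {
  e <- residuals(model)
  h <- hatvalues(model)
  sum((e / (1 - h))^2)
}

# The same quantity by brute force: refit without observation i,
# predict it, and accumulate the squared prediction errors.
press_loo <- function(formula, data) {
  response <- all.vars(formula)[1]
  errs <- vapply(seq_len(nrow(data)), function(i) {
    fit_i <- lm(formula, data = data[-i, , drop = FALSE])
    data[i, response] - predict(fit_i, newdata = data[i, , drop = FALSE])
  }, numeric(1))
  sum(errs^2)
}

# Built-in `cars` dataset, used here only for illustration
fit <- lm(dist ~ speed, data = cars)
press_shortcut <- press_sketch(fit)
press_bruteforce <- press_loo(dist ~ speed, cars)
```

Both values should agree up to floating-point error, which is why the shortcut is usually preferred: it needs only one model fit.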
The tidyhydro package follows the philosophy of yardstick and provides S3 class methods for vectors and data frames. For example, one can estimate KGE, NSE or pBIAS for a data frame like this:
library(tidyhydro)
str(avacha)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 365 obs. of 3 variables:
#> $ date: Date, format: "2022-01-01" "2022-01-02" ...
#> $ obs : num 76.2 76.2 76.3 76.3 76.4 76.4 76.5 76.5 76.6 76.6 ...
#> $ sim : num 84.8 84.3 84 83.7 83.4 ...
kge(avacha, obs, sim)
#> # A tibble: 1 × 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 kge standard 0.947
Alternatively, create a metric_set and estimate several metrics at once:
hydro_metrics <- yardstick::metric_set(nse, pbias)
hydro_metrics(avacha, obs, sim)
#> # A tibble: 2 × 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 nse standard 0.895
#> 2 pbias standard 0.0540
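For readers who want to see what these metrics compute, the following base R sketch writes out common textbook definitions. It is an illustration under stated assumptions, not tidyhydro's implementation: the `*_sketch` names are hypothetical, KGE is assumed to follow the Gupta et al. (2009) form, and pBIAS is assumed to use the `sim - obs` sign convention (some references use `obs - sim`).

```r
# Nash-Sutcliffe efficiency: 1 minus the ratio of the error sum of
# squares to the variance of the observations around their mean.
nse_sketch <- function(obs, sim) {
  1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)
}

# Kling-Gupta efficiency (assumed Gupta et al. 2009 form): Euclidean
# distance from the ideal point (1, 1, 1) in correlation, variability
# ratio and bias ratio space.
kge_sketch <- function(obs, sim) {
  r <- cor(obs, sim)
  alpha <- sd(sim) / sd(obs)
  beta <- mean(sim) / mean(obs)
  1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2)
}

# Percent bias (assumed sign convention: positive means overestimation)
pbias_sketch <- function(obs, sim) {
  100 * sum(sim - obs) / sum(obs)
}

# Toy data, made up for illustration
obs <- c(1, 2, 3, 4, 5)
sim <- obs * 1.1
```

A perfect simulation gives NSE = 1, KGE = 1 and pBIAS = 0, while predicting the observed mean everywhere gives NSE = 0.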
Sometimes a qualitative interpretation of model performance is needed. Therefore, some functions have a performance argument. When performance = TRUE, the metric interpretation is returned according to Moriasi et al. (2015).
hydro_metrics(avacha, obs, sim, performance = TRUE)
#> # A tibble: 2 × 3
#> .metric .estimator .estimate
#> <chr> <chr> <chr>
#> 1 nse standard Excellent
#> 2 pbias standard Excellent
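The idea behind such an interpretation can be sketched in base R as a lookup of the numeric estimate against class boundaries. Everything in this sketch is an assumption for illustration: the function name `classify_nse_sketch` is hypothetical, and the thresholds and labels are one commonly quoted set for streamflow NSE, not necessarily the ones tidyhydro applies (its output above uses the label "Excellent", which does not appear here).

```r
# Map NSE values to qualitative classes with cut().
# Thresholds and labels are illustrative assumptions.
classify_nse_sketch <- function(nse) {
  classes <- cut(
    nse,
    breaks = c(-Inf, 0.50, 0.70, 0.80, Inf),
    labels = c("Not satisfactory", "Satisfactory", "Good", "Very good"),
    right = TRUE  # intervals are closed on the right, e.g. (0.70, 0.80]
  )
  as.character(classes)
}

nse_classes <- classify_nse_sketch(c(0.45, 0.65, 0.75, 0.895))
```

Keeping the boundaries in a single `breaks` vector makes it easy to swap in a different set of thresholds for another metric or guideline.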
In addition to the metric objects inherited from yardstick, tidyhydro introduces measure objects. A measure calculates a descriptive statistic of a single dataset, such as cv(), the coefficient of variation (a measure of variability), or gm(), the geometric mean (a measure of central tendency):
# Coefficient of Variation
cv(avacha, obs)
#> # A tibble: 1 × 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 cv standard 0.533
# Geometric mean
gm_vec(avacha$obs)
#> [1] 128.9476
Similarly to metric_set, one can create a measure_set and estimate desired descriptive statistics at once:
ms <- measure_set(cv, gm)
ms(avacha, obs)
#> # A tibble: 2 × 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 cv standard 0.533
#> 2 gm standard 129.
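Both statistics are simple to write in base R, which can help when checking results by hand. The sketch below assumes the usual definitions (sample standard deviation divided by the mean, and the exponential of the mean of logs); the `*_sketch` names are hypothetical, and tidyhydro's C++ versions may handle missing values or edge cases differently.

```r
# Coefficient of variation: sample standard deviation relative to the mean
cv_sketch <- function(x) {
  sd(x) / mean(x)
}

# Geometric mean, computed on the log scale to avoid overflow when
# multiplying many values; only defined for strictly positive x.
gm_sketch <- function(x) {
  stopifnot(all(x > 0))
  exp(mean(log(x)))
}

# Toy data, made up for illustration
cv_value <- cv_sketch(c(1, 2, 3))     # sd = 1, mean = 2
gm_value <- gm_sketch(c(1, 10, 100))  # cube root of 1000
```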
You can install the development version of tidyhydro from GitHub with:
# install.packages("pak")
pak::pak("atsyplenkov/tidyhydro")
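Since the package uses Rcpp in the background, it runs faster than base R and other R packages (see benchmarks). This is particularly noticeable with large datasets. In the example below with 10^6 values, the median run time of a base R implementation is about 9 times that of tidyhydro, and that of hydroGOF about 12 times: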
set.seed(12234)
x <- runif(10^6)
y <- runif(10^6)
nse <- function(truth, estimate, na_rm = TRUE) {
  #fmt: skip
  1 - (sum((truth - estimate)^2, na.rm = na_rm) /
    sum((truth - mean(truth, na.rm = na_rm))^2, na.rm = na_rm))
}
bench::mark(
  tidyhydro = tidyhydro::nse_vec(truth = x, estimate = y),
  hydroGOF = hydroGOF::NSE(sim = y, obs = x),
  baseR = nse(truth = x, estimate = y),
  check = TRUE,
  relative = TRUE,
  filter_gc = FALSE,
  iterations = 50L
)
#> # A tibble: 3 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 tidyhydro 1 1 18.2 NaN NaN
#> 2 hydroGOF 11.7 11.6 1 Inf Inf
#> 3 baseR 7.19 8.63 1.98 Inf Inf
Please note that the tidyhydro project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.