DavisVaughan commented Sep 11, 2025

Pairs with #2024

Big set of benchmarks for future us to refer back to.

Note that the intention is not really to compete with replace(x, x == 1L, NA) in terms of direct performance, because these functions can do so much more than that. (Remember that internally we are doing vec_match(x, 1L) instead, to account for a table of size >1 too.) But it is at least interesting to compare against it, and I do think that many people will use this to one-off recode problematic values, like replace_values(x, 1 ~ NA), so it's worth making it pretty fast there.
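To make the `x == 1L` versus match distinction concrete, here is a minimal base R sketch. It uses base `match()` as a stand-in for `vec_match()`, and `replace_by_match()` is a hypothetical helper name; neither is meant to represent the actual vctrs implementation.

```r
# Hypothetical helper: replace every element of `x` that appears in `from`
# with the scalar `to`. Base `match()` stands in for `vec_match()` here;
# this is an illustration, not the vctrs code.
replace_by_match <- function(x, from, to) {
  loc <- match(x, from)    # position in `from`, or NA when there is no match
  x[!is.na(loc)] <- to
  x
}

x <- c(1L, 2L, 3L, 1L, 4L)

# With a single `from` value, both approaches give the same result
a <- replace(x, x == 1L, NA)
b <- replace_by_match(x, 1L, NA)
identical(a, b)

# With several `from` values, `==` no longer applies, but matching still works
replace_by_match(x, c(1L, 4L), NA)
```

The matching version does a lookup per element rather than a plain comparison, which is why it is not expected to beat `x == 1L` in the single-value case.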

Also, the intention is not really to compete with to[match(x, from)] either, even though this is roughly what recode_values(x, from ~ to) does. I've included benchmarks against that or case_match() because it is again interesting to compare against them, and we are often competitive with or much better than the base R approach (and remember we can take >1 from and to values).
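For readers less familiar with the base R idiom, here is a small sketch of `to[match(x, from)]` on toy data. The values are made up for illustration.

```r
x <- c(2L, 3L, 1L, 2L, 5L)
from <- c(1L, 2L, 3L)
to <- c("a", "b", "c")

# For each element of `x`, find its position in `from` (NA if absent),
# then use those positions to index into `to`
loc <- match(x, from)
out <- to[loc]
out
#> [1] "b" "c" "a" "b" NA
```

In this base R version, any value of `x` not found in `from` (here `5L`) comes out as `NA`.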

The massive benefits really kick in once you start having >1 from vector and >1 to vector. Like:

col |> replace_values(
  c(a, b, c) ~ x,
  c(d, e) ~ y,
  c(f, g, h, i) ~ z
)

Then you really get a huge reduction in memory usage compared to typical case_match() or base R (like 8.5GB down to 1GB in some cases).
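As a rough illustration of where the extra memory in the base R approach comes from, the sketch below compares a multi-pass `%in%` loop, which builds one full-length logical vector per group, with a single `match()` against a flattened lookup table. The data and the names `flat_from` and `flat_to` are made up for this example, and the single-pass version is only one plausible way to do it in base R, not a description of the vctrs internals.

```r
col <- c("a", "d", "f", "b", "q", "i")
from <- list(c("a", "b", "c"), c("d", "e"), c("f", "g", "h", "i"))
to <- c("x", "y", "z")

# Multi-pass: each group allocates a logical vector as long as `col`
multi <- col
for (i in seq_along(from)) {
  multi[col %in% from[[i]]] <- to[[i]]
}

# Single-pass: flatten the groups into one lookup table and match once
flat_from <- unlist(from)
flat_to <- rep(to, times = lengths(from))
loc <- match(col, flat_from)
matched <- !is.na(loc)
single <- col
single[matched] <- flat_to[loc[matched]]

identical(multi, single)
```

Unmatched values (here `"q"`) are left as they were in both versions, mirroring a replace-style operation.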

library(vctrs)

# - 10mil, many uniques in x, integer
# - fully matched by `from`
# - vector `to` of `from_size`
set.seed(123)
x <- sample(100000, 1e7, replace = TRUE)
from <- sample(100000)
to <- sample(100000)
bench::mark(
  vec_recode_values(
    x,
    from = from,
    to = to
  ),
  to[match(x, from)],
  iterations = 10
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression                            min  median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                        <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recode_values(x, from = from…  65.5ms  76.5ms     13.0      154MB    19.4 
#> 2 to[match(x, from)]                635.2ms 643.7ms      1.55     116MB     1.55

# A case we do worse in. R is weirdly fast with doubles here in the match,
# and we have to slice `to` before we can assign it into the output
#
# - 10mil, many uniques in x
# - fully matched by `from`
# - vector `to` of `from_size`, strings
set.seed(123)
x <- sample(100000, 1e7, replace = TRUE) + 0
from <- sample(100000) + 0
to <- sample(letters, 100000, replace = TRUE)
bench::mark(
  vec_recode_values(
    x,
    from = from,
    to = to
  ),
  to[match(x, from)],
  iterations = 10
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                           <bch> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recode_values(x, from = from, t… 194ms  203ms      4.83     230MB     7.73
#> 2 to[match(x, from)]                   147ms  151ms      6.57     192MB     5.26

# But weirdly, if you run the benchmark above with integers, the base R performance is awful
set.seed(123)
x <- sample(100000, 1e7, replace = TRUE)
from <- sample(100000)
to <- sample(letters, 100000, replace = TRUE)
bench::mark(
  vec_recode_values(
    x,
    from = from,
    to = to
  ),
  to[match(x, from)],
  iterations = 10
)
#> # A tibble: 2 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                           <bch> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recode_values(x, from = from, t… 152ms  152ms      6.56     230MB    65.6 
#> 2 to[match(x, from)]                   658ms  660ms      1.51     154MB     1.51

# - 10mil, few uniques in x, integer
# - fully matched by `from`
# - vector `to` of `from_size`
set.seed(123)
x <- sample(10, 1e7, replace = TRUE)
from <- sample(10)
to <- sample(10)
bench::mark(
  vec_recode_values(
    x,
    from = from,
    to = to
  ),
  to[match(x, from)],
  iterations = 10
)
#> # A tibble: 2 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                          <bch:> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recode_values(x, from = from, … 25.6ms 25.9ms      38.6     153MB    154. 
#> 2 to[match(x, from)]                  46.5ms 46.8ms      21.4     114MB     14.3

# - 10mil, few uniques in x, integer
# - fully matched by `from`
# - vector `to` of `from_size`, strings
set.seed(123)
x <- sample(10, 1e7, replace = TRUE)
from <- sample(10)
to <- sample(letters, 10, replace = TRUE)
bench::mark(
  vec_recode_values(
    x,
    from = from,
    to = to
  ),
  to[match(x, from)],
  iterations = 10
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                         <bch:t> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recode_values(x, from = from,…   125ms  129ms      7.45     229MB     7.45
#> 2 to[match(x, from)]                  73.6ms   78ms     12.7      153MB     6.37

# - 10mil, few uniques in x, doubles
# - fully matched by `from`
# - vector `to` of `from_size`
set.seed(123)
x <- sample(10, 1e7, replace = TRUE) + 0
from <- sample(10) + 0
to <- sample(10)
bench::mark(
  vec_recode_values(
    x,
    from = from,
    to = to
  ),
  to[match(x, from)],
  iterations = 10
)
#> # A tibble: 2 × 6
#>   expression                            min  median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                        <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recode_values(x, from = from…  44.8ms  44.9ms     22.2      153MB    51.7 
#> 2 to[match(x, from)]                138.3ms 139.3ms      7.06     153MB     7.06

# - 10mil, many uniques in x, integer
# - replacing 1 value with `from`
# - list `to` of `x_size`
# i.e. `replace_values(x, 1L ~ to)`
set.seed(123)
x <- sample(100000, 1e7, replace = TRUE)
from <- 1L
to <- list(sample(100000, 1e7, replace = TRUE))
bench::mark(
  vec_replace_values(
    x,
    from = from,
    to = to,
    to_as_list_of_vectors = TRUE
  ),
  {
    where <- x == from
    x[where] <- to[[1]][where]
    x
  },
  dplyr::case_match(x, from ~ to[[1]], .default = x),
  iterations = 10
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 3 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                          <bch:> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 vec_replace_values(x, from = from,… 27.8ms 31.2ms      29.8     124MB     20.9
#> 2 { where <- x == from x[where] <- t… 17.8ms 21.6ms      43.2     153MB     25.9
#> 3 dplyr::case_match(x, from ~ to[[1]… 75.1ms 77.2ms      12.2     346MB     25.7

# - 10mil, many uniques in x, integer
# - replacing 2 values with `from`
# - list `to` of `x_size`
set.seed(123)
x <- sample(100000, 1e7, replace = TRUE)
from <- c(1L, 2L)
to <- list(x)
bench::mark(
  vec_replace_values(
    x,
    from = from,
    to = to,
    to_as_list_of_vectors = TRUE
  ),
  dplyr::case_match(x, from ~ to[[1]], .default = x),
  iterations = 10
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                          <bch:> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 vec_replace_values(x, from = from,… 37.2ms 38.4ms      25.8     124MB     25.8
#> 2 dplyr::case_match(x, from ~ to[[1]… 82.7ms 85.6ms      10.9     343MB     21.8

# - 10mil, many uniques in x, integer
# - replacing 1 value with `from`
# - scalar `to`
set.seed(123)
x <- sample(100000, 1e7, replace = TRUE)
from <- 1L
to <- 0L
bench::mark(
  vec_replace_values(
    x,
    from = from,
    to = to
  ),
  replace(x, x == from, to),
  dplyr::case_match(x, from ~ to, .default = x),
  iterations = 10
)
#> # A tibble: 3 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                          <bch:> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 vec_replace_values(x, from = from,… 29.8ms 30.2ms      33.1     124MB    132. 
#> 2 replace(x, x == from, to)             17ms 17.3ms      57.3     114MB     47.7
#> 3 dplyr::case_match(x, from ~ to, .d… 70.8ms 70.8ms      14.1     343MB    184.

# - varying size, few uniques in x, integer
# - replacing 1 value with `from`
# - scalar `to`
set.seed(123)
from <- 1L
to <- 0L
bench::press(
  size = c(1e3, 1e4, 1e5, 1e6, 3e6, 8e6, 1e7, 2e7, 5e7),
  {
    x <- sample(10, size, replace = TRUE)
    bench::mark(
      vec_replace_values(
        x,
        from = from,
        to = to
      ),
      replace(x, x == from, to),
      dplyr::case_match(x, from ~ to, .default = x),
      iterations = 10
    )
  }
) |>
  print(n = Inf)
#> # A tibble: 27 × 14
#>    expression    size      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#>    <bch:expr>   <dbl> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#>  1 vec_replace…   1e3   6.03µs   6.97µs 114886.     12.89KB     0       10     0
#>  2 replace(x, …   1e3   3.28µs   3.63µs 246365.     12.27KB     0       10     0
#>  3 dplyr::case…   1e3  85.16µs  90.16µs  10101.     36.76KB     0       10     0
#>  4 vec_replace…   1e4  41.41µs  45.41µs  21707.    127.15KB     0       10     0
#>  5 replace(x, …   1e4  23.33µs  29.93µs  34127.    121.49KB     0       10     0
#>  6 dplyr::case…   1e4  192.5µs 211.56µs   4505.    353.15KB     0       10     0
#>  7 vec_replace…   1e5 407.87µs  472.2µs   2126.      1.24MB     0       10     0
#>  8 replace(x, …   1e5 252.19µs 259.41µs   3803.      1.18MB     0       10     0
#>  9 dplyr::case…   1e5   1.19ms   1.25ms    807.      3.44MB    89.7      9     1
#> 10 vec_replace…   1e6   4.69ms   4.81ms    209.      12.4MB     0       10     0
#> 11 replace(x, …   1e6   2.71ms   2.75ms    361.     11.83MB    40.1      9     1
#> 12 dplyr::case…   1e6   10.7ms  11.15ms     89.9    34.33MB     0       10     0
#> 13 vec_replace…   3e6  13.43ms  13.75ms     71.4    37.19MB     7.94     9     1
#> 14 replace(x, …   3e6   7.61ms   8.14ms    122.     35.48MB    30.5      8     2
#> 15 dplyr::case…   3e6  31.94ms  32.64ms     30.7      103MB    30.7      5     5
#> 16 vec_replace…   8e6  36.15ms  39.47ms     25.3    99.18MB    17.7     10     7
#> 17 replace(x, …   8e6  20.58ms  23.44ms     40.1     94.6MB    28.1     10     7
#> 18 dplyr::case…   8e6  86.41ms  88.81ms     10.9   274.66MB    21.7     10    20
#> 19 vec_replace…   1e7  44.48ms  47.07ms     20.3   123.98MB    18.2     10     9
#> 20 replace(x, …   1e7  25.03ms  29.18ms     33.6   118.25MB    13.4     10     4
#> 21 dplyr::case…   1e7 108.48ms 111.52ms      8.59  343.32MB    12.9     10    15
#> 22 vec_replace…   2e7  86.25ms  92.25ms     10.6   247.96MB     7.44    10     7
#> 23 replace(x, …   2e7  49.75ms  52.33ms     18.8   236.51MB     9.42    10     5
#> 24 dplyr::case…   2e7 208.97ms 215.14ms      4.59  686.65MB     7.34    10    16
#> 25 vec_replace…   5e7 217.58ms 229.86ms      4.39  619.89MB     3.08    10     7
#> 26 replace(x, …   5e7 123.45ms 133.42ms      7.56  591.27MB     3.78    10     5
#> 27 dplyr::case…   5e7  528.6ms 539.68ms      1.85    1.68GB     3.15    10    17
#> # ℹ 5 more variables: total_time <bch:tm>, result <list>, memory <list>,
#> #   time <list>, gc <list>

# - 10mil, many uniques in x, integer
# - replacing 10000 values with `from`
# - scalar `to`
set.seed(123)
x <- sample(100000, 1e7, replace = TRUE)
from <- sample(100000, 10000)
to <- NA_integer_
bench::mark(
  vec_replace_values(
    x,
    from = from,
    to = to
  ),
  replace(x, x %in% from, to),
  dplyr::case_match(x, from ~ to, .default = x),
  iterations = 10
)
#> # A tibble: 3 × 6
#>   expression                            min  median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                        <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl>
#> 1 vec_replace_values(x, from = fro…  71.9ms  76.1ms     13.2      124MB    1.46 
#> 2 replace(x, x %in% from, to)       215.9ms 216.8ms      4.61     195MB    0.513
#> 3 dplyr::case_match(x, from ~ to, … 130.4ms 131.1ms      7.62     343MB    3.27

# - 30mil, few uniques in x, integer
# - replacing groups of `from` values
# - vector `to`
set.seed(123)
x <- sample(10, 3e7, replace = TRUE)
from <- list(1:3, 4:5, 6:10)

bench::mark(
  vec_recode_values(
    x = x,
    from = from,
    to = c("x", "y", "z"),
    from_as_list_of_vectors = TRUE
  ),
  {
    out <- character(length(x))
    out[x %in% 1:3] <- "x"
    out[x %in% 4:5] <- "y"
    out[x %in% 6:10] <- "z"
    out
  },
  dplyr::case_match(x, 1:3 ~ "x", 4:5 ~ "y", 6:10 ~ "z"),
  iterations = 10
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 3 × 6
#>   expression                           min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                       <bch:t> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 "vec_recode_values(x = x, from … 353.7ms 362.66ms     2.69   686.65MB     1.08
#> 2 "{ out <- character(length(x)) …   1.09s    1.11s     0.898    1.68GB     1.17
#> 3 "dplyr::case_match(x, 1:3 ~ \"x…   1.34s    1.35s     0.732    2.01GB     1.32

# - varying size, few uniques in x, integer
# - replacing few values with `from`
# - list of vectors `to`
set.seed(123)
from <- c(2, 5, 3, 6, 9)
bench::press(
  size = c(1e3, 1e4, 1e5, 1e6, 3e6, 8e6, 1e7, 2e7, 5e7),
  {
    x <- sample(10, size, replace = TRUE)
    to <- rep(list(x), times = length(from))
    bench::mark(
      vec_recode_values(
        x,
        from = from,
        to = to,
        to_as_list_of_vectors = TRUE
      ),
      {
        out <- rep(NA_integer_, times = size)
        out[x == from[[1]]] <- to[[1]][x == from[[1]]]
        out[x == from[[2]]] <- to[[2]][x == from[[2]]]
        out[x == from[[3]]] <- to[[3]][x == from[[3]]]
        out[x == from[[4]]] <- to[[4]][x == from[[4]]]
        out[x == from[[5]]] <- to[[5]][x == from[[5]]]
        out
      },
      dplyr::case_match(
        x,
        from[[1]] ~ to[[1]],
        from[[2]] ~ to[[2]],
        from[[3]] ~ to[[3]],
        from[[4]] ~ to[[4]],
        from[[5]] ~ to[[5]],
      ),
      iterations = 10
    )
  }
) |>
  print(n = Inf)
#> # A tibble: 27 × 14
#>    expression    size      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#>    <bch:expr>   <dbl> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#>  1 vec_recode_…   1e3   9.14µs   9.59µs 84954.      14.27KB    0        10     0
#>  2 { out <- re…   1e3  25.83µs   28.5µs 35109.      89.52KB    0        10     0
#>  3 dplyr::case…   1e3 164.86µs 173.53µs  5593.     100.46KB    0        10     0
#>  4 vec_recode_…   1e4  90.73µs 102.15µs  9379.     137.41KB    0        10     0
#>  5 { out <- re…   1e4 221.15µs 251.49µs  3973.     880.81KB    0        10     0
#>  6 dplyr::case…   1e4 476.13µs 492.29µs  2014.     979.35KB    0        10     0
#>  7 vec_recode_…   1e5   1.18ms    1.2ms   833.       1.34MB    0        10     0
#>  8 { out <- re…   1e5   2.26ms    2.4ms   416.       8.59MB    0        10     0
#>  9 dplyr::case…   1e5   3.31ms   3.52ms   286.       9.54MB   31.7       9     1
#> 10 vec_recode_…   1e6  11.97ms  12.21ms    81.7     13.35MB    0        10     0
#> 11 { out <- re…   1e6  23.17ms  24.63ms    41.0     85.83MB    0        10     0
#> 12 dplyr::case…   1e6  31.51ms  33.17ms    30.4     95.37MB    3.37      9     1
#> 13 vec_recode_…   3e6  37.18ms  37.83ms    26.5     40.06MB    0        10     0
#> 14 { out <- re…   3e6  68.58ms  71.05ms    14.0     257.5MB    3.49      8     2
#> 15 dplyr::case…   3e6  98.86ms 103.28ms     9.66   286.11MB    2.41      8     2
#> 16 vec_recode_…   8e6  97.35ms  98.32ms    10.1    106.82MB    1.12      9     1
#> 17 { out <- re…   8e6  180.7ms 182.43ms     5.46   686.66MB    8.19      4     6
#> 18 dplyr::case…   8e6 252.55ms 252.59ms     3.96   762.94MB   15.8       2     8
#> 19 vec_recode_…   1e7 121.47ms 122.78ms     8.14   133.52MB    0.905     9     1
#> 20 { out <- re…   1e7 225.09ms 225.81ms     4.43   858.33MB    4.43      5     5
#> 21 dplyr::case…   1e7 311.92ms 314.37ms     3.19   953.68MB    8.49      3     8
#> 22 vec_recode_…   2e7 242.62ms 247.48ms     4.00   267.04MB    1.20     10     3
#> 23 { out <- re…   2e7  466.4ms 476.64ms     2.09     1.68GB    3.76     10    18
#> 24 dplyr::case…   2e7 641.05ms 671.58ms     1.48     1.86GB    3.71     10    25
#> 25 vec_recode_…   5e7 612.17ms 631.42ms     1.57   667.59MB    0.626    10     4
#> 26 { out <- re…   5e7    1.19s    1.22s     0.818    4.19GB    2.70     10    33
#> 27 dplyr::case…   5e7    1.67s    1.71s     0.584    4.66GB    2.10     10    36
#> # ℹ 5 more variables: total_time <bch:tm>, result <list>, memory <list>,
#> #   time <list>, gc <list>

# - varying size, few uniques in x, integer
# - remapping all values with `from`
# - vector `to`, strings
set.seed(123)
from <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
to <- as.character(from)
bench::press(
  size = c(1e3, 1e4, 1e5, 1e6, 3e6, 8e6, 1e7, 2e7, 5e7),
  {
    x <- sample(10, size, replace = TRUE)
    bench::mark(
      vec_recode_values(
        x,
        from = from,
        to = to
      ),
      dplyr::case_match(
        x,
        from[[1]] ~ to[[1]],
        from[[2]] ~ to[[2]],
        from[[3]] ~ to[[3]],
        from[[4]] ~ to[[4]],
        from[[5]] ~ to[[5]],
        from[[6]] ~ to[[6]],
        from[[7]] ~ to[[7]],
        from[[8]] ~ to[[8]],
        from[[9]] ~ to[[9]],
        from[[10]] ~ to[[10]]
      ),
      iterations = 10
    )
  }
) |>
  print(n = Inf)
#> # A tibble: 18 × 14
#>    expression    size      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#>    <bch:expr>   <dbl> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#>  1 vec_recode_…   1e3  15.01µs  15.64µs 56564.      23.81KB    0        10     0
#>  2 dplyr::case…   1e3  295.9µs 305.37µs  3120.     183.82KB    0        10     0
#>  3 vec_recode_…   1e4 116.85µs 121.03µs  8267.     234.75KB    0        10     0
#>  4 dplyr::case…   1e4 909.38µs 942.06µs  1057.       1.76MB  117.        9     1
#>  5 vec_recode_…   1e5   1.17ms   1.19ms   824.       2.29MB    0        10     0
#>  6 dplyr::case…   1e5   6.53ms   6.64ms   150.      17.55MB    0        10     0
#>  7 vec_recode_…   1e6  11.78ms  12.27ms    82.2     22.89MB    0        10     0
#>  8 dplyr::case…   1e6  65.77ms  67.98ms    14.6    175.48MB    0        10     0
#>  9 vec_recode_…   3e6  36.13ms   37.7ms    26.7     68.67MB    0        10     0
#> 10 dplyr::case…   3e6 192.11ms 204.65ms     4.92   526.43MB    1.23      8     2
#> 11 vec_recode_…   8e6  97.21ms  99.24ms    10.0    183.11MB    1.12      9     1
#> 12 dplyr::case…   8e6 506.34ms 506.34ms     1.97     1.37GB   21.7       1    11
#> 13 vec_recode_…   1e7 120.79ms 123.52ms     8.09   228.88MB    0.898     9     1
#> 14 dplyr::case…   1e7 661.43ms  667.1ms     1.50     1.71GB    4.49      3     9
#> 15 vec_recode_…   2e7  238.5ms 248.07ms     3.90   457.76MB    1.17     10     3
#> 16 dplyr::case…   2e7    1.32s    1.41s     0.709    3.43GB    2.06     10    29
#> 17 vec_recode_…   5e7 593.44ms 644.79ms     1.57     1.12GB    0.940    10     6
#> 18 dplyr::case…   5e7    3.39s     3.5s     0.287    8.57GB    1.49     10    52
#> # ℹ 5 more variables: total_time <bch:tm>, result <list>, memory <list>,
#> #   time <list>, gc <list>

# - varying size, few uniques in x, integer
# - remapping all values with `from`
# - vector `to`, doubles
#
# Wild amount of memory improvements here
set.seed(123)
from <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
to <- from
bench::press(
  size = c(1e3, 1e4, 1e5, 1e6, 3e6, 8e6, 1e7, 2e7, 5e7),
  {
    x <- sample(10, size, replace = TRUE)
    bench::mark(
      vec_recode_values(
        x,
        from = from,
        to = to
      ),
      dplyr::case_match(
        x,
        from[[1]] ~ to[[1]],
        from[[2]] ~ to[[2]],
        from[[3]] ~ to[[3]],
        from[[4]] ~ to[[4]],
        from[[5]] ~ to[[5]],
        from[[6]] ~ to[[6]],
        from[[7]] ~ to[[7]],
        from[[8]] ~ to[[8]],
        from[[9]] ~ to[[9]],
        from[[10]] ~ to[[10]]
      ),
      iterations = 10
    )
  }
)
#> # A tibble: 18 × 7
#>    expression                size      min   median `itr/sec` mem_alloc `gc/sec`
#>    <bch:expr>               <dbl> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#>  1 vec_recode_values(x, fr…   1e3   6.36µs   7.22µs   1.10e+5   23.81KB     0   
#>  2 dplyr::case_match(x, fr…   1e3 284.75µs 317.95µs   3.09e+3  183.82KB     0   
#>  3 vec_recode_values(x, fr…   1e4  39.28µs  41.55µs   2.34e+4  234.75KB     0   
#>  4 dplyr::case_match(x, fr…   1e4  811.6µs 864.73µs   1.15e+3    1.76MB     0   
#>  5 vec_recode_values(x, fr…   1e5 318.16µs 356.29µs   2.78e+3    2.29MB     0   
#>  6 dplyr::case_match(x, fr…   1e5   5.73ms   6.02ms   1.67e+2   17.55MB     0   
#>  7 vec_recode_values(x, fr…   1e6   3.44ms   3.65ms   2.72e+2   22.89MB     0   
#>  8 dplyr::case_match(x, fr…   1e6  56.43ms  58.46ms   1.72e+1  175.48MB     1.91
#>  9 vec_recode_values(x, fr…   3e6   8.82ms  14.22ms   7.95e+1   68.67MB     8.83
#> 10 dplyr::case_match(x, fr…   3e6 169.43ms  173.1ms   5.67e+0  526.43MB     2.43
#> 11 vec_recode_values(x, fr…   8e6  26.25ms  30.48ms   3.39e+1  183.11MB     3.77
#> 12 dplyr::case_match(x, fr…   8e6 445.26ms    469ms   2.14e+0    1.37GB     2.14
#> 13 vec_recode_values(x, fr…   1e7  34.81ms  40.03ms   2.52e+1  228.88MB     2.80
#> 14 dplyr::case_match(x, fr…   1e7 563.18ms 571.27ms   1.75e+0    1.71GB     7.88
#> 15 vec_recode_values(x, fr…   2e7  64.55ms  74.66ms   1.24e+1  457.76MB     4.98
#> 16 dplyr::case_match(x, fr…   2e7    1.13s    1.18s   8.45e-1    3.43GB     2.45
#> 17 vec_recode_values(x, fr…   5e7 162.75ms 181.62ms   5.19e+0    1.12GB     2.59
#> 18 dplyr::case_match(x, fr…   5e7    2.87s    2.93s   3.42e-1    8.57GB     2.02

# - 10mil, few uniques in x, data frame
# - remapping all values in `from`
# - vector `to`
x <- data_frame(
  a = sample(10, 1e7, replace = TRUE),
  b = sample(10, 1e7, replace = TRUE)
)
from <- vec_expand_grid(
  a = 1:10,
  b = 1:10
)
to <- data_frame(
  c = 1L,
  d = 2L
)
bench::mark(
  vec_recode_values(x, from = from, to = to),
  dplyr::case_match(x, from ~ to)
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 × 6
#>   expression                                      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                                 <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recode_values(x, from = from, to = to)    109ms    110ms      7.80     153MB     3.90
#> 2 dplyr::case_match(x, from ~ to)               185ms    190ms      4.78     381MB     9.55

DavisVaughan force-pushed the feature/vec-recode-values branch 2 times, most recently from c9e1c14 to 57927eb, September 11, 2025 17:50
DavisVaughan force-pushed the feature/vec-recode-values branch 5 times, most recently from 7df8299 to 3f6016e, September 12, 2025 15:49
DavisVaughan marked this pull request as ready for review September 12, 2025 15:49
DavisVaughan force-pushed the feature/vec-recode-values branch 3 times, most recently from 648a35b to 52068ab, September 12, 2025 18:55
DavisVaughan force-pushed the feature/vec-recode-values branch from 52068ab to 856c443, September 12, 2025 19:00