Implement `vec_recode_values()` and `vec_replace_values()` #2027

DavisVaughan · 2025-09-11T13:15:56Z

Pairs with #2024

Big set of benchmarks for future us to refer back to.

Note that the intention is not really to compete with `replace(x, x == 1L, NA)` in terms of raw performance, because these functions can do so much more than that. (Remember that internally we are doing `vec_match(x, 1L)` instead, to account for a table of size >1 too.) But it is at least interesting to compare against it, and I do think that many people will use this for one-off recoding of problematic values, like `replace_values(x, 1 ~ NA)`, so it's worth making it pretty fast there.

Also, the intention is not really to compete with `to[match(x, from)]` either, even though this is roughly what `recode_values(x, from ~ to)` does. I've included benchmarks against that or `case_match()` because it is again interesting to compare, and we are often competitive with or much better than the base R approach (and remember we can take >1 `from` and `to` values).
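For reference, the two base R idioms mentioned above can be sketched on a tiny vector. This only illustrates the semantics being compared against, not the implementation in this PR; the inputs and variable names (`to_chr`, `to_int`, `loc`, `matched`) are made up for the example.

```r
# Toy inputs (hypothetical, for illustration only)
x <- c(1L, 2L, 3L, 2L, 5L)
from <- c(1L, 2L)

# "Recode" idiom: look each element of `x` up in `from`, then index `to`.
# Elements with no match come back as NA.
to_chr <- c("a", "b")
recoded <- to_chr[match(x, from)]
recoded
#> [1] "a" "b" NA  "b" NA

# "Replace" idiom: same lookup, but unmatched elements keep their value.
to_int <- c(10L, 20L)
loc <- match(x, from)
matched <- !is.na(loc)
replaced <- x
replaced[matched] <- to_int[loc[matched]]
replaced
#> [1] 10 20  3 20  5

# One-off replacement of a single value with a logical mask
replace(x, x == 1L, NA)
#> [1] NA  2  3  2  5
```

The `match()` based forms generalize to any number of `from` values, which is why they are the fairer comparison once more than one value is being recoded.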

The massive benefits really kick in once you start having >1 `from` vector and >1 `to` vector. Like:

col |> replace_values(
  c(a, b, c) ~ x,
  c(d, e) ~ y,
  c(f, g, h, i) ~ z
)

Then you really get a huge reduction in memory usage compared to typical `case_match()` or base R (like 8.5GB down to 1GB in some cases).
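To make the grouped case concrete, here is a rough base R emulation of mapping several `from` vectors to one `to` value each, by flattening the groups into a single lookup table. It is a sketch of the semantics only, and is not claimed to be how this PR implements it; `from_groups`, `to_values`, and the sample data are hypothetical.

```r
# Toy column and grouped mapping (hypothetical, for illustration only)
col <- c("a", "e", "q", "h", "b")
from_groups <- list(c("a", "b", "c"), c("d", "e"), c("f", "g", "h", "i"))
to_values <- c("x", "y", "z")

# Flatten the groups into one lookup table, repeating each `to` value
# once per element of its `from` group.
from_flat <- unlist(from_groups)
to_flat <- rep(to_values, times = lengths(from_groups))

# A single match() pass locates every element; unmatched elements
# keep their original value.
loc <- match(col, from_flat)
matched <- !is.na(loc)
out <- col
out[matched] <- to_flat[loc[matched]]
out
#> [1] "x" "y" "q" "z" "x"
```

A single lookup pass like this avoids building one full-length logical mask per group, which is what the `x %in% ...` style of base R code does.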

library(vctrs)

# - 10mil, many uniques in x, integer
# - fully matched by `from`
# - vector `to` of `from_size`
set.seed(123)
x <- sample(100000, 1e7, replace = TRUE)
from <- sample(100000)
to <- sample(100000)
bench::mark(
  vec_recode_values(
    x,
    from = from,
    to = to
  ),
  to[match(x, from)],
  iterations = 10
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression                            min  median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                        <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recode_values(x, from = from…  65.5ms  76.5ms     13.0      154MB    19.4 
#> 2 to[match(x, from)]                635.2ms 643.7ms      1.55     116MB     1.55

# A case we do worse in. R is weirdly fast with doubles here in the match,
# and we have to slice `to` before we can assign it into the output
#
# - 10mil, many uniques in x
# - fully matched by `from`
# - vector `to` of `from_size`, strings
set.seed(123)
x <- sample(100000, 1e7, replace = TRUE) + 0
from <- sample(100000) + 0
to <- sample(letters, 100000, replace = TRUE)
bench::mark(
  vec_recode_values(
    x,
    from = from,
    to = to
  ),
  to[match(x, from)],
  iterations = 10
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                           <bch> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recode_values(x, from = from, t… 194ms  203ms      4.83     230MB     7.73
#> 2 to[match(x, from)]                   147ms  151ms      6.57     192MB     5.26

# But weirdly if you run ^ with integers, the base R performance is awful
set.seed(123)
x <- sample(100000, 1e7, replace = TRUE)
from <- sample(100000)
to <- sample(letters, 100000, replace = TRUE)
bench::mark(
  vec_recode_values(
    x,
    from = from,
    to = to
  ),
  to[match(x, from)],
  iterations = 10
)
#> # A tibble: 2 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                           <bch> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recode_values(x, from = from, t… 152ms  152ms      6.56     230MB    65.6 
#> 2 to[match(x, from)]                   658ms  660ms      1.51     154MB     1.51

# - 10mil, few uniques in x, integer
# - fully matched by `from`
# - vector `to` of `from_size`
set.seed(123)
x <- sample(10, 1e7, replace = TRUE)
from <- sample(10)
to <- sample(10)
bench::mark(
  vec_recode_values(
    x,
    from = from,
    to = to
  ),
  to[match(x, from)],
  iterations = 10
)
#> # A tibble: 2 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                          <bch:> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recode_values(x, from = from, … 25.6ms 25.9ms      38.6     153MB    154. 
#> 2 to[match(x, from)]                  46.5ms 46.8ms      21.4     114MB     14.3

# - 10mil, few uniques in x, integer
# - fully matched by `from`
# - vector `to` of `from_size`, strings
set.seed(123)
x <- sample(10, 1e7, replace = TRUE)
from <- sample(10)
to <- sample(letters, 10, replace = TRUE)
bench::mark(
  vec_recode_values(
    x,
    from = from,
    to = to
  ),
  to[match(x, from)],
  iterations = 10
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                         <bch:t> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recode_values(x, from = from,…   125ms  129ms      7.45     229MB     7.45
#> 2 to[match(x, from)]                  73.6ms   78ms     12.7      153MB     6.37

# - 10mil, few uniques in x, doubles
# - fully matched by `from`
# - vector `to` of `from_size`
set.seed(123)
x <- sample(10, 1e7, replace = TRUE) + 0
from <- sample(10) + 0
to <- sample(10)
bench::mark(
  vec_recode_values(
    x,
    from = from,
    to = to
  ),
  to[match(x, from)],
  iterations = 10
)
#> # A tibble: 2 × 6
#>   expression                            min  median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                        <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recode_values(x, from = from…  44.8ms  44.9ms     22.2      153MB    51.7 
#> 2 to[match(x, from)]                138.3ms 139.3ms      7.06     153MB     7.06

# - 10mil, many uniques in x, integer
# - replacing 1 value with `from`
# - list `to` of `x_size`
# i.e. `replace_values(x, 1L ~ to)`
set.seed(123)
x <- sample(100000, 1e7, replace = TRUE)
from <- 1L
to <- list(sample(100000, 1e7, replace = TRUE))
bench::mark(
  vec_replace_values(
    x,
    from = from,
    to = to,
    to_as_list_of_vectors = TRUE
  ),
  {
    where <- x == from
    x[where] <- to[[1]][where]
    x
  },
  dplyr::case_match(x, from ~ to[[1]], .default = x),
  iterations = 10
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 3 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                          <bch:> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 vec_replace_values(x, from = from,… 27.8ms 31.2ms      29.8     124MB     20.9
#> 2 { where <- x == from x[where] <- t… 17.8ms 21.6ms      43.2     153MB     25.9
#> 3 dplyr::case_match(x, from ~ to[[1]… 75.1ms 77.2ms      12.2     346MB     25.7

# - 10mil, many uniques in x, integer
# - replacing 2 values with `from`
# - list `to` of `x_size`
set.seed(123)
x <- sample(100000, 1e7, replace = TRUE)
from <- c(1L, 2L)
to <- list(x)
bench::mark(
  vec_replace_values(
    x,
    from = from,
    to = to,
    to_as_list_of_vectors = TRUE
  ),
  dplyr::case_match(x, from ~ to[[1]], .default = x),
  iterations = 10
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                          <bch:> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 vec_replace_values(x, from = from,… 37.2ms 38.4ms      25.8     124MB     25.8
#> 2 dplyr::case_match(x, from ~ to[[1]… 82.7ms 85.6ms      10.9     343MB     21.8

# - 10mil, many uniques in x, integer
# - replacing 1 value with `from`
# - scalar `to`
set.seed(123)
x <- sample(100000, 1e7, replace = TRUE)
from <- 1L
to <- 0L
bench::mark(
  vec_replace_values(
    x,
    from = from,
    to = to
  ),
  replace(x, x == from, to),
  dplyr::case_match(x, from ~ to, .default = x),
  iterations = 10
)
#> # A tibble: 3 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                          <bch:> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 vec_replace_values(x, from = from,… 29.8ms 30.2ms      33.1     124MB    132. 
#> 2 replace(x, x == from, to)             17ms 17.3ms      57.3     114MB     47.7
#> 3 dplyr::case_match(x, from ~ to, .d… 70.8ms 70.8ms      14.1     343MB    184.

# - varying size, few uniques in x, integer
# - replacing 1 value with `from`
# - scalar `to`
set.seed(123)
from <- 1L
to <- 0L
bench::press(
  size = c(1e3, 1e4, 1e5, 1e6, 3e6, 8e6, 1e7, 2e7, 5e7),
  {
    x <- sample(10, size, replace = TRUE)
    bench::mark(
      vec_replace_values(
        x,
        from = from,
        to = to
      ),
      replace(x, x == from, to),
      dplyr::case_match(x, from ~ to, .default = x),
      iterations = 10
    )
  }
) |>
  print(n = Inf)
#> # A tibble: 27 × 14
#>    expression    size      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#>    <bch:expr>   <dbl> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#>  1 vec_replace…   1e3   6.03µs   6.97µs 114886.     12.89KB     0       10     0
#>  2 replace(x, …   1e3   3.28µs   3.63µs 246365.     12.27KB     0       10     0
#>  3 dplyr::case…   1e3  85.16µs  90.16µs  10101.     36.76KB     0       10     0
#>  4 vec_replace…   1e4  41.41µs  45.41µs  21707.    127.15KB     0       10     0
#>  5 replace(x, …   1e4  23.33µs  29.93µs  34127.    121.49KB     0       10     0
#>  6 dplyr::case…   1e4  192.5µs 211.56µs   4505.    353.15KB     0       10     0
#>  7 vec_replace…   1e5 407.87µs  472.2µs   2126.      1.24MB     0       10     0
#>  8 replace(x, …   1e5 252.19µs 259.41µs   3803.      1.18MB     0       10     0
#>  9 dplyr::case…   1e5   1.19ms   1.25ms    807.      3.44MB    89.7      9     1
#> 10 vec_replace…   1e6   4.69ms   4.81ms    209.      12.4MB     0       10     0
#> 11 replace(x, …   1e6   2.71ms   2.75ms    361.     11.83MB    40.1      9     1
#> 12 dplyr::case…   1e6   10.7ms  11.15ms     89.9    34.33MB     0       10     0
#> 13 vec_replace…   3e6  13.43ms  13.75ms     71.4    37.19MB     7.94     9     1
#> 14 replace(x, …   3e6   7.61ms   8.14ms    122.     35.48MB    30.5      8     2
#> 15 dplyr::case…   3e6  31.94ms  32.64ms     30.7      103MB    30.7      5     5
#> 16 vec_replace…   8e6  36.15ms  39.47ms     25.3    99.18MB    17.7     10     7
#> 17 replace(x, …   8e6  20.58ms  23.44ms     40.1     94.6MB    28.1     10     7
#> 18 dplyr::case…   8e6  86.41ms  88.81ms     10.9   274.66MB    21.7     10    20
#> 19 vec_replace…   1e7  44.48ms  47.07ms     20.3   123.98MB    18.2     10     9
#> 20 replace(x, …   1e7  25.03ms  29.18ms     33.6   118.25MB    13.4     10     4
#> 21 dplyr::case…   1e7 108.48ms 111.52ms      8.59  343.32MB    12.9     10    15
#> 22 vec_replace…   2e7  86.25ms  92.25ms     10.6   247.96MB     7.44    10     7
#> 23 replace(x, …   2e7  49.75ms  52.33ms     18.8   236.51MB     9.42    10     5
#> 24 dplyr::case…   2e7 208.97ms 215.14ms      4.59  686.65MB     7.34    10    16
#> 25 vec_replace…   5e7 217.58ms 229.86ms      4.39  619.89MB     3.08    10     7
#> 26 replace(x, …   5e7 123.45ms 133.42ms      7.56  591.27MB     3.78    10     5
#> 27 dplyr::case…   5e7  528.6ms 539.68ms      1.85    1.68GB     3.15    10    17
#> # ℹ 5 more variables: total_time <bch:tm>, result <list>, memory <list>,
#> #   time <list>, gc <list>

# - 10mil, many uniques in x, integer
# - replacing 10000 values with `from`
# - scalar `to`
set.seed(123)
x <- sample(100000, 1e7, replace = TRUE)
from <- sample(100000, 10000)
to <- NA_integer_
bench::mark(
  vec_replace_values(
    x,
    from = from,
    to = to
  ),
  replace(x, x %in% from, to),
  dplyr::case_match(x, from ~ to, .default = x),
  iterations = 10
)
#> # A tibble: 3 × 6
#>   expression                            min  median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                        <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl>
#> 1 vec_replace_values(x, from = fro…  71.9ms  76.1ms     13.2      124MB    1.46 
#> 2 replace(x, x %in% from, to)       215.9ms 216.8ms      4.61     195MB    0.513
#> 3 dplyr::case_match(x, from ~ to, … 130.4ms 131.1ms      7.62     343MB    3.27

# - 30mil, few uniques in x, integer
# - replacing groups of `from` values
# - vector `to`
set.seed(123)
x <- sample(10, 3e7, replace = TRUE)
from <- list(1:3, 4:5, 6:10)

bench::mark(
  vec_recode_values(
    x = x,
    from = from,
    to = c("x", "y", "z"),
    from_as_list_of_vectors = TRUE
  ),
  {
    out <- character(length(x))
    out[x %in% 1:3] <- "x"
    out[x %in% 4:5] <- "y"
    out[x %in% 6:10] <- "z"
    out
  },
  dplyr::case_match(x, 1:3 ~ "x", 4:5 ~ "y", 6:10 ~ "z"),
  iterations = 10
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 3 × 6
#>   expression                           min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                       <bch:t> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 "vec_recode_values(x = x, from … 353.7ms 362.66ms     2.69   686.65MB     1.08
#> 2 "{ out <- character(length(x)) …   1.09s    1.11s     0.898    1.68GB     1.17
#> 3 "dplyr::case_match(x, 1:3 ~ \"x…   1.34s    1.35s     0.732    2.01GB     1.32

# - varying size, few uniques in x, integer
# - replacing few values with `from`
# - list of vectors `to`
set.seed(123)
from <- c(2, 5, 3, 6, 9)
bench::press(
  size = c(1e3, 1e4, 1e5, 1e6, 3e6, 8e6, 1e7, 2e7, 5e7),
  {
    x <- sample(10, size, replace = TRUE)
    to <- rep(list(x), times = length(from))
    bench::mark(
      vec_recode_values(
        x,
        from = from,
        to = to,
        to_as_list_of_vectors = TRUE
      ),
      {
        out <- rep(NA_integer_, times = size)
        out[x == from[[1]]] <- to[[1]][x == from[[1]]]
        out[x == from[[2]]] <- to[[2]][x == from[[2]]]
        out[x == from[[3]]] <- to[[3]][x == from[[3]]]
        out[x == from[[4]]] <- to[[4]][x == from[[4]]]
        out[x == from[[5]]] <- to[[5]][x == from[[5]]]
        out
      },
      dplyr::case_match(
        x,
        from[[1]] ~ to[[1]],
        from[[2]] ~ to[[2]],
        from[[3]] ~ to[[3]],
        from[[4]] ~ to[[4]],
        from[[5]] ~ to[[5]],
      ),
      iterations = 10
    )
  }
) |>
  print(n = Inf)
#> # A tibble: 27 × 14
#>    expression    size      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#>    <bch:expr>   <dbl> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#>  1 vec_recode_…   1e3   9.14µs   9.59µs 84954.      14.27KB    0        10     0
#>  2 { out <- re…   1e3  25.83µs   28.5µs 35109.      89.52KB    0        10     0
#>  3 dplyr::case…   1e3 164.86µs 173.53µs  5593.     100.46KB    0        10     0
#>  4 vec_recode_…   1e4  90.73µs 102.15µs  9379.     137.41KB    0        10     0
#>  5 { out <- re…   1e4 221.15µs 251.49µs  3973.     880.81KB    0        10     0
#>  6 dplyr::case…   1e4 476.13µs 492.29µs  2014.     979.35KB    0        10     0
#>  7 vec_recode_…   1e5   1.18ms    1.2ms   833.       1.34MB    0        10     0
#>  8 { out <- re…   1e5   2.26ms    2.4ms   416.       8.59MB    0        10     0
#>  9 dplyr::case…   1e5   3.31ms   3.52ms   286.       9.54MB   31.7       9     1
#> 10 vec_recode_…   1e6  11.97ms  12.21ms    81.7     13.35MB    0        10     0
#> 11 { out <- re…   1e6  23.17ms  24.63ms    41.0     85.83MB    0        10     0
#> 12 dplyr::case…   1e6  31.51ms  33.17ms    30.4     95.37MB    3.37      9     1
#> 13 vec_recode_…   3e6  37.18ms  37.83ms    26.5     40.06MB    0        10     0
#> 14 { out <- re…   3e6  68.58ms  71.05ms    14.0     257.5MB    3.49      8     2
#> 15 dplyr::case…   3e6  98.86ms 103.28ms     9.66   286.11MB    2.41      8     2
#> 16 vec_recode_…   8e6  97.35ms  98.32ms    10.1    106.82MB    1.12      9     1
#> 17 { out <- re…   8e6  180.7ms 182.43ms     5.46   686.66MB    8.19      4     6
#> 18 dplyr::case…   8e6 252.55ms 252.59ms     3.96   762.94MB   15.8       2     8
#> 19 vec_recode_…   1e7 121.47ms 122.78ms     8.14   133.52MB    0.905     9     1
#> 20 { out <- re…   1e7 225.09ms 225.81ms     4.43   858.33MB    4.43      5     5
#> 21 dplyr::case…   1e7 311.92ms 314.37ms     3.19   953.68MB    8.49      3     8
#> 22 vec_recode_…   2e7 242.62ms 247.48ms     4.00   267.04MB    1.20     10     3
#> 23 { out <- re…   2e7  466.4ms 476.64ms     2.09     1.68GB    3.76     10    18
#> 24 dplyr::case…   2e7 641.05ms 671.58ms     1.48     1.86GB    3.71     10    25
#> 25 vec_recode_…   5e7 612.17ms 631.42ms     1.57   667.59MB    0.626    10     4
#> 26 { out <- re…   5e7    1.19s    1.22s     0.818    4.19GB    2.70     10    33
#> 27 dplyr::case…   5e7    1.67s    1.71s     0.584    4.66GB    2.10     10    36
#> # ℹ 5 more variables: total_time <bch:tm>, result <list>, memory <list>,
#> #   time <list>, gc <list>

# - varying size, few uniques in x, integer
# - remapping all values with `from`
# - vector `to`, strings
set.seed(123)
from <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
to <- as.character(from)
bench::press(
  size = c(1e3, 1e4, 1e5, 1e6, 3e6, 8e6, 1e7, 2e7, 5e7),
  {
    x <- sample(10, size, replace = TRUE)
    bench::mark(
      vec_recode_values(
        x,
        from = from,
        to = to
      ),
      dplyr::case_match(
        x,
        from[[1]] ~ to[[1]],
        from[[2]] ~ to[[2]],
        from[[3]] ~ to[[3]],
        from[[4]] ~ to[[4]],
        from[[5]] ~ to[[5]],
        from[[6]] ~ to[[6]],
        from[[7]] ~ to[[7]],
        from[[8]] ~ to[[8]],
        from[[9]] ~ to[[9]],
        from[[10]] ~ to[[10]]
      ),
      iterations = 10
    )
  }
) |>
  print(n = Inf)
#> # A tibble: 18 × 14
#>    expression    size      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#>    <bch:expr>   <dbl> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#>  1 vec_recode_…   1e3  15.01µs  15.64µs 56564.      23.81KB    0        10     0
#>  2 dplyr::case…   1e3  295.9µs 305.37µs  3120.     183.82KB    0        10     0
#>  3 vec_recode_…   1e4 116.85µs 121.03µs  8267.     234.75KB    0        10     0
#>  4 dplyr::case…   1e4 909.38µs 942.06µs  1057.       1.76MB  117.        9     1
#>  5 vec_recode_…   1e5   1.17ms   1.19ms   824.       2.29MB    0        10     0
#>  6 dplyr::case…   1e5   6.53ms   6.64ms   150.      17.55MB    0        10     0
#>  7 vec_recode_…   1e6  11.78ms  12.27ms    82.2     22.89MB    0        10     0
#>  8 dplyr::case…   1e6  65.77ms  67.98ms    14.6    175.48MB    0        10     0
#>  9 vec_recode_…   3e6  36.13ms   37.7ms    26.7     68.67MB    0        10     0
#> 10 dplyr::case…   3e6 192.11ms 204.65ms     4.92   526.43MB    1.23      8     2
#> 11 vec_recode_…   8e6  97.21ms  99.24ms    10.0    183.11MB    1.12      9     1
#> 12 dplyr::case…   8e6 506.34ms 506.34ms     1.97     1.37GB   21.7       1    11
#> 13 vec_recode_…   1e7 120.79ms 123.52ms     8.09   228.88MB    0.898     9     1
#> 14 dplyr::case…   1e7 661.43ms  667.1ms     1.50     1.71GB    4.49      3     9
#> 15 vec_recode_…   2e7  238.5ms 248.07ms     3.90   457.76MB    1.17     10     3
#> 16 dplyr::case…   2e7    1.32s    1.41s     0.709    3.43GB    2.06     10    29
#> 17 vec_recode_…   5e7 593.44ms 644.79ms     1.57     1.12GB    0.940    10     6
#> 18 dplyr::case…   5e7    3.39s     3.5s     0.287    8.57GB    1.49     10    52
#> # ℹ 5 more variables: total_time <bch:tm>, result <list>, memory <list>,
#> #   time <list>, gc <list>

# - varying size, few uniques in x, integer
# - remapping all values with `from`
# - vector `to`, doubles
#
# Wild amount of memory improvements here
set.seed(123)
from <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
to <- from
bench::press(
  size = c(1e3, 1e4, 1e5, 1e6, 3e6, 8e6, 1e7, 2e7, 5e7),
  {
    x <- sample(10, size, replace = TRUE)
    bench::mark(
      vec_recode_values(
        x,
        from = from,
        to = to
      ),
      dplyr::case_match(
        x,
        from[[1]] ~ to[[1]],
        from[[2]] ~ to[[2]],
        from[[3]] ~ to[[3]],
        from[[4]] ~ to[[4]],
        from[[5]] ~ to[[5]],
        from[[6]] ~ to[[6]],
        from[[7]] ~ to[[7]],
        from[[8]] ~ to[[8]],
        from[[9]] ~ to[[9]],
        from[[10]] ~ to[[10]]
      ),
      iterations = 10
    )
  }
)
#> # A tibble: 18 × 7
#>    expression                size      min   median `itr/sec` mem_alloc `gc/sec`
#>    <bch:expr>               <dbl> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#>  1 vec_recode_values(x, fr…   1e3   6.36µs   7.22µs   1.10e+5   23.81KB     0   
#>  2 dplyr::case_match(x, fr…   1e3 284.75µs 317.95µs   3.09e+3  183.82KB     0   
#>  3 vec_recode_values(x, fr…   1e4  39.28µs  41.55µs   2.34e+4  234.75KB     0   
#>  4 dplyr::case_match(x, fr…   1e4  811.6µs 864.73µs   1.15e+3    1.76MB     0   
#>  5 vec_recode_values(x, fr…   1e5 318.16µs 356.29µs   2.78e+3    2.29MB     0   
#>  6 dplyr::case_match(x, fr…   1e5   5.73ms   6.02ms   1.67e+2   17.55MB     0   
#>  7 vec_recode_values(x, fr…   1e6   3.44ms   3.65ms   2.72e+2   22.89MB     0   
#>  8 dplyr::case_match(x, fr…   1e6  56.43ms  58.46ms   1.72e+1  175.48MB     1.91
#>  9 vec_recode_values(x, fr…   3e6   8.82ms  14.22ms   7.95e+1   68.67MB     8.83
#> 10 dplyr::case_match(x, fr…   3e6 169.43ms  173.1ms   5.67e+0  526.43MB     2.43
#> 11 vec_recode_values(x, fr…   8e6  26.25ms  30.48ms   3.39e+1  183.11MB     3.77
#> 12 dplyr::case_match(x, fr…   8e6 445.26ms    469ms   2.14e+0    1.37GB     2.14
#> 13 vec_recode_values(x, fr…   1e7  34.81ms  40.03ms   2.52e+1  228.88MB     2.80
#> 14 dplyr::case_match(x, fr…   1e7 563.18ms 571.27ms   1.75e+0    1.71GB     7.88
#> 15 vec_recode_values(x, fr…   2e7  64.55ms  74.66ms   1.24e+1  457.76MB     4.98
#> 16 dplyr::case_match(x, fr…   2e7    1.13s    1.18s   8.45e-1    3.43GB     2.45
#> 17 vec_recode_values(x, fr…   5e7 162.75ms 181.62ms   5.19e+0    1.12GB     2.59
#> 18 dplyr::case_match(x, fr…   5e7    2.87s    2.93s   3.42e-1    8.57GB     2.02

# - 10mil, few uniques in x, data frame
# - remapping all values in `from`
# - vector `to`
x <- data_frame(
  a = sample(10, 1e7, replace = TRUE),
  b = sample(10, 1e7, replace = TRUE)
)
from <- vec_expand_grid(
  a = 1:10,
  b = 1:10
)
to <- data_frame(
  c = 1L,
  d = 2L
)
bench::mark(
  vec_recode_values(x, from = from, to = to),
  dplyr::case_match(x, from ~ to)
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 × 6
#>   expression                                      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                                 <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recode_values(x, from = from, to = to)    109ms    110ms      7.80     153MB     3.90
#> 2 dplyr::case_match(x, from ~ to)               185ms    190ms      4.78     381MB     9.55

DavisVaughan force-pushed the feature/vec-recode-values branch 2 times, most recently from c9e1c14 to 57927eb Compare September 11, 2025 17:50

This was referenced Sep 11, 2025

Modernize vec_equal() and expose at C level #2028

Merged

Teach list_combine() how to handle compact_seq() as an index #2029

Merged

DavisVaughan force-pushed the feature/vec-recode-values branch 5 times, most recently from 7df8299 to 3f6016e Compare September 12, 2025 15:49

DavisVaughan marked this pull request as ready for review September 12, 2025 15:49

DavisVaughan mentioned this pull request Sep 12, 2025

Implement vec_case_when() and vec_replace_when() #2024

Open

DavisVaughan force-pushed the feature/vec-recode-values branch 3 times, most recently from 648a35b to 52068ab Compare September 12, 2025 18:55

Implement vec_recode_values() and vec_replace_values()

856c443

DavisVaughan force-pushed the feature/vec-recode-values branch from 52068ab to 856c443 Compare September 12, 2025 19:00
