Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of every(), some(), and none() #1036

Open
mgirlich opened this issue Jan 4, 2023 · 1 comment · May be fixed by #1169
Open

Performance of every(), some(), and none() #1036

mgirlich opened this issue Jan 4, 2023 · 1 comment · May be fixed by #1169
Labels
feature a feature request or enhancement

Comments

@mgirlich
Copy link
Contributor

mgirlich commented Jan 4, 2023

I recently discovered every(), some() and none() and think they are great as I frequently used a pattern like any(map_lgl(...)). As @DavisVaughan pointed out in this PR they are unfortunately much slower than the map_lgl(...) pattern unless they exit early (benchmark copied from the other PR)

library(purrr)
library(vctrs)

x <- as.list(1:10000)

fn <- function(x) {
  vec_is(x) || is.null(x)
}
bench::mark(
  all(map_lgl(x, fn)),
  every(x, fn)
)
#> # A tibble: 2 × 6
#>   expression               min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>          <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 all(map_lgl(x, fn))   15.9ms   20.8ms      47.7    3.91MB     30.7
#> 2 every(x, fn)          42.6ms   43.1ms      23.2  103.95KB    104.

bench::mark(
  any(map_lgl(x, vec_is_list)),
  some(x, vec_is_list)
)
#> # A tibble: 2 × 6
#>   expression                        min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 any(map_lgl(x, vec_is_list))   4.63ms   5.55ms     180.    41.31KB     24.0
#> 2 some(x, vec_is_list)          26.32ms  27.23ms      36.4    8.02KB     57.2

It would be great to improve their performance so that they are as fast/faster than map_lgl(...).

@DavisVaughan DavisVaughan added the feature a feature request or enhancement label Jan 4, 2023
@ErdaradunGaztea
Copy link

I've done some testing and I believe a significant chunk of overhead comes from two checks performed inside the loop. One is is_false()/is_true() called to check for early return, the other is is_bool() called to validate predicate's output. Both are necessary, so the only possible action is to try and reduce their overhead.

I've checked several options and the quickest one seems to be replacing both with a .Call() to a C function. Still slower than map_lgl() that doesn't have to check for early return, but an improvement nonetheless.

I'll try to write a C implementation of every() and friends anyways, to see if I can make it even closer; nevertheless, I meant to ask if the current improvement would be enough to warrant creating a PR.

library(purrr)

x <- as.list(1:10000)
fn <- function(x) {
  vctrs::vec_is(x) || is.null(x)
}

bench::mark(
  all(map_lgl(x, fn)),
  every(x, fn),
  min_time = 1
)
#> # A tibble: 2 × 6
#>   expression               min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>          <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 all(map_lgl(x, fn))   16.1ms   16.8ms      59.3    4.15MB     44.5
#> 2 every(x, fn)          21.2ms   21.8ms      44.7  133.76KB     59.6

Created on 2025-02-05 with reprex v2.1.1

#> 3 every(x, fn)*         29.7ms   30.8ms      31.9  135.74KB    166.

*) I've manually added this entry to add a comparison to the current every() implementation.

@ErdaradunGaztea ErdaradunGaztea linked a pull request Feb 6, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants