How do we show "Nothing has been computed yet." #489

Open
maelle opened this issue Jan 24, 2025 · 11 comments

Comments

@maelle
Collaborator

maelle commented Jan 24, 2025

In the README there's the sentence "Nothing has been computed yet.".

How do we "show" this? It'd be nice to really bring home the idea of laziness. Perhaps by using a benchmark to show at which step memory usage changes?

It's a small thing but would really help with documenting/explaining laziness.

@krlmlr
Member

krlmlr commented Jan 24, 2025

The reader can trust us here, or run the code themselves. The README should be concise. How to show where the time is spent without diluting the message?

Perhaps we can add timing information in small print below each chunk?
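
One way to get per-step timings, as a minimal sketch only: wrap the pipeline construction and the first access in separate `system.time()` calls. This assumes that nycflights13 is installed (needed by `flights_df()`) and that `nrow()` is what triggers the computation; the names `flights`, `t_build`, `t_access` and `n` are illustrative.

```r
library(duckplyr)

# Load the data up front so that it is not part of either timing.
flights <- flights_df()

# Step 1: build the pipeline. If evaluation is lazy, this should be cheap.
t_build <- system.time(
  out <-
    flights %>%
    filter(!is.na(arr_delay), !is.na(dep_delay)) %>%
    mutate(inflight_delay = arr_delay - dep_delay) %>%
    summarize(
      .by = c(year, month),
      mean_inflight_delay = mean(inflight_delay),
      median_inflight_delay = median(inflight_delay)
    ) %>%
    filter(month <= 6)
)

# Step 2: ask for the number of rows (assumed to trigger the computation).
t_access <- system.time(n <- nrow(out))

# Elapsed seconds for each step, side by side.
c(build = t_build[["elapsed"]], access = t_access[["elapsed"]])
```

If the pipeline is indeed lazy, most of the elapsed time should show up under `access` rather than `build`. The actual numbers depend on the machine and are not shown here.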

@maelle
Collaborator Author

maelle commented Jan 24, 2025

I think it's not an issue of trust, but more of understanding, of making it more obvious what this means.

Is time or memory the most important aspect? I thought time wouldn't be so informative given duckplyr is so fast?

@maelle
Collaborator Author

maelle commented Jan 24, 2025

Not a good example yet, but this is the kind of thing I mean:

library(conflicted)
library(duckplyr)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> The duckplyr package is configured to fall back to dplyr when it encounters an
#> incompatibility. Fallback events can be collected and uploaded for analysis to
#> guide future development. By default, data will be collected but no data will
#> be uploaded.
#> ℹ Automatic fallback uploading is not controlled and therefore disabled, see
#>   `?duckplyr::fallback()`.
#> ✔ Number of reports ready for upload: 13.
#> → Review with `duckplyr::fallback_review()`, upload with
#>   `duckplyr::fallback_upload()`.
#> ℹ Configure automatic uploading with `duckplyr::fallback_config()`.
#> ✔ Overwriting dplyr methods with duckplyr methods.
#> ℹ Turn off with `duckplyr::methods_restore()`.
conflict_prefer("filter", "dplyr", quiet = TRUE)
bench::mark(out <-
  flights_df() %>%
  duckdb_tibble(.tether = TRUE) %>%
  filter(!is.na(arr_delay), !is.na(dep_delay)) %>%
  mutate(inflight_delay = arr_delay - dep_delay) %>%
  summarize(
    .by = c(year, month),
    mean_inflight_delay = mean(inflight_delay),
    median_inflight_delay = median(inflight_delay),
  ) %>%
  filter(month <= 6))
#> # A tibble: 1 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                           <bch> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 out <- flights_df() %>% duckdb_tibb… 226ms  228ms      4.31     151MB        0
bench::mark(out$month)
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 out$month    1.23µs   1.59µs   510480.        0B     51.1

Created on 2025-01-24 with reprex v2.1.1

@maelle
Collaborator Author

maelle commented Jan 24, 2025

library(conflicted)
library(duckplyr)
conflict_prefer("filter", "dplyr", quiet = TRUE)
bench::mark(out <-
  flights_df() %>%
  as_duckdb_tibble(tether = TRUE) %>%
  filter(!is.na(arr_delay), !is.na(dep_delay)) %>%
  mutate(inflight_delay = arr_delay - dep_delay) %>%
  summarize(
    .by = c(year, month),
    mean_inflight_delay = mean(inflight_delay),
    median_inflight_delay = median(inflight_delay),
  ) %>%
  filter(month <= 6))
#> # A tibble: 1 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                           <bch> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 out <- flights_df() %>% as_duckdb_t… 230ms  232ms      4.25     151MB        0
bench::mark(out$month)
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 out$month    1.24µs   1.45µs   573875.        0B        0

Created on 2025-01-24 with reprex v2.1.1

@maelle
Collaborator Author

maelle commented Jan 24, 2025

library(conflicted)
library(duckplyr)
conflict_prefer("filter", "dplyr", quiet = TRUE)
bench::mark(out <-
  flights_df() %>%
  as_duckdb_tibble(tether = TRUE) %>%
  filter(!is.na(arr_delay), !is.na(dep_delay)) %>%
  mutate(inflight_delay = arr_delay - dep_delay) %>%
  summarize(
    .by = c(year, month),
    mean_inflight_delay = mean(inflight_delay),
    median_inflight_delay = median(inflight_delay),
  ) %>%
  filter(month <= 6))
#> # A tibble: 1 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                           <bch> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 out <- flights_df() %>% as_duckdb_t… 227ms  227ms      4.34     151MB        0
bench::mark(nrow(out))
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 nrow(out)     3.1µs   3.46µs   258493.        0B        0

Created on 2025-01-24 with reprex v2.1.1

@maelle
Collaborator Author

maelle commented Jan 24, 2025

library(conflicted)
library(duckplyr)
conflict_prefer("filter", "dplyr", quiet = TRUE)
library("profvis")
profvis({out <-
  flights_df() %>%
  filter(!is.na(arr_delay), !is.na(dep_delay)) %>%
  mutate(inflight_delay = arr_delay - dep_delay) %>%
  summarize(
    .by = c(year, month),
    mean_inflight_delay = mean(inflight_delay),
    median_inflight_delay = median(inflight_delay),
  ) %>%
  filter(month <= 6)
nrow(out)
})

Image

@maelle
Collaborator Author

maelle commented Jan 24, 2025

Edit: the output below is probably not in order of execution.

Image

@maelle
Collaborator Author

maelle commented Jan 24, 2025

One sees the caching when adding a second nrow(out). Now I'm not sure how best to add profvis illustrations in R Markdown 🤪 (the output I got from the profmem package was not clear).

Image
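
A text-only way to look at the same caching effect, as a rough sketch: time two consecutive `nrow(out)` calls with `system.time()`, which evaluates its expression exactly once per call. It assumes nycflights13 is installed and that the first `nrow()` materializes the result; the names `t_first`, `t_second`, `n_first` and `n_second` are illustrative.

```r
library(duckplyr)

# Build the pipeline; nothing is expected to be computed at this point.
out <-
  flights_df() %>%
  filter(!is.na(arr_delay), !is.na(dep_delay)) %>%
  mutate(inflight_delay = arr_delay - dep_delay) %>%
  summarize(
    .by = c(year, month),
    mean_inflight_delay = mean(inflight_delay),
    median_inflight_delay = median(inflight_delay)
  ) %>%
  filter(month <= 6)

# First access: assumed to run the query and keep the result.
t_first <- system.time(n_first <- nrow(out))

# Second access: should reuse the result if it was cached.
t_second <- system.time(n_second <- nrow(out))

# Elapsed seconds for the first and second access.
c(first = t_first[["elapsed"]], second = t_second[["elapsed"]])
```

Single-shot timings like these are noisy, but a large gap between `first` and `second` would be consistent with the caching seen in the profvis output.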

@maelle
Collaborator Author

maelle commented Jan 24, 2025

I guess including profvis screenshots would be the only way, but maybe in a vignette rather than the README, as they take up space.

@maelle
Collaborator Author

maelle commented Jan 24, 2025

library(conflicted)
library(duckplyr)
conflict_prefer("filter", "dplyr", quiet = TRUE)
p <- profvis::profvis({
  out <-
    flights_df() %>%
    filter(!is.na(arr_delay), !is.na(dep_delay)) %>%
    mutate(inflight_delay = arr_delay - dep_delay) %>%
    summarize(
      .by = c(year, month),
      mean_inflight_delay = mean(inflight_delay),
      median_inflight_delay = median(inflight_delay),
    ) %>%
    filter(month <= 6)
  nrow(out)
}
)
temp_dir <- withr::local_tempdir()
# Create the target file inside the temporary directory
file.create(file.path(temp_dir, "index.html"))
htmlwidgets::saveWidget(p, file.path(temp_dir, "index.html"))
library("chromote")
screen_width <- 1920
screen_height <- 1080
b <- ChromoteSession$new(height = screen_height, width = screen_width)
s <- servr::httw(temp_dir)
b$Page$navigate(s$url, wait_ = FALSE)
b$screenshot("profvis.png", wait_ = FALSE)
magick::image_read("profvis.png")

@krlmlr
Member

krlmlr commented Jan 26, 2025

Done now in the "funnel" vignette.
