How do we show "Nothing has been computed yet." #489
Comments
The reader can trust us here, or run the code themselves. The README should be concise. How can we show where the time is spent without diluting the message? Perhaps we can add timing information in small print below each chunk?
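A minimal sketch of the per-chunk timing idea, assuming `as_duckdb_tibble()` with its default arguments and `collect()` to force the computation. The `elapsed()` helper is hypothetical, not part of duckplyr. If the pipeline really is lazy, building it should take far less time than materializing it, but the actual numbers depend on the machine, so none are shown here.

```r
library(duckplyr)

# Hypothetical helper: elapsed wall-clock seconds for one expression.
# The expression is evaluated in the caller's environment, so
# assignments made inside it (e.g. `out <- ...`) remain visible.
elapsed <- function(expr) {
  system.time(expr)[["elapsed"]]
}

flights <- as_duckdb_tibble(flights_df())

# Step 1: build the pipeline. With lazy evaluation this should only
# record the operations.
t_build <- elapsed(
  out <-
    flights %>%
    filter(!is.na(arr_delay), !is.na(dep_delay)) %>%
    mutate(inflight_delay = arr_delay - dep_delay) %>%
    summarize(
      .by = c(year, month),
      mean_inflight_delay = mean(inflight_delay),
      median_inflight_delay = median(inflight_delay)
    ) %>%
    filter(month <= 6)
)

# Step 2: force the computation and bring the result into R.
t_compute <- elapsed(result <- collect(out))

# Timings that could be printed in small print below each chunk.
cat(sprintf("build: %.3fs, compute: %.3fs\n", t_build, t_compute))
```

Printing the two timings side by side keeps the README short while still showing at which step the work happens.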
I think it's not an issue of trust, but more of understanding, of making it more obvious what this means. Is time or memory the most important aspect? I thought time wouldn't be so informative given duckplyr is so fast?
Not a good example yet, but this is the kind of thing I mean:
library(conflicted)
library(duckplyr)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> The duckplyr package is configured to fall back to dplyr when it encounters an
#> incompatibility. Fallback events can be collected and uploaded for analysis to
#> guide future development. By default, data will be collected but no data will
#> be uploaded.
#> ℹ Automatic fallback uploading is not controlled and therefore disabled, see
#> `?duckplyr::fallback()`.
#> ✔ Number of reports ready for upload: 13.
#> → Review with `duckplyr::fallback_review()`, upload with
#> `duckplyr::fallback_upload()`.
#> ℹ Configure automatic uploading with `duckplyr::fallback_config()`.
#> ✔ Overwriting dplyr methods with duckplyr methods.
#> ℹ Turn off with `duckplyr::methods_restore()`.
conflict_prefer("filter", "dplyr", quiet = TRUE)
bench::mark(
  out <-
    flights_df() %>%
    duckdb_tibble(.tether = TRUE) %>%
    filter(!is.na(arr_delay), !is.na(dep_delay)) %>%
    mutate(inflight_delay = arr_delay - dep_delay) %>%
    summarize(
      .by = c(year, month),
      mean_inflight_delay = mean(inflight_delay),
      median_inflight_delay = median(inflight_delay),
    ) %>%
    filter(month <= 6)
)
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch> <bch:> <dbl> <bch:byt> <dbl>
#> 1 out <- flights_df() %>% duckdb_tibb… 226ms 228ms 4.31 151MB 0
bench::mark(out$month)
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 out$month 1.23µs 1.59µs 510480. 0B 51.1
Created on 2025-01-24 with reprex v2.1.1
library(conflicted)
library(duckplyr)
conflict_prefer("filter", "dplyr", quiet = TRUE)
bench::mark(
  out <-
    flights_df() %>%
    as_duckdb_tibble(tether = TRUE) %>%
    filter(!is.na(arr_delay), !is.na(dep_delay)) %>%
    mutate(inflight_delay = arr_delay - dep_delay) %>%
    summarize(
      .by = c(year, month),
      mean_inflight_delay = mean(inflight_delay),
      median_inflight_delay = median(inflight_delay),
    ) %>%
    filter(month <= 6)
)
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch> <bch:> <dbl> <bch:byt> <dbl>
#> 1 out <- flights_df() %>% as_duckdb_t… 230ms 232ms 4.25 151MB 0
bench::mark(out$month)
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 out$month 1.24µs 1.45µs 573875. 0B 0
Created on 2025-01-24 with reprex v2.1.1
library(conflicted)
library(duckplyr)
conflict_prefer("filter", "dplyr", quiet = TRUE)
bench::mark(
  out <-
    flights_df() %>%
    as_duckdb_tibble(tether = TRUE) %>%
    filter(!is.na(arr_delay), !is.na(dep_delay)) %>%
    mutate(inflight_delay = arr_delay - dep_delay) %>%
    summarize(
      .by = c(year, month),
      mean_inflight_delay = mean(inflight_delay),
      median_inflight_delay = median(inflight_delay),
    ) %>%
    filter(month <= 6)
)
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch> <bch:> <dbl> <bch:byt> <dbl>
#> 1 out <- flights_df() %>% as_duckdb_t… 227ms 227ms 4.34 151MB 0
bench::mark(nrow(out))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 nrow(out) 3.1µs 3.46µs 258493. 0B 0
Created on 2025-01-24 with reprex v2.1.1
library(conflicted)
library(duckplyr)
conflict_prefer("filter", "dplyr", quiet = TRUE)
library("profvis")
profvis({
  out <-
    flights_df() %>%
    filter(!is.na(arr_delay), !is.na(dep_delay)) %>%
    mutate(inflight_delay = arr_delay - dep_delay) %>%
    summarize(
      .by = c(year, month),
      mean_inflight_delay = mean(inflight_delay),
      median_inflight_delay = median(inflight_delay),
    ) %>%
    filter(month <= 6)
  nrow(out)
})
I guess including profvis screenshots would be the only way, but maybe rather in a vignette, as they take up space.
library(conflicted)
library(duckplyr)
conflict_prefer("filter", "dplyr", quiet = TRUE)
p <- profvis::profvis({
  out <-
    flights_df() %>%
    filter(!is.na(arr_delay), !is.na(dep_delay)) %>%
    mutate(inflight_delay = arr_delay - dep_delay) %>%
    summarize(
      .by = c(year, month),
      mean_inflight_delay = mean(inflight_delay),
      median_inflight_delay = median(inflight_delay),
    ) %>%
    filter(month <= 6)
  nrow(out)
})
temp_dir <- withr::local_tempdir()
file.create(file.path(temp_dir, "index.html"))
htmlwidgets::saveWidget(p, file.path(temp_dir, "index.html"))
library("chromote")
screen_width <- 1920
screen_height <- 1080
b <- ChromoteSession$new(height = screen_height, width = screen_width)
s <- servr::httw(temp_dir)
b$Page$navigate(s$url, wait_ = FALSE)
b$screenshot("profvis.png", wait_ = FALSE)
magick::image_read("profvis.png")
Done now in the "funnel" vignette.
In the README there's the sentence "Nothing has been computed yet.".
How do we "show" this? It'd be nice to really bring home the idea of laziness. Could we use a benchmark to show at which step memory usage changes?
It's a small thing but would really help with documenting/explaining laziness.