Adding progress bar display func #951
Not printing yet; looking for further insight. @krlmlr, any ideas?
Thanks for working on it! I think this should be an R option with a callback that is called when the option is set, see
Or, perhaps even a slot in the
Well, I'm almost done with the callback. So I'll test that first.
0e03670 to b3b600e
Oh, the DUCKDB_DISABLE_PRINT flag probably does not help.
All right, it works; now ironing out the bugs.
library(duckdb)
library(cli)

# Callback invoked with the current progress percentage
progress <- function(x) {
  if (cli::cli_progress_num() == 0) {
    cli::cli_progress_bar("Duckdb SQL", total = 100, .envir = .GlobalEnv)
  }
  cli::cli_progress_update(set = x, .envir = .GlobalEnv)
  if (x > 100) {
    cli::cli_progress_done(.envir = .GlobalEnv)
  }
}
options("duckdb.progress_display" = progress)

conn <- duckdb::dbConnect(duckdb::duckdb())
# Show the progress bar immediately instead of after a delay
duckdb::dbSendQuery(conn, "SET progress_bar_time = 0;")
q <- "CREATE OR REPLACE TABLE BOB AS (
SELECT * FROM 'ldbc-sf300-comments-creationDate.parquet')"
duckdb::dbSendQuery(conn, q)
The #ifndef DUCKDB_DISABLE_PRINT guard seems redundant, since the same macro is already used in printer.cpp (https://github.com/duckdb/duckdb/blob/main/src/common/printer.cpp), and it prevents using a display set via config.create_display_func when DuckDB is compiled with the -DDUCKDB_DISABLE_PRINT flag, as the duckdb-r package is. That is where I'm trying to implement a display (duckdb/duckdb-r#951). The call chain is PrintProgress -> TerminalProgressBarDisplay::Update -> TerminalProgressBarDisplay::PrintProgressInternal -> Printer::RawPrint, and the macro is already applied there. Plus, there is already a config option, enable_progress_bar, and its default is FALSE. So, can it be removed? cc: @krlmlr
I'm done on this one. Let me know if this works for you.
Testing with:
library(spanishoddata)
library(duckdb)
library(tidyverse)

x_dates <- c("2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04")
x <- spod_get(type = "od", zones = "distr", dates = x_dates)
dbGetQuery(x$src$con, "SELECT current_setting('enable_progress_bar');")
dbSendQuery(x$src$con, "SET enable_progress_bar = true;")
dbGetQuery(x$src$con, "SELECT current_setting('enable_progress_bar');")

progress <- function(x) {
  if (cli::cli_progress_num() == 0) {
    cli::cli_progress_bar("Duckdb SQL", total = 100, .envir = .GlobalEnv)
  }
  cli::cli_progress_update(set = x, .envir = .GlobalEnv)
  if (x > 100) {
    cli::cli_progress_done(.envir = .GlobalEnv)
  }
}
options("duckdb.progress_display" = progress)
duckdb::dbSendQuery(x$src$con, "SET progress_bar_time = 0;")

xx <- x |>
  group_by(id_origin, date, activity_origin) |>
  summarise(mean_trips = mean(n_trips)) |>
  collect()
And it works!
@meztez do we have to manually define the progress function, though? What is the final idea of this PR? I would expect the progress bar to just 'magically' appear as soon as we do:
dbGetQuery(x$src$con, "SELECT current_setting('enable_progress_bar');")
p.s. in my case
It could provide a dummy default. It's just a function(x) that duckdb-r calls with the progress percentage. I'm not the package maintainer and I just needed it for a deliverable, so whatever works is fine by me.
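To make the "dummy default" idea concrete, here is a minimal sketch of what such a callback could look like using only base R, so no extra package is required. The helper name make_default_progress is hypothetical and not part of duckdb-r; the only thing assumed from this PR is that the callback is a function(x) receiving a percentage.

```r
# Hypothetical helper (not part of duckdb-r): builds a default progress
# callback backed by utils::txtProgressBar.
make_default_progress <- function(file = stderr()) {
  bar <- NULL
  function(x) {
    # Clamp to [0, 100] in case a value falls slightly outside the range
    x <- max(0, min(100, x))
    if (is.null(bar)) {
      bar <<- utils::txtProgressBar(min = 0, max = 100, style = 3, file = file)
    }
    utils::setTxtProgressBar(bar, x)
    if (x >= 100) {
      # Finish the bar and reset so the next query starts a fresh one
      close(bar)
      bar <<- NULL
    }
    invisible(x)
  }
}

progress <- make_default_progress()
# Simulated updates standing in for calls made during a query
for (pct in c(0, 25, 50, 75, 100)) progress(pct)

# With the branch from this PR installed, it would presumably be registered via:
# options("duckdb.progress_display" = progress)
```

Keeping the bar in a closure avoids touching .GlobalEnv, which may matter if the default ever ships inside the package.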
Thanks for the PR! Looking at the implementation, I think the callback function should be a slot in the connection object. There could be basic reporting (opt-out, in interactive mode only) in the duckdb R package, and more sophisticated progress in duckplyr.
@meztez totally makes sense. Thanks for the work on the internals to make this possible! Looking forward to this being merged!
In the above examples, (x > 100) indicates that the processing is complete. Shouldn't that be (x >= 100)? I think it's more common to consider 100% to indicate "done" than "still processing".
progress <- function(x) {
  # Only create a bar while the query is still running
  if (x < 100 && cli::cli_progress_num() == 0) {
    cli::cli_progress_bar("Duckdb SQL", total = 100, .envir = .GlobalEnv)
  }
  cli::cli_progress_update(set = x, .envir = .GlobalEnv)
}
options("duckdb.progress_display" = progress)
I have done some more testing here: rOpenSpain/spanishoddata#124 (comment). To summarize, at the moment the progress bar's behavior seems to depend on the data size and on whether the query filters to a particular part of the data. For example, with 100 GB of data (I used the DuckDB file format), a query that touches data stored near the beginning of the file shows some progress from 1% to 3% and then jumps straight to 100%. Similarly, a query that filters to data near the end of the database file jumps to 70% or 90% at the very start. So, at the moment, the progress bar is not very informative or useful. The question is whether this is an upstream problem (and normal behavior for DuckDB) or an artifact of how progress reporting was implemented in the R package in this PR. @meztez @krlmlr do you have any insight into whether what I'm describing is expected behavior for the progress bar and, if not, whether it can be fixed?
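One way to inspect jumps like these independently of any rendering is to record the raw values the callback receives. The sketch below is only an illustration: make_progress_recorder and largest_jump are hypothetical helpers, and the loop simulates updates rather than running a real query.

```r
# Hypothetical helper: records every percentage passed to the callback,
# together with the seconds elapsed since the recorder was created.
make_progress_recorder <- function() {
  start <- Sys.time()
  elapsed <- numeric(0)
  pct <- numeric(0)
  list(
    callback = function(x) {
      elapsed <<- c(elapsed, as.numeric(difftime(Sys.time(), start, units = "secs")))
      pct <<- c(pct, x)
      invisible(NULL)
    },
    values = function() data.frame(elapsed = elapsed, pct = pct)
  )
}

# Largest single increase between consecutive updates (0 if fewer than two)
largest_jump <- function(pct) {
  if (length(pct) < 2) return(0)
  max(diff(pct))
}

rec <- make_progress_recorder()
# Simulated updates mimicking the "1% to 3%, then 100%" pattern described above
for (p in c(1, 2, 3, 100)) rec$callback(p)
largest_jump(rec$values()$pct)

# With the branch from this PR installed, the recorder would presumably be
# registered before running a query via:
# options("duckdb.progress_display" = rec$callback)
```

If the recorded values already contain the jump, the rendering code is not the cause; telling the R bindings apart from upstream DuckDB would still need a comparison against another client.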
@e-kotov Try the same thing with the duckdb CLI and see if you get the same behavior. I really do not see anything on the R package side that would make this behavior any different. Also, a quick search of the duckdb/duckdb issues turned up duckdb/duckdb#12454.
So far I have tried with the Python module and got similar behaviour.
I saw that one, but they said that particular issue was fixed. There were a few more similar issues in the upstream repo, but nothing quite like what I described. I will report this upstream.
@krlmlr
@hannes: Can we guarantee that the progress update functions are always called from the "main" thread (the one that initiated the execution)?
It does look like the calls are always coming from the same thread ID, but better to be sure here.
For #199.