
Downloading data: child process has died #323

Open
peterdesmet opened this issue Sep 30, 2024 · 7 comments

@peterdesmet (Member)

I got the following error when trying to download the largest dataset I know:

> download_acoustic_dataset(animal_project_code = "2013_albertkanaal")
Downloading data to directory `2013_albertkanaal`:
* (1/6): downloading animals.csv
* (2/6): downloading tags.csv                                                                   
* (3/6): downloading detections.csv                                                             
Error: child process has died

In call:
tryCatch({
    if (length(priority)) 
        setpriority(priority)
    if (length(rlimits)) 
        set_rlimits(rlimits)
    if (length(gid)) 
        setgid(gid)
    if (length(uid)) 
        setuid(uid)
    if (length(profile)) 
        aa_change_profile(profile)
    if (length(device)) 
        options(device = device)
    graphics.off()
    options(menu.graphics = FALSE)
    serialize(withVisible(eval(orig_expr, parent.frame())), NULL)
}, error = function(e) {
    old_class <- attr(e, "class")
    structure(e, class = c(old_class, "eval_fork_error"))
}, finally = substitute(graphics.off()))

This type of time-out is expected when using the API. Is there an option to catch these and suggest something more helpful?
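For illustration, a minimal sketch (not existing etn code) of how the call could be wrapped so the low-level "child process has died" error is caught and re-raised with a more helpful hint; the wrapper name and message text are assumptions:

download_with_hint <- function(animal_project_code) {
  tryCatch(
    download_acoustic_dataset(animal_project_code = animal_project_code),
    error = function(e) {
      # Re-raise OpenCPU/child-process failures with a user-facing suggestion
      if (grepl("child process has died", conditionMessage(e), fixed = TRUE)) {
        stop(
          "The API timed out while downloading a large dataset. ",
          "Consider a direct database connection or downloading smaller subsets.",
          call. = FALSE
        )
      }
      stop(e)
    }
  )
}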

@peterdesmet peterdesmet added this to the v2.3 milestone Sep 30, 2024
@PietrH PietrH removed this from the v2.3 milestone Oct 1, 2024
@PietrH PietrH added the API label Oct 1, 2024
@PietrH PietrH added this to the v2.3.1 milestone Oct 1, 2024
@PietrH (Member) commented Oct 1, 2024

I get exactly the same error, at the same stage. I suspect the failure is actually at:

get_acoustic_detections(animal_project_code = "2013_albertkanaal")

I'm getting HTTP 502 request failed responses on the above call; this might be fixed with paging in https://github.com/inbo/etn/tree/paging:

etn/R/utils.R, lines 99 to 159 at 8012306:

fetch_result_paged <-
  function(connection,
           query,
           page_size = 1000,
           progress = FALSE) {
    assertthat::assert_that(assertthat::is.count(page_size))
    # Stop a progress bar from appearing if not required
    if (!progress) {
      withr::local_options(cli.progress_show_after = Inf)
    }
    # Create result object to page into = execute query on DB
    result <- DBI::dbSendQuery(connection, query, immediate = FALSE)
    # When this function exits, clear the result (mandatory)
    withr::defer(DBI::dbClearResult(result))
    # Fetch some information about our result object
    result_colnames <- DBI::dbColumnInfo(result)$name
    result_nrow <- DBI::dbGetInfo(result)$rows.affected
    # Create tempfile to write to, automatically deleted when function completes
    partial_result_file <- withr::local_tempfile()
    # Initialize a progress bar
    # pb <- progress::progress_bar$new(
    #   total = result_nrow,
    #   format = " fetching [:bar] :percent in :elapsed",
    #   width = 60
    # )
    # withr::defer(pb$terminate())
    cli::cli_progress_bar("Fetching result from ETN", total = result_nrow)
    ## Set object to keep track of how many rows have been fetched
    rows_done <- 0
    # Fetch pages of the result until we have everything
    while (!DBI::dbHasCompleted(result)) {
      readr::write_csv(DBI::dbFetch(result, n = page_size),
                       partial_result_file,
                       append = TRUE,
                       progress = FALSE)
      rows_done <- rows_done + page_size
      # rows_done <- DBI::dbGetInfo(result)$row.count
      # pb$update(rows_done / result_nrow)
      cli::cli_progress_update(set =
        # length(readr::read_lines(partial_result_file, progress = FALSE))
        rows_done
      )
    }
    # Read the temp file we wrote the result data frame to
    result_df <-
      readr::read_csv(
        partial_result_file,
        col_names = result_colnames,
        show_col_types = FALSE,
        progress = FALSE
      )
    return(result_df)
  }

Paging comes at a significant cost: not only the IO operations, but also having to either rely on readr's type parsing or store the column-type mapping somewhere and reapply it. I would like to avoid having to COUNT the size of a return object before deciding whether to page, and I think leaving the choice up to the user is not so friendly either.
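To illustrate the second option, a rough sketch (assumed, not code from the paging branch) of storing the column information reported by DBI and reapplying it when reading the temp file back with readr, instead of relying on readr's type guessing; the type-to-collector mapping is simplified:

# Capture the column names and types reported by the database driver
col_info <- DBI::dbColumnInfo(result)

# Map the reported types to readr collectors (illustrative mapping only)
to_collector <- function(type) {
  switch(type,
    integer = readr::col_integer(),
    double  = readr::col_double(),
    numeric = readr::col_double(),
    logical = readr::col_logical(),
    readr::col_character()  # fallback for character, timestamps, etc.
  )
}

col_types <- do.call(
  readr::cols,
  stats::setNames(lapply(col_info$type, to_collector), col_info$name)
)

# Reapply the stored mapping instead of letting readr guess
result_df <- readr::read_csv(
  partial_result_file,
  col_names = col_info$name,
  col_types = col_types,
  progress = FALSE
)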

I'm thinking about it. In any case, this might have to be fixed on the etnservice side.


etn::get_acoustic_detections(animal_project_code = "2013_albertkanaal", api = FALSE) does work

@PietrH (Member) commented Oct 1, 2024

Because it's a gateway error, I've contacted Stijn to see what he can see on his side.

I don't think the object is too big to pass over the API, especially compressed. I don't think server-side paging will fix this, but client-side paging might, although with a very significant overhead (because we'd need to implement sorting, or maybe use R sessions to fetch from OpenCPU).
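For reference, a very rough sketch of what client-side paging would involve; fetch_detections_page() is a hypothetical endpoint that does not exist today, and the stable sort key is exactly the part we would have to implement:

page_size <- 50000
offset <- 0
pages <- list()

repeat {
  # Hypothetical call: the server would need to apply ORDER BY so that
  # successive LIMIT/OFFSET requests partition the result without gaps
  page <- fetch_detections_page(
    animal_project_code = "2013_albertkanaal",
    order_by = "detection_id",
    limit = page_size,
    offset = offset
  )
  if (nrow(page) == 0) break
  pages[[length(pages) + 1]] <- page
  offset <- offset + page_size
}

detections <- dplyr::bind_rows(pages)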

@PietrH (Member) commented Oct 1, 2024

The 502 errors are coming from Nginx (opencpu-cache); I've forwarded this information to Stijn. We'll need to look into the admin logs for more info.


@Stijn-VLIZ (Collaborator)

I tried many different things, but my conclusion is that we are running into limits here.
The image below shows the memory usage of a local Docker container running etnservice.
The function get_acoustic_detections was altered so that no ordering is done and the data frame is emptied before being serialized, so the memory shown is for running the query only.
[screenshot: memory usage of the local etnservice Docker container]
This also runs for 9 minutes.
What I propose is indeed pagination; I will first investigate the possibilities on our side (the database).
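To make the database-side option concrete, one possibility is keyset pagination, sketched below; the table and column names are assumptions and the placeholder syntax is Postgres-flavoured:

# Illustrative keyset pagination: page on a monotonically increasing key
# instead of OFFSET, so each page costs roughly the same to fetch
last_id <- 0
pages <- list()

repeat {
  page <- DBI::dbGetQuery(connection, "
    SELECT *
    FROM acoustic.detections
    WHERE animal_project_code = '2013_albertkanaal'
      AND detection_id > $1
    ORDER BY detection_id
    LIMIT 50000
  ", params = list(last_id))
  if (nrow(page) == 0) break
  last_id <- max(page$detection_id)
  pages[[length(pages) + 1]] <- page
}

detections <- dplyr::bind_rows(pages)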

@PietrH (Member) commented Oct 4, 2024

If this is the case, why does the query work when using a local database connection? get_acoustic_detections(animal_project_code = "2013_albertkanaal", api = FALSE)

@Stijn-VLIZ (Collaborator)

That's a question of how OpenCPU works.
OpenCPU starts a new R session on the server and then runs the etn package.
It also creates a session of its own, where OpenCPU stores information about your request and your result.
How and why this impacts memory so heavily, I don't know.

There might be a solution in an async worker doing the query, writing it to a file and then returning that.
In that case you could use the async endpoint and check when the data is ready.
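Sketching the client side of that idea (all endpoints and the job-id handling are hypothetical; whether OpenCPU can support this still needs to be checked):

base_url <- "https://opencpu.example.org/etnservice"  # placeholder URL

# Submit the long-running query; the server immediately returns a job id
job <- httr::POST(
  paste0(base_url, "/detections/submit"),
  body = list(animal_project_code = "2013_albertkanaal"),
  encode = "json"
)
job_id <- httr::content(job)$job_id

# Poll until the async worker reports that the result file is ready
repeat {
  status <- httr::content(httr::GET(paste0(base_url, "/jobs/", job_id)))
  if (identical(status$state, "done")) break
  Sys.sleep(10)
}

# Download the finished result
detections <- readr::read_csv(paste0(base_url, "/jobs/", job_id, "/result.csv"))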

@PietrH (Member) commented Oct 8, 2024

I'm not sure if OpenCPU supports async requests. I agree that async requests would be the best solution for big datasets.

  1. Currently, if I make changes I'm directly working on a live environment that has some (beta) users. Especially towards the future, how can I experiment with fixes without affecting the live API that people are using?
  2. Is it possible the request succeeds on a local database connection simply because the RStudio Server has much more memory? In that case, optimizing the query might be the answer.
