
Memory allocation problems with function 'locations.comp' #20

@LimaRAF

Dear @gdauby

I was trying to get the number of locations inside and outside protected areas, but I am running into memory issues. Because the dataset is quite large (about 500,000 records), the step that computes the pairwise distances fails with the following error:

    pairwise_dist_not_pa <- stats::dist(coordEAC_not_pa[, 1:2], upper = F)
    Error: cannot allocate vector of size 582.8 Gb
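The size in this error follows from how stats::dist() stores its result: only the lower triangle is kept, i.e. n(n - 1)/2 doubles of 8 bytes each, so memory grows quadratically with the number of records. The helper below (dist_size_gib() is a hypothetical name, not part of ConR) makes the arithmetic explicit. R prints sizes in units of 1024^3 bytes, and by this formula 582.8 Gb corresponds to roughly 395,500 points, which suggests that this was the size of the subset outside protected areas.

```r
# Memory needed by the object returned by stats::dist() for n points:
# only the lower triangle is stored, i.e. n * (n - 1) / 2 doubles of 8 bytes.
dist_size_gib <- function(n) {
  n_pairs <- n * (n - 1) / 2   # (n - 1) is double, so no integer overflow
  n_pairs * 8 / 1024^3         # bytes -> units of 1024^3 (printed by R as "Gb")
}

dist_size_gib(500000)  # about 931 for the full dataset
dist_size_gib(395530)  # about 582.8, the size reported in the error
dist_size_gib(1000)    # under 4 MB for 1,000 records
```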

If I understand correctly, the pairwise_dist_not_pa distances are used to obtain the 'Resolution' object that is passed to the argument size of ConR::.cell.occupied. However, when method = "fixed_grid", the resolution is supplied by the user (argument Cell_size_locations), so the distances are not needed. A condition could therefore be added so that the pairwise distances are computed only when they are actually used, i.e. for the "sliding scale" method, as in the adapted chunk below (placed just below the chunk if (!is.null(protec.areas))):

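    if (nrow(coordEAC_not_pa) > 0) {
      coordEAC_not_pa$tax <- as.character(coordEAC_not_pa$tax)
      list_data_not_pa <- split(coordEAC_not_pa, f = coordEAC_not_pa$tax)
      # Original code: all pairwise distances computed up front, which fails
      # on large datasets because memory grows with n^2
      # if (nrow(coordEAC_pa) > 1) 
      #   pairwise_dist_not_pa <- stats::dist(coordEAC_not_pa[, 
      #                                                       1:2], upper = F)

      # Fixed grid: use the user-supplied cell size (default 10), no dist() needed
      if (any(method == "fixed_grid") && !is.null(Cell_size_locations)) { 
        Resolution <- Cell_size_locations
      } else {
        Resolution <- 10
      }
      # Sliding scale: resolution depends on the maximum pairwise distance,
      # so dist() is computed only in this case
      if (any(method == "sliding scale")) {
        if (nrow(coordEAC_not_pa) > 1) {
          pairwise_dist_not_pa <- stats::dist(coordEAC_not_pa[, 1:2], upper = FALSE)
          Resolution <- max(pairwise_dist_not_pa) * Rel_cell_size
        } else {
          Resolution <- 10
        }
      }
      # ... (rest of the original chunk unchanged)
    }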

I used this adapted code and the function ran fine. It took 111 min (almost 2 h) with ~500,000 records, ~5,100 species, a protected-areas sp.df of ~20 MB and 6 cores.
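Since the "sliding scale" branch only uses max(pairwise_dist_not_pa), the full distance object may not be needed at all. For Euclidean distances (the stats::dist() default), the largest pairwise distance in a point set is always attained between two vertices of its convex hull, so it is enough to run dist() on the hull points returned by grDevices::chull(). This is a swapped-in technique, not what ConR currently does; max_pairwise_dist() is a hypothetical helper, and the sketch assumes the coordinates sit in the first two columns, as in the original call.

```r
# Maximum Euclidean pairwise distance without building the full distance
# object: the diameter of a point set is attained between two convex-hull
# vertices, and the hull usually has far fewer points than the data.
max_pairwise_dist <- function(coords) {
  xy <- as.matrix(coords[, 1:2])
  if (nrow(xy) < 2) return(0)
  hull_idx <- grDevices::chull(xy[, 1], xy[, 2])  # indices of hull vertices
  hull_xy <- xy[hull_idx, , drop = FALSE]
  if (nrow(hull_xy) < 2) return(0)  # all points identical
  max(stats::dist(hull_xy))
}

# Same value as max(stats::dist(...)) on a small random example
set.seed(42)
pts <- data.frame(x = runif(2000), y = runif(2000))
all.equal(max_pairwise_dist(pts), max(stats::dist(pts[, 1:2])))  # TRUE
```

In the adapted chunk, the line computing Resolution could then use max_pairwise_dist(coordEAC_not_pa) * Rel_cell_size, whose memory depends on the number of hull vertices rather than on the number of records.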

Another option would be to move the stats::dist call inside the foreach loop. Resolution would then be computed on a much smaller subset (one taxon at a time) and would be taxon-specific (I am not sure whether this is a good or a bad thing in this case).
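A per-taxon version might look like the sketch below. For brevity it uses base R lapply() over the list produced by split(); inside ConR the same body would go in the existing foreach loop. taxon_resolution(), the toy column names and Rel_cell_size = 0.05 are illustrative assumptions, while the fallback of 10 mirrors the original code.

```r
# Hypothetical per-taxon resolution for the "sliding scale" method:
# dist() runs on one taxon at a time, so memory scales with the largest
# taxon instead of the whole dataset.
taxon_resolution <- function(df, Rel_cell_size = 0.05, default = 10) {
  if (nrow(df) > 1) {
    max(stats::dist(df[, 1:2])) * Rel_cell_size
  } else {
    default  # a single record has no pairwise distance
  }
}

# Toy data: coordinates in the first two columns, taxon name in 'tax'
coords <- data.frame(
  ddlat = c(-20, -21, -22, -10),
  ddlon = c(-40, -40, -40, -50),
  tax   = c("sp_a", "sp_a", "sp_a", "sp_b")
)
list_data <- split(coords, f = coords$tax)
res_by_taxon <- lapply(list_data, taxon_resolution, Rel_cell_size = 0.05)
res_by_taxon
# sp_a: max distance 2, so 2 * 0.05 = 0.1; sp_b has one record, so 10
```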

In addition, it may be worth benchmarking stats::dist against fields::rdist and distances::distances, which should be faster at computing pairwise distances, although this is not the slowest step of the function.
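A minimal benchmark sketch, run on a random sample because the full dataset does not fit in memory. It times stats::dist() with system.time() and, only if the fields package is installed, fields::rdist(); the sample size and coordinates are made up, and distances::distances could be added to the comparison in the same way. Note that fields::rdist() by default returns the full n x n matrix rather than the lower triangle, so whatever its speed it needs about twice the memory of a dist object and would not by itself avoid the allocation error.

```r
# Benchmark sketch on a random sample (made-up coordinates)
set.seed(123)
n <- 3000
xy <- cbind(runif(n, -55, -35), runif(n, -30, -5))

t_stats <- system.time(d_stats <- stats::dist(xy))[["elapsed"]]
cat("stats::dist:", t_stats, "s\n")

if (requireNamespace("fields", quietly = TRUE)) {
  t_fields <- system.time(d_fields <- fields::rdist(xy))[["elapsed"]]
  cat("fields::rdist:", t_fields, "s\n")
  # Both should agree on the maximum distance used by the sliding scale
  stopifnot(isTRUE(all.equal(max(d_fields), max(d_stats))))
}
```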

Best!
