Dear @gdauby
I was trying to get the number of locations inside and outside protected areas, but I am running into memory issues. Because the dataset is quite large (500,000 records), the step that computes the pairwise distances returns the following error:
```r
pairwise_dist_not_pa <- stats::dist(coordEAC_not_pa[, 1:2], upper = F)
#> Error: cannot allocate vector of size 582.8 Gb
```
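For scale: a `dist` object stores the n(n - 1)/2 lower-triangle distances as 8-byte doubles, so its size grows quadratically with the number of points. The minimal sketch below reproduces that arithmetic (`dist_size_gb()` is a hypothetical helper, and it assumes R reports sizes in binary units, 1 Gb = 1024^3 bytes). By this estimate, the 582.8 Gb request corresponds to roughly 395,000 rows in `coordEAC_not_pa`.

```r
# Approximate memory (in R's binary Gb) that stats::dist() must allocate
# for n points: n * (n - 1) / 2 distances, 8 bytes each.
dist_size_gb <- function(n) {
  n_pairs <- n * (n - 1) / 2
  n_pairs * 8 / 1024^3
}

dist_size_gb(1e4)  # about 0.37 Gb
dist_size_gb(1e5)  # about 37.3 Gb
```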
If I understand correctly, this part of the code computes the `Resolution` object that is passed to the `size` argument of `ConR::.cell.occupied`. However, when `method = "fixed_grid"`, the resolution is provided by the user (argument `Cell_size_locations`). So a condition could be added so that the pairwise distances are computed only when `Cell_size_locations` is `NULL` (just below the `if (!is.null(protec.areas))` chunk):
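```r
if (nrow(coordEAC_not_pa) > 0) {
  coordEAC_not_pa$tax <- as.character(coordEAC_not_pa$tax)
  list_data_not_pa <- split(coordEAC_not_pa, f = coordEAC_not_pa$tax)

  # Original code: the full pairwise distance object was always computed,
  # which needs n * (n - 1) / 2 doubles and fails for large datasets.
  # if (nrow(coordEAC_pa) > 1)
  #   pairwise_dist_not_pa <- stats::dist(coordEAC_not_pa[, 1:2], upper = F)

  # With a fixed grid, the user-supplied cell size is used directly.
  if (any(method == "fixed_grid") && !is.null(Cell_size_locations)) {
    Resolution <- Cell_size_locations
  } else {
    Resolution <- 10
  }

  # Pairwise distances are only needed for the sliding-scale method.
  if (any(method == "sliding scale")) {
    # Guard on the subset that dist() is actually applied to.
    if (nrow(coordEAC_not_pa) > 1) {
      pairwise_dist_not_pa <- stats::dist(coordEAC_not_pa[, 1:2], upper = FALSE)
      Resolution <- max(pairwise_dist_not_pa) * Rel_cell_size
    } else {
      Resolution <- 10
    }
  }
  # ... rest of the original block ...
}
```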
I used this adapted code and the function worked fine. It took 111 min (almost 2 h) for ~500,000 records and ~5,100 species, with a protected-area `SpatialPolygonsDataFrame` of ~20 MB and 6 cores.
Another option would be to move the `stats::dist` call inside the `foreach` loop. `Resolution` would then be computed on a much smaller subset and would become taxon-specific (I am not sure whether that is a good or a bad thing in this case).
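A minimal sketch of the taxon-specific variant, assuming (as in `coordEAC_not_pa`) that the coordinates are in the first two columns and the taxon is in a `tax` column. `taxon_resolution()`, `rel_cell_size` and `default_res` are hypothetical names standing in for `Rel_cell_size` and the fallback value of 10; in ConR, the body of the anonymous function would run inside the `foreach` loop, one taxon per iteration. Peak memory then scales with the largest taxon rather than with the whole dataset.

```r
# Hypothetical helper: one Resolution per taxon, computed from that
# taxon's own pairwise distances instead of a single global dist().
taxon_resolution <- function(coords, rel_cell_size, default_res = 10) {
  by_tax <- split(coords, f = as.character(coords$tax))
  vapply(by_tax, function(d) {
    if (nrow(d) > 1) {
      # Only this taxon's n_tax * (n_tax - 1) / 2 distances are stored
      max(stats::dist(d[, 1:2])) * rel_cell_size
    } else {
      default_res  # a single record: fall back to the default size
    }
  }, numeric(1))
}

toy <- data.frame(
  x   = c(0, 3, 10, 10, 5),
  y   = c(0, 4, 10, 20, 5),
  tax = c("a", "a", "b", "b", "c")
)
taxon_resolution(toy, rel_cell_size = 0.1)
#>   a   b   c
#> 0.5 1.0 10.0
```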
In addition, it may be worth benchmarking `stats::dist` against `fields::rdist` and `distances::distances`, which should compute the pairwise distances faster, although this is not the slowest step of the function.
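Since the sliding-scale branch only uses `max(pairwise_dist_not_pa)`, a different route (a swapped-in technique, not something ConR currently does) would avoid the full distance object altogether: for Euclidean distances in the plane, the farthest pair of points is always a pair of convex-hull vertices, so `dist()` only needs to run on the points returned by `grDevices::chull()`. `max_pairwise_dist()` is a hypothetical helper, and the sketch assumes planar coordinates, which is how `stats::dist` treats them.

```r
# Hypothetical helper: max(stats::dist(xy)) computed on convex-hull
# vertices only, since the diameter of a planar point set is always
# attained between two hull vertices.
max_pairwise_dist <- function(xy) {
  xy <- as.matrix(xy[, 1:2, drop = FALSE])
  if (nrow(xy) < 2) return(NA_real_)
  hull <- grDevices::chull(xy[, 1], xy[, 2])
  if (length(hull) < 2) return(0)  # all points coincide
  max(stats::dist(xy[hull, , drop = FALSE]))
}

set.seed(1)
pts <- cbind(runif(2000), runif(2000))
all.equal(max_pairwise_dist(pts), max(stats::dist(pts)))
#> [1] TRUE
```

The hull usually has far fewer vertices than there are records, so memory stays small, although in the worst case (all points on a circle) every point is a hull vertex.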
Best!