Proposal: utility to apply a function to each feature in a vector #1595
Comments
+1 from me to create a helper function for this! It reminds me a bit of But also, the |
My only concern with geopandas is memory usage, it doesn't seem to be designed for efficiency on arbitrarily large vectors. It's possible to read in a subset of a vector with the |
Thanks for this Emily, it's a +1 from me generally. I would also be concerned about the performance and efficiency of Pandas, as the backend of GeoPandas. Here's a page where they talk about scaling. But maybe it would be interesting to do a side-by-side comparison with small, medium, and large vector use cases. This does make me reflect on some conversations we've had about standardizing / building out a vector API in pygeoprocessing. We haven't given vectors the treatment that we've given rasters. I'm not saying we need to tackle that now, but I am curious whether something like this makes sense to live in PGP from the start. My only feature recommendation would be to provide an optional "copy" argument. I could see not wanting to edit the vector directly but instead make a copy. This could be a separate step beforehand, but it might be a nice convenience feature too.
On hold pending #1619 - this might seem less needed if we start using fiona |
It's a very common pattern that we iterate over each feature in a vector, do something with the feature's attributes and/or geometry, and write new attributes to the feature. A lot of the details of this process could be abstracted away with a wrapper function, something like
Which could be used like this (simple example from AWY):
Additional features could be:
- `enumerated`, which if True, enables enumeration of the features. If `enumerated`, `op` would be called with `(index, feature)` rather than just `(feature)`. I saw a couple of cases where this would be useful.
- [...] `op` raises an error
I count several instances in invest where this pattern could simplify existing code, for example:
- `compute_water_yield_volume`
- `compute_watershed_valuation`
- `compute_rsupply_volume`
- `calculate_uhi_result_vector`
- `calculate_energy_savings`
- `_aggregate_carbon_map`
Benefits would be to reduce redundant code and to make sure we're consistently using the best patterns for working with GDAL vectors (opening and closing correctly, saving to disk, using exceptions: #638), while preserving memory efficiency, since features are processed one at a time.
This would be sort-of parallel to pygeoprocessing's raster utilities, which are much more developed than our vector utilities.
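To make the proposed pattern concrete, here is a minimal sketch of how such a helper might behave. Everything in it is an assumption for illustration: the name `apply_to_features`, its signature, and the `FakeFeature` / `FakeLayer` classes, which are small stand-ins that only mimic the `GetField`, `SetField`, and `SetFeature` methods of OGR features and layers so the example runs without GDAL. A real implementation would also need to handle opening, saving, and closing the vector.

```python
class FakeFeature:
    """Hypothetical stand-in for an OGR feature; fields live in a dict."""

    def __init__(self, **fields):
        self._fields = dict(fields)

    def GetField(self, name):
        return self._fields[name]

    def SetField(self, name, value):
        self._fields[name] = value


class FakeLayer:
    """Hypothetical stand-in for an OGR layer: iterable, with SetFeature."""

    def __init__(self, features):
        self._features = list(features)
        self.writes = 0  # counts how many times a feature was written back

    def __iter__(self):
        return iter(self._features)

    def SetFeature(self, feature):
        self.writes += 1


def apply_to_features(layer, op, enumerated=False):
    """Call ``op`` on each feature in ``layer``, then write the feature back.

    The helper itself only handles one feature at a time. If ``enumerated``
    is True, ``op`` is called as ``op(index, feature)`` instead of
    ``op(feature)``.
    """
    for index, feature in enumerate(layer):
        if enumerated:
            op(index, feature)
        else:
            op(feature)
        layer.SetFeature(feature)  # persist the edited attributes


# Usage with made-up fields ("area", "depth", "volume", "row").
layer = FakeLayer([
    FakeFeature(area=2.0, depth=3.0),
    FakeFeature(area=4.0, depth=0.5),
])


def add_volume(feature):
    feature.SetField(
        "volume", feature.GetField("area") * feature.GetField("depth"))


def add_row_number(index, feature):
    feature.SetField("row", index)


apply_to_features(layer, add_volume)
apply_to_features(layer, add_row_number, enumerated=True)

volumes = [f.GetField("volume") for f in layer]
rows = [f.GetField("row") for f in layer]
```

Keeping the per-feature logic in a plain function like `add_volume` is what would let the repeated open / iterate / write-back code in the model functions listed above collapse into a single call.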