It's a very common pattern to iterate over each feature in a vector, do something with the feature's attributes and/or geometry, and write new attributes back to the feature. A lot of the details of this process could be abstracted away with a wrapper function, something like:
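A rough sketch of what that wrapper might look like (the `vector_apply` name, its signature, and the transaction-based write-back are assumptions here, not a settled design):

```python
from osgeo import gdal, ogr


def vector_apply(vector_path, op, new_fields=None):
    """Apply ``op`` to each feature of the vector at ``vector_path``.

    Args:
        vector_path (str): path to a vector that will be modified in place.
        op (callable): called once per feature with the ``ogr.Feature`` as
            its only argument; any fields it sets are saved back to disk.
        new_fields (list): optional list of ``(name, ogr_field_type)``
            tuples for fields to create before iterating.

    Returns:
        None
    """
    vector = gdal.OpenEx(vector_path, gdal.OF_VECTOR | gdal.OF_UPDATE)
    layer = vector.GetLayer()
    for field_name, field_type in (new_fields or []):
        layer.CreateField(ogr.FieldDefn(field_name, field_type))
    layer.StartTransaction()
    for feature in layer:
        op(feature)
        layer.SetFeature(feature)  # persist any attribute edits
    layer.CommitTransaction()
    layer = None
    vector = None  # dereference to flush and close the dataset
```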
Which could be used like this (simple example from AWY):
```python
def compute_water_yield_volume(watershed_results_vector_path):
    vol_name = 'wyield_vol'

    def wyield_vol_op(feat):
        wyield_mn = feat.GetField('wyield_mn')
        # there won't be a wyield_mn value if the polygon feature only
        # covers nodata raster values, so check before doing math.
        if wyield_mn is not None:
            geom = feat.GetGeometryRef()
            # Calculate water yield volume,
            # 1000 is for converting the mm of wyield to meters
            vol = wyield_mn * geom.Area() / 1000
            # Add the volume value to the volume field
            feat.SetField(vol_name, vol)

    utils.vector_apply(
        watershed_results_vector_path, wyield_vol_op,
        new_fields=[(vol_name, ogr.OFTReal)])
```
Additional features could be:
A boolean arg `enumerated` which, if True, enables enumeration of the features: `op` would be called with `(index, feature)` rather than just `(feature)`. I saw a couple of cases where this would be useful (see the sketch after this list).
Error handling to make sure that progress is saved and written to disk if op raises an error
Timed progress logging
Handle setting shapefile field width and precision for different dtypes
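To make the `enumerated` idea concrete, usage might look something like this (the `enumerated` keyword, the `utils` module path, the field name, and the vector path are all illustrative, not an existing API):

```python
import logging

from osgeo import ogr

# `utils` stands in for wherever vector_apply ends up living.
from natcap.invest import utils

LOGGER = logging.getLogger(__name__)


def label_op(index, feat):
    # with enumerated=True, op is called with (index, feature)
    if index % 1000 == 0:
        LOGGER.info('processing feature %d', index)
    feat.SetField('feat_index', index)


utils.vector_apply(
    'watersheds.gpkg',  # path to an existing vector (illustrative)
    label_op,
    new_fields=[('feat_index', ogr.OFTInteger)],
    enumerated=True)
```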
I count several instances in invest where this pattern could simplify existing code, for example:
AWY compute_water_yield_volume
AWY compute_watershed_valuation
AWY compute_rsupply_volume
UCM calculate_uhi_result_vector
UCM calculate_energy_savings
Forest carbon _aggregate_carbon_map
Benefits would be to reduce redundant code and to make sure we're consistently using the best patterns for working with GDAL vectors (opening and closing correctly, saving to disk, using exceptions: #638), while preserving memory efficiency, since features are processed one at a time.
This would be sort of parallel to pygeoprocessing's raster utilities, which are much more developed than our vector utilities.
+1 from me to create a helper function for this! It reminds me a bit of pygeoprocessing.geoprocessing.shapely_geometry_to_vector (at least the attribute portion of it), but with a focus on setting attributes.
But also, the apply terminology makes me think of pandas/geopandas. At this point, especially since we are using conda for our package management, should we consider moving vector operations like this to geopandas?
My only concern with geopandas is memory usage; it doesn't seem to be designed for efficiency on arbitrarily large vectors. It's possible to read in a subset of a vector with the `rows` option to `geopandas.read_file`, but I'm not aware of any way to work with a whole geodataframe without reading it all into memory.
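For reference, the closest geopandas gets is chunked reads, something like the following (sketch only; the field names are borrowed from the AWY example above):

```python
import geopandas

# reads only features 0-9999; each slice is still a full GeoDataFrame in
# memory, and there's no streaming write-back comparable to iterating
# features with OGR.
chunk = geopandas.read_file('watersheds.gpkg', rows=slice(0, 10000))
chunk['wyield_vol'] = chunk['wyield_mn'] * chunk.geometry.area / 1000
```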
Thanks for this Emily, it's a +1 from me generally. I would also be concerned about pandas performance and efficiency as the backend of GeoPandas. Here's a page where they talk about scaling. But maybe it would be interesting to do a side-by-side comparison with small, medium, and large vector use cases.
This does make me reflect on some conversations we've had about standardizing / building out a vector API in pygeoprocessing. We haven't given vectors the treatment that we've given rasters. I'm not saying we need to tackle that now, but am curious about whether something like this makes sense to live in PGP from the start?
My only feature recommendation would be to provide an optional "copy" argument. I could see not wanting to edit the vector directly but instead make a copy. This could be a separate step beforehand, but might be a nice convenience feature too.