-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stack / Mosaic STAC item collections #65
Comments
Hi @rmg55 , thanks for sharing your ideas the detailed example!
There are certainly different ways to go about this, and I'm sure of an optimal approach. I have often just used calls to
does seem likely. but the work-around if it isn't there is open the asset file (metadata only) and store the CRS
definitely I think this will be a common use case, but i think the same functionality could be easily implemented in intake-xarray, which does the actual loading and mapping to an xarray dataset. I'm thinking having the option to use rioxarray would be great intake/intake-xarray#87. See also https://gis.stackexchange.com/questions/376685/issue-while-performing-merge-mosaic-of-multiple-tiff-files-using-rioxarray-pytho/ That said, maybe an |
Thanks for comments/suggestions @scottyhq! A few follow-up questions/thoughts:
Agreed! However, I think there are a few advantages of including a custom version of this for intake-stac.
hmmm, I am not sure I 100% follow this. I agree that adding RioXarray as an option in intake-xarray would be great. However, I have not found an efficient method to reproject/mosaicing xarray objects. I have used I would envision a set of functions to build vrt files like: For a single band: For stacking bands (typically temporally) with no spatial mosaicing (like the gist I linked above): For stacking bands and spatially mosaicing: where Thoughts? |
@rmg55 I had a bit more time to think about this and look at your example. I think this would be great to implement perhaps in two separate PRs! I think the primary goal of intake-stac should be to give you a DataArray (or DataSet) with minimal alteration to how the data is stored. to keep the scope narrow, reprojection, mosaicing, and other operations are available via the xarray api (or rioxarray api). That said, the workflow you're going for is really common and it'd be nice to have some built-in functions to make it easier! So how about:
|
@scottyhq, really appreciate the explanations and guidance on this!
Just to clarify, this would be for an entire catalog, correct? Each vrt band would correspond to a stac item in the catalog. This seems very doable. My only question/confusion would be for catalogs that span multiple tiles (e.g. there would be different spatial extents / grids) - would that be a valid .vrt file or xarray object? Perhaps we should check the bounding boxes on the items and throw an error if there are more than one bbox (or output mutliple files with _X suffixes - items_1.vrt, items_2.vrt, etc...)?
I agree with your conclusion here - I really like the "path_as_pattern" approach you linked to in the gist (thanks for sharing that btw!). A more convenient function would be great to include. Do you know if the STAC spec insures that the catalogs be subdivided into "date" folders? I think the one case that I am still unsure of how to accomplish using xarray/rioxarray + dask arrays + intake-stac would be the reproject + spatially mosaic + temporally stack workflow. Perhaps there is a workflow I am not familiar with that can efficiently accomplish this... This is the specific workflow I was levitating towards for the STAC --> vrt approach. Something along the lines of:
Where mosaic_by is optional (only needed if spatially mosaicing), but generally set to metadata field that represents the COG's date. Perhaps this is something to put on the back-burner and revisit at a later date if there seems to be need/desire within the community? |
Good point. definitely a complication is that a STAC catalog could contain items with assets that are all over the map and in different projections, so that suggestion applies to collections where the CRS is the same (a given S2 tile and L8 path row) being good examples.
Another good point! This is not enforced, so while that path-as-pattern approach is nice I don't think it can be relied on, instead we'd have to iterate through the catalog for paths. Looking into that here #66.
Let's keep experimenting with some simple examples (1 MRGS tile set --> web mercator tiles, adjacent MGRS grids into single xarray dataset?). Hopefully we can document some ways to do this even if it doesn't end up being a single function in this library. |
Yes, e.g. I was looking at SA yesterday, and hence definitely more than one utm projection there |
However, median is not yet implemented on dask arrays |
and DS['blue'].quantile(0.5,dim='time') which I am not sure about - or how good the quantile would be either - maybe ok if a mosaic of thousands for a whole state? |
Hi @RichardScottOZ and @bluetyson Not sure I am fully following your comments, but will try to add my input (perhaps this is related to your posts on the Pangeo Discourse page?). Sounds like you want to temporal stack and mosaic a bunch of images via .vrt files. Can you expand on what you mean by "median mosaic". If you are referring to the spatial mosaicing in a vrt file, you should be able to do this with a numpy function written into the vrt file (see here - https://gis.stackexchange.com/questions/324602/averaging-overlapping-rasters-with-gdal). I should note that I have very little experience composing python functions in vrt files, so I don't really know all the tips/tricks yet. If you develop a good workflow - it would be great if you wanted to share it in this thread! @scottyhq - still planning on working on this at some point, but just have not been able to carve out any time yet. |
Median mosaic - for example, get all the sentinel 2A images over an area for a time period, take the median for each pixel as a representative composite. xarray or vrt - I don't mind, but having options would be good |
@RichardScottOZ - ok got it. My thinking would be to rechunk to the time-intervals, and then apply a map_blocks function that calculates the median. |
Thanks very much Rowan, do you have an example of such? |
For those following this, but not pangeo-data/cog-best-practices#4 Here's the repo: https://github.com/stac-utils/stac-vrt and docs: https://stac-vrt.readthedocs.io/en/latest. Pretty rough right now (likely only works for NAIP STAC items / COGs in Azure), but happy to hack on it with people. |
This STAC blog (specifically the section under Additional Flexibility - proj:shape and proj:transform) gave me the idea that gdal vrt with vrt mosaicking could be leveraged in intake-STAC to stack and mosaic intake-stac item collections.
Here is a gist showing an example of temporally stacking a single band of Sentinel-2a COGS using this approach: https://gist.github.com/rmg55/1acf804ef1af0c7934b265b3a653a486
A similar approach could also be implemented to mosaic data as well (although I have yet to try it). To apply this feature, the STAC projection extension would need to be implemented in the STAC catalog. I could not find any existing work by the Open Data Cube folks on this effort (they were referenced in the blog post mentioned above).
A few questions:
Tagging @scottyhq and @jhamman
The text was updated successfully, but these errors were encountered: