-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xarray and rioxarray compatibility #50
Comments
I separated the xarray version conversation into #51, since I think we can close that quickly, whereas rioxarray compatibility deserves more discussion. tl;dr: we can and should integrate with rioxarray right now, but I'd prefer to see the functionality you actually want from rioxarray split into a separate, IO-agnostic library, and a simple metadata convention tying them together. I doubt stackstac will ever actually use rioxarray internally for IO, since they function at very different levels and have different goals. rioxarray aims to produce a DataArray/Dataset just from a GeoTIFF, etc. file, so the metadata comes from opening the file itself. stackstac's whole goal is to never open the files themselves, and get all metadata from STAC. So I think rioxarray compatibility for stackstac would simply mean following rioxarray's CRS data model (a We can certainly do this, though personally I think it's a bit cumbersome for users. I suspect that most people (at least STAC users) would prefer to not care about the details of CF conventions. Plus, the STAC Projection extension and CF models represent similar information, but are structured differently, and it would be nice if neither had to impose its structure on users of the other. As I mentioned in the meeting a couple months ago (pangeo-data/cog-best-practices#4 (comment)), I agree that it's important to be able to read and write data that follows CF conventions. However, I don't know that users actually want to work with CRSs according to CF conventions in xarray. If the CRS was represented more simply, yet still could be read from and written to a netCDF file without information loss, would users be happy? I think the takeaway of the meeting a couple months ago was:
Basically, I think that as a common interchange interface between libraries, the rioxarray CRS model is more complex and netCDF-specific than we'd ideally want. I don't know though if it's necessary to capture the full range of complexity of CRSs—I'd just love if something simpler could work. Eventually, I think the answer here will be custom indexes in xarray (pydata/xarray#4979 pydata/xarray#1603 cc @dcherian). Another thing is that rioxarray isn't as focused on dask for some operations (corteva/rioxarray#119). Since stackstac is entirely dask-oriented, I'm hesitant to encourage users towards a library that won't work as well on their data. I wrote All that said though—for now, we might as well use a But perhaps this issue can be a place to start discussing that common CRS data model, and where we'd want operations on it to live? |
Regarding indexers, also see xarray-contrib/xoak#14 where the xoak devs and I discussed pyresample, geoxarray, xoak, and xarray's new base Indexer class. Putting this kind of stuff in geoxarray (or a library like it) is what I had hoped it would become. When I first created the repository I had all these ideas of what it could be and how it would work, then I learned that a lot of the stuff I was hoping to depend on (persistence in xarray accessors and/or attributes) just isn't how things work. I basically stopped trying to develop for geoxarray and waited and watched to see how this community developed what it needed before I tried to come up with a solution for something that wasn't being asked for. I think based on this discussion and others that have come up more and more that we're at a point where we (xoak, xarray, rioxarray, pyresample/satpy, stackstac, etc) are asking for a solution like geoxarray.
I agree that most people don't want to be limited by CF conventions, especially when they become cumbersome in situations like grid_mapping attributes, etc.
I think this is the key question. I think @snowman2 and the rioxarray project have figured out a good chunk of the "gotchas" involved with how this information can/should be represented in xarray objects. I personally wish CF conventions didn't dictate what in-memory/xarray representation looked like. I think a library (perhaps geoxarray) is needed that defines the "preferred" way to represent these properties in xarray objects, provides conversions to and from the various common schemes (CF, lon/lat coordinate arrays, etc), and likely provides wrappers or convenience functions around other functionality related to CRS-aware usage (resampling, indexing, etc). Maybe we need to define the best case interface we can think of for these types of usages? For example, assume xarray is completely geospatial focused, knows everything about coordinate references systems, and doesn't necessarily need backwards compatibility, how does someone working with geotiffs or stac geo data use it?
|
This is very similar to how opendatacube works, thought they don't currently use STAC. Someone on the call a while back mentioned they have thought about using STAC. Might be worth connecting with them.
I think this would be a great path forward as it would bring consistency for the CRS storage mechanism across non-xarray communities as well, including the GDAL stack (rasterio/QGIS/etc..). Had a similar discussion in opendatacube a while back about this ref.
Here are some things to consider about this:
This is something you could handle with a method/function where the users don't have to worry about anything similar to
Nice! Might be worth collaborating with the opendatacube group on this as well as they have some code that uses dask/rasterio to reproject ref.
I would definitely like to see dask support improved in rioxarray. Here are some things to consider:
|
@snowman2 is there anything in rioxarray's method for storing and handling of CRS information that you would change if backwards compatibility didn't matter? If I were to port this functionality to geoxarray, where would you suggest I start looking in rioxarray? Side question: does anyone know if entrypoints (like xarray now uses for engines) has performance penalties as the environment grows? I know using pkg_resources can be slow the more packages are installed, but not sure on entrypoints. |
Can't think of anything at the moment I would change. I would recommend looking at the |
ODC and stac https://github.com/opendatacube/datacube-stac-example So they are working on things - DEA has a STAC catalogue, etc. |
stac_items = satsearch.Search(
url="https://explorer.sandbox.dea.ga.gov.au/stac/",
bbox=bbox,
collections=["s1_gamma0_geotif_scene"],
datetime="2000-02-01/2021-03-31",
).items() |
I had posted an issue around some of this discussion on an internal Development Seed repo but I realized that it might be worthwhile to share. I copied it over to this gist and got some great comments and thoughts from @djhoese and @geospatial-jeff around their development plans for the future that might be relevant. Not sure if this thread is the best spot to continue the discussions around this? I know @scottyhq had a more general thread on this topic but I seem to have misplaced it. |
xarray 0.18 and rioxarray 0.4 were just released http://xarray.pydata.org/en/stable/whats-new.html#v0-18-0-6-may-2021 , and they included some important changes with backend configuration. Specifically, you can now load xarray datasets via rioxarray.open_rasterio() like so:
I actually haven't looked closely at the stackstac code yet to know how the dataset opening logic currently compares in these libraries... but it would be great to add some docs on integrating with rioxarray. One nice use-case is saving a small netcdf timeseries subset for using locally in QGIS.
Also, I'm wondering if it's possible to relax (^0) or bump (^0.18.0) the current xarray pin for the next release? Could add some matrix tests against xarray master to ensure things don't break since currently it does seem like minor bumps in xarray are more like major bumps ;) (#26 ).
stackstac/pyproject.toml
Line 43 in d1e15ca
The text was updated successfully, but these errors were encountered: