Add Xarray sub-package #1013
    md.router,
    prefix="/md",
    tags=["Multi Dimensional"],
)
TODO: remove this before merging.
titiler.xarray should be seen as a plug-in to titiler and not as an application itself. We will add an example of how to build an application using the endpoint factory.
if "x" not in da.dims and "y" not in da.dims:
    try:
        latitude_var_name = next(
            x for x in ["lat", "latitude", "LAT", "LATITUDE", "Lat"] if x in da.dims
Do we need to support other variable names?
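One possible answer to the question above is to match dimension names case-insensitively against a small alias set instead of listing every spelling. The sketch below is only an illustration: `LATITUDE_ALIASES`, `LONGITUDE_ALIASES`, `find_dim` and `spatial_renames` are hypothetical names, and the longitude aliases are an assumption mirrored from the latitude list in the snippet.

```python
from typing import Dict, Iterable, Optional

# Hypothetical alias sets (assumption): lowercase spellings to accept.
LATITUDE_ALIASES = {"lat", "latitude"}
LONGITUDE_ALIASES = {"lon", "longitude"}


def find_dim(dims: Iterable[str], aliases: set) -> Optional[str]:
    """Return the first dimension whose lowercase name is in `aliases`."""
    for dim in dims:
        if dim.lower() in aliases:
            return dim
    return None


def spatial_renames(dims: Iterable[str]) -> Dict[str, str]:
    """Build a rename mapping toward the "x"/"y" names checked in the snippet."""
    dims = tuple(dims)
    renames = {}
    lat = find_dim(dims, LATITUDE_ALIASES)
    lon = find_dim(dims, LONGITUDE_ALIASES)
    if lat is not None:
        renames[lat] = "y"
    if lon is not None:
        renames[lon] = "x"
    return renames
```

A mapping like this could then be handed to a rename call on the data array; whether case-insensitive matching is too permissive for real datasets is a judgment call for the maintainers.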
@dataclass(init=False)
class CompatXarrayParams(DefaultDependency):
This is not directly used in titiler.xarray, but it could be used in a tiler that wants to support both GDAL and Xarray datasets.
@define(kw_only=True)
class TilerFactory(BaseTilerFactory):
By sub-classing titiler.core.factory.TilerFactory we avoid re-writing code
    # remove some attribute from init
    img_preview_dependency: Type[DefaultDependency] = field(init=False)
    add_preview: bool = field(init=False)
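For readers unfamiliar with the `field(init=False)` pattern in the snippet above, here is a minimal sketch of the idea using the standard library's `dataclasses` as a stand-in for attrs (an assumption; attrs differs in details). `BaseFactory` and `XarrayFactory` are hypothetical classes, not the ones in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class BaseFactory:
    # Hypothetical base class exposing two constructor options.
    add_preview: bool = True
    add_part: bool = True


@dataclass
class XarrayFactory(BaseFactory):
    # Redeclaring the attribute with init=False removes it from __init__,
    # so callers can no longer switch the feature on.
    add_preview: bool = field(init=False, default=False)


factory = XarrayFactory(add_part=False)

try:
    XarrayFactory(add_preview=True)
    rejected = False
except TypeError:
    # The constructor no longer accepts the removed keyword.
    rejected = True
```

The subclass keeps everything else from the base class, and passing the removed option fails at construction time instead of being silently ignored.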
We remove those 2 attributes because we don't support /preview endpoints.
    "aiohttp",
    "pandas",
    "httpx",
]
todo: update rio-tiler to >=7.1
        return Response(content, media_type=media_type)

    # custom /statistics endpoints (remove /statistics - GET)
    def statistics(self):
☝️ IMO having a full-dataset /statistics endpoint is a bit dangerous (as for the /preview endpoints), which is why we only support GeoJSON statistics.
Thanks for all your work here, @vincentsarago!
This is a very opinionated take, but I think titiler-xarray would be best off with two separate routes, each with its own set of optional dependencies. The first route would be zarr, which would open Zarr and virtual Zarr datasets using xarray.open_zarr. The second route would be md, which would open any dataset readable by xarray.open_dataset.
The primary reason I think we should do this is that it would enable us to incentivize virtualizing datasets into Zarr, which would lead to much faster tile generation. We could do this by:
- Having all query parameters in the zarr route only relevant for open_zarr, simplifying API usage.
- Automatically detecting virtual datasets, removing the need for the reference parameter.
- Lightening the image size for titiler-xarray deployments only using zarr, because other readers would not be installed (and eventually obstore and/or icechunk could be used instead of the fsspec dependencies).
This would also simplify non-zarr usage for the following reasons:
- Zarr-specific parameters (e.g., group, consolidated) would not be included in endpoints in the md route.
- We could use xarray's automatic backend detection rather than writing our own in titiler/xarray/io.py.
I also think isolating Zarr usage would simplify the eventual support of the GeoZarr and multiscales specifications.
Thanks @maxrjones 🙏 I see what you're saying. The goal of having a single Reader was to handle all the non-COG datasets, so splitting into two separate readers/sets of endpoints would not meet the goal.
We can absolutely use
If I follow your thinking, it seems we would need a
What if we make the dependencies optional? I'm going to open a PR on top of this one to try some things.
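As a rough illustration of the "optional dependencies" idea, one way to expose only the readers whose packages are installed is to probe for them with `importlib.util.find_spec`. This is a sketch under assumptions: `READER_REQUIREMENTS`, `has_module` and `available_readers` are hypothetical names, and the package lists are placeholders rather than the PR's actual extras.

```python
from importlib.util import find_spec
from typing import Dict, List

# Hypothetical mapping (assumption): reader name -> top-level modules it needs.
READER_REQUIREMENTS: Dict[str, List[str]] = {
    "zarr": ["zarr"],
    "netcdf": ["h5netcdf"],
}


def has_module(name: str) -> bool:
    """Return True when a top-level module can be found without importing it."""
    return find_spec(name) is not None


def available_readers(requirements: Dict[str, List[str]]) -> List[str]:
    """List the readers whose required modules are all installed."""
    return [
        reader
        for reader, modules in requirements.items()
        if all(has_module(module) for module in modules)
    ]
```

An application could call `available_readers(READER_REQUIREMENTS)` at startup and fail early, with a clear message, when a request needs a reader that was not installed.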
This is great! The concept of creating pyramids in a zarr store was new to me, then I googled around and found @maxrjones's notebook 😆.
It is great to have the io methods standardized here so we can import them in titiler.cmr and other applications.
else:
    da = da.isel(time=0)

assert len(da.dims) in [2, 3], "titiler.xarray can only work with 2D or 3D dataset"
- assert len(da.dims) in [2, 3], "titiler.xarray can only work with 2D or 3D dataset"
+ if len(da.dims) in [2, 3]:
+     raise ValueError("titiler.xarray can only work with 2D or 3D dataset")
I guess you want this to be if not
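To make the correction concrete, here is a small sketch of the check with the condition negated, so the error is raised only when the dimension count is not 2 or 3. `validate_dims` is a hypothetical helper that takes a plain tuple of dimension names in place of `da.dims`.

```python
from typing import Sequence


def validate_dims(dims: Sequence[str]) -> None:
    """Raise ValueError unless the data has exactly 2 or 3 dimensions."""
    if len(dims) not in (2, 3):
        raise ValueError("titiler.xarray can only work with 2D or 3D dataset")


# 2D and 3D inputs pass silently.
validate_dims(("y", "x"))
validate_dims(("band", "y", "x"))

try:
    validate_dims(("time", "band", "y", "x"))
    raised = False
except ValueError:
    # A 4D input is rejected.
    raised = True
```

Unlike `assert`, the explicit `raise` still runs when Python is started with `-O`, which strips assert statements.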
if crs == "epsg:4326" and (da.x > 180).any():
    # Adjust the longitude coordinates to the -180 to 180 range
    da = da.assign_coords(x=(da.x + 180) % 360 - 180)

    # Sort the dataset by the updated longitude coordinates
    da = da.sortby(da.x)
I wonder if there are more CRS definitions we would want to apply this fix to. Maybe there is a way to tell if we want to adjust coordinates based on some other CRS properties besides an exact name match.
rasterio doesn't have a CRS.is_geographic method yet (rasterio/rasterio#3218), but once it's available we could check if the CRS is geographic and then run those fixes.
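For reference, the arithmetic behind the longitude fix in this thread can be checked on plain numbers. The sketch below applies the same `(x + 180) % 360 - 180` expression to a few sample values and then sorts them, mirroring the `assign_coords` and `sortby` steps; `wrap_longitude` and the sample list are illustrative only.

```python
def wrap_longitude(lon: float) -> float:
    """Map a longitude from the 0..360 convention into -180..180."""
    return (lon + 180) % 360 - 180


# Sample longitudes in the 0..360 convention.
lons = [0, 90, 180, 270, 359]

# Wrap each value, then sort so the axis is monotonically increasing again.
wrapped = sorted(wrap_longitude(value) for value in lons)
```

Note that 180 maps to -180 with this expression, so the antimeridian ends up at the western edge of the sorted axis.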
Co-authored-by: Henry Rodman <[email protected]>
overtake #1007
ref: developmentseed/titiler-xarray#68
To Do