Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Xarray sub-package #1013

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Conversation

vincentsarago
Copy link
Member

@vincentsarago vincentsarago commented Oct 29, 2024

overtake #1007

ref: developmentseed/titiler-xarray#68

To Do

  • reviews
  • add Docs

md.router,
prefix="/md",
tags=["Multi Dimensional"],
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: remove this before merging

titiler.xarray should be seen as a plug-in to titiler and not as an application itself. We will add example on how to build application using the endpoint factory

if "x" not in da.dims and "y" not in da.dims:
try:
latitude_var_name = next(
x for x in ["lat", "latitude", "LAT", "LATITUDE", "Lat"] if x in da.dims
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to support other variable name?



@dataclass(init=False)
class CompatXarrayParams(DefaultDependency):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not directly used in titiler.xarray but could be in a Tiler that would want to support both GDAL/Xarray dataset



@define(kw_only=True)
class TilerFactory(BaseTilerFactory):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By sub-classing titiler.core.factory.TilerFactory we avoid re-writing code


# remove some attribute from init
img_preview_dependency: Type[DefaultDependency] = field(init=False)
add_preview: bool = field(init=False)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we remove those 2 attributes because we don't support /preview endpoints

"aiohttp",
"pandas",
"httpx",
]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: update rio-tiler to >=7.1

return Response(content, media_type=media_type)

# custom /statistics endpoints (remove /statistics - GET)
def statistics(self):
Copy link
Member Author

@vincentsarago vincentsarago Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

☝️ IMO having a full dataset /statistics in a bit dangerous (as for the /preview endpoints) which is why we support only geojson statistics

@vincentsarago
Copy link
Member Author

you can try this with

uvicorn titiler.xarray.main:app --port 8080 --reload
Screenshot 2024-10-29 at 10 28 02 PM

@vincentsarago

This comment was marked as resolved.

@vincentsarago vincentsarago marked this pull request as ready for review October 30, 2024 15:05
Copy link
Member

@maxrjones maxrjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all your work here, @vincentsarago!

This is a very opinionated take, but I think titiler-xarray would be best off with two separate routes, each with its own set of optional dependencies. The first route would be zarr, which would open Zarr and virtual Zarr datasets using xarray.open_zarr. The second route would be md, which would opening any dataset readable by xarray.open_dataset.

The primary reason I think we should do this is that it would enable us to incentivize virtualizing datasets into zarr, which would lead to much faster tile generation. We could do this by:

  1. Having all query parameters in the zarr route only relevant for open_zarr, simplifying API usage.
  2. Automatically detect virtual datasets, removing the need for the reference parameter.
  3. Lightening the image size for titiler-xarray deployments only using zarr because other readers would not be installed (and eventually obstore and/or icechunk could be used instead of the fsspec dependencies)

This would also simplify non-zarr usage for the following reasons:

  1. Zarr specific parameters (e.g., group, consolidated) would not be included in endpoints in the md route
  2. We could use xarray's automatic backend detection rather than writing our own in titiler/xarray/io.py

I also think isolating Zarr usage would simplify the eventual support of the GeoZarr and multiscales specifications.

@vincentsarago
Copy link
Member Author

Thanks @maxrjones 🙏

I see what you're saying. The goal of having a single Reader was to handle all the non-COG dataset so splitting in to two separate reader/set of endpoints would not meat the goal.

This is a very opinionated take, but I think titiler-xarray would be best off with two separate routes, each with its own set of optional dependencies. The first route would be zarr, which would open Zarr and virtual Zarr datasets using xarray.open_zarr. The second route would be md, which would opening any dataset readable by xarray.open_dataset.

We can absolutely use xarray.open_zarr instead of xarray.open_dataset here when reading a zarr

We could use xarray's automatic backend detection rather than writing our own in titiler/xarray/io.py

How so? https://github.com/developmentseed/titiler/pull/1013/files#diff-dd6fab5d1e55a1d860ff8bd2190f145f2574100f734d382cee56c48bd7a7f1f5R43-R49 ?

If I follow your think, it seems we would need a titiler.multidim and a titiler.zarr packages 🤷

What if we make the dependencies optional? I'm going to open a PR on top of this one to try some things

Copy link
Contributor

@hrodmn hrodmn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! The concept of creating pyramids in a zarr store was new to me, then I googled around and found @maxrjones's notebook 😆.

It is great to have the io methods standardized here so we can import them in titiler.cmr and other applications.

src/titiler/xarray/tests/test_factory.py Show resolved Hide resolved
else:
da = da.isel(time=0)

assert len(da.dims) in [2, 3], "titiler.xarray can only work with 2D or 3D dataset"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert len(da.dims) in [2, 3], "titiler.xarray can only work with 2D or 3D dataset"
if len(da.dims) in [2, 3]:
raise ValueError("titiler.xarray can only work with 2D or 3D dataset")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you want this to be if not

Comment on lines +211 to +216
if crs == "epsg:4326" and (da.x > 180).any():
# Adjust the longitude coordinates to the -180 to 180 range
da = da.assign_coords(x=(da.x + 180) % 360 - 180)

# Sort the dataset by the updated longitude coordinates
da = da.sortby(da.x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there are more CRS definitions we would want to apply this fix to. Maybe there is a way to tell if we want to adjust coordinates based on some other CRS properties besides an exact name match.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rasterio doesn't have a CRS.is_geographic method yet (rasterio/rasterio#3218) but once it's available we could check if the CRS is geographic and then run those fixes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants