Will there ever be a convergence of NetCDF.jl and NCDatasets? #180

Open
alex-s-gardner opened this issue Jul 14, 2023 · 6 comments

Comments

@alex-s-gardner

In my limited understanding, it seems like NetCDF.jl has a more robust backend with DiskArrays, while NCDatasets has a friendlier CommonDataModel.jl syntax. It would be fantastic if the two projects could join forces before.

@alex-s-gardner
Author

alex-s-gardner commented Jul 14, 2023

Maybe NetCDF.jl is intended to be more full featured?

@visr
Member

visr commented Jul 14, 2023

There is this previous thread on NCDatasets.jl: JuliaGeo/NCDatasets.jl#57

Regarding DiskArrays support for NCDatasets, that has been a long time coming but JuliaGeo/NCDatasets.jl#79 seems to be actively worked on in JuliaGeo/NCDatasets.jl#205.

> Maybe NetCDF.jl is intended to be more full featured?

What makes you think so? NCDatasets.jl is being more actively developed, and has built-in support for CF conventions.

I'm not sure how feasible it is, since it will be a bunch of work, but it would be nice if NCDatasets.jl could depend on NetCDF.jl for using the netCDF C API, and then add CommonDataModel.jl, CF conventions and other useful things on top. Then NetCDF.jl would be domain agnostic and could perhaps live in JuliaIO, with NCDatasets possibly moving to JuliaGeo.

But I'm just sketching out a possible path, and not committing to the work haha. Likely they will continue to live side by side for a while.

@visr
Member

visr commented Jul 14, 2023

cc @Alexander-Barth

@alex-s-gardner
Author

@visr thanks for the links, looks like this is an outstanding issue in search of a resolution.

@rafaqz
Member

rafaqz commented Jul 30, 2023

@alex-s-gardner I'll add a bit from my perspective. NCDatasets.jl implements a bunch of things not available in NetCDF.jl, like bounds variable handling, datetime, CF standards etc. That's why Rasters.jl uses it instead of NetCDF.jl currently.

But DiskArrayTools.jl has a CF standards implementation that can wrap any DiskArray that I think should become the standard.
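As a rough, language-neutral illustration of what a "CF wrapper around any lazily read array" means (this is not DiskArrayTools.jl code; the class name `CFDecodedView` and its parameters are hypothetical), the sketch below decodes packed values on read using the CF packing attributes `scale_factor`, `add_offset` and `_FillValue`. A plain Python list stands in for the on-disk array.

```python
from typing import List, Optional, Sequence, Union


class CFDecodedView:
    """Hypothetical lazy view that decodes packed values only when they are read."""

    def __init__(
        self,
        raw: Sequence[float],
        scale_factor: float = 1.0,
        add_offset: float = 0.0,
        fill_value: Optional[float] = None,
    ) -> None:
        self.raw = raw                    # stand-in for an array that lives on disk
        self.scale_factor = scale_factor  # CF attribute `scale_factor`
        self.add_offset = add_offset      # CF attribute `add_offset`
        self.fill_value = fill_value      # CF attribute `_FillValue`

    def __len__(self) -> int:
        return len(self.raw)

    def _decode(self, value: float) -> Optional[float]:
        # Fill values are reported as missing (None) instead of being unpacked.
        if self.fill_value is not None and value == self.fill_value:
            return None
        # CF unpacking rule: unpacked = packed * scale_factor + add_offset
        return value * self.scale_factor + self.add_offset

    def __getitem__(
        self, index: Union[int, slice]
    ) -> Union[Optional[float], List[Optional[float]]]:
        # Only the requested elements are read from `raw` and decoded.
        if isinstance(index, slice):
            return [self._decode(v) for v in self.raw[index]]
        return self._decode(self.raw[index])


packed = [0, 10, -999, 20]
view = CFDecodedView(packed, scale_factor=0.5, add_offset=100.0, fill_value=-999)
```

Because the wrapper only needs indexing on the underlying object, the same decoding layer could in principle sit on top of NetCDF, GRIB or any other backend that exposes a lazy array.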

CommonDataModel.jl is a nice idea but not based on DiskArrays.jl, and it's turned out to be a struggle to get it to be.

One major stumbling block, for years now with NCDatasets.jl, is that @Alexander-Barth needs setindex! to be able to grow an array along a dimension. This breaks the architecture of DiskArrays.jl and Rasters.jl in a fairly fundamental way - we can't grow chunk sizes or dimension lookups in a setindex! call because these are immutable objects. It also breaks pretty strongly with the Base Julia AbstractArray interface in a way we can't dispatch on to work around.

My idea to use a grow! method instead (where you explicitly need to use the returned object for a DiskArray or Raster) as an API has been rejected from NCDatasets.jl, so we are pretty much at an impasse.
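To make the two designs concrete, here is a minimal sketch in Python rather than Julia; it is not code from DiskArrays.jl, Rasters.jl or NCDatasets.jl, and the names `DiskVector` and `grow` are hypothetical. The wrapper's metadata (length, chunk size) is immutable, so an indexed write can only touch elements that already exist, while growing returns a new wrapper that the caller has to use from then on.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class DiskVector:
    """Hypothetical 1-D array wrapper with immutable size and chunk metadata."""

    store: List[float]  # stand-in for the file on disk; its contents can change
    length: int         # immutable metadata describing the array
    chunk_size: int     # immutable metadata describing the chunk layout

    @property
    def n_chunks(self) -> int:
        # Ceiling division: number of chunks needed to cover `length` elements.
        return -(-self.length // self.chunk_size)

    def __setitem__(self, i: int, value: float) -> None:
        # An indexed write may change stored values but never the array's size.
        if not 0 <= i < self.length:
            raise IndexError(f"index {i} out of bounds for length {self.length}")
        self.store[i] = value


def grow(vec: DiskVector, new_length: int, fill: float = 0.0) -> DiskVector:
    """Extend the backing store and return a NEW wrapper with updated metadata."""
    if new_length < vec.length:
        raise ValueError("grow cannot shrink the array")
    vec.store.extend([fill] * (new_length - len(vec.store)))
    return replace(vec, length=new_length)


v = DiskVector(store=[1.0, 2.0, 3.0], length=3, chunk_size=2)
v[0] = 9.0        # in-bounds write: fine
w = grow(v, 5)    # the returned object must be used; `v` still reports length 3
w[4] = 4.0        # valid only through the grown wrapper
```

In this sketch a write past the end through `v` raises an `IndexError`, which is the behaviour an explicit grow step preserves; letting the indexed write silently resize the array would require mutating metadata that the wrapper treats as fixed.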

@tcarion has been putting a lot of work into GRIB/NetCDF working with DiskArrays.jl: JuliaGeo/CommonDataModel.jl#9 and JuliaGeo/NCDatasets.jl#205 and rafaqz/Rasters.jl#416 for Rasters.jl

But we just hit the setindex! problem and have to implement a CF disk array in Rasters.jl anyway because CommonDataModel won't provide that.

In the end what we need the most is both DiskArrays.jl and CF standards for GRIB and NetCDF files, and nothing currently provides that except DiskArrayTools and the Rasters.jl PR. But neither of these is really a long-term solution.

To me, adding comprehensive DiskArrays.jl support to CommonDataModel.jl (e.g. also for views and everything else) and taking care of GRIBDatasets.jl at the same time is the obvious solution to the feature mismatch. But it's not clear that that will happen.

CommonDataModel.jl needs to commit to using DiskArrays.jl - or not - for us to really proceed further with this part of the ecosystem.

If it does, NCDatasets.jl will have all the functionality NetCDF.jl has. If it doesn't, then we probably need to add a DiskArrayTools.jl dependency here for CFDiskArray, and fill out the functionality missing from NCDatasets.jl. I would then swap out the backend in Rasters.jl to NetCDF.jl.

@alex-s-gardner
Author

@rafaqz I totally agree with your sentiment. It would be great if we could get some clarity so that we can chart a path forward.
