Test XArray in combination with cattrs and dask #62

Merged
merged 7 commits into from
Jan 3, 2025


deltamarnix
Contributor

Added some simple test prototypes that show how we could keep a low memory profile with chunked arrays, and how to unstructure data that contains xarray objects.
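The chunking idea can be sketched roughly as follows. This is a minimal illustration rather than the test code from this PR: it assumes dask is installed, and the array shape, chunk size, and dimension names are made up for the example.

```python
import dask.array as da
import xarray as xr

# Hypothetical sizes: 4000 x 4000 float64 values (~128 MB if fully loaded),
# split into 1000 x 1000 chunks (~8 MB each).
lazy = da.ones((4000, 4000), chunks=(1000, 1000))

# Wrapping a dask array keeps the DataArray lazy; nothing is allocated yet.
arr = xr.DataArray(lazy, dims=("row", "col"))

# Reductions stay lazy as well and only build a task graph.
total = arr.sum()

# compute() evaluates the graph chunk by chunk instead of materializing
# the whole array at once.
result = float(total.compute())
```

Only the chunks being processed need to be in memory at any one time, which is what keeps the profile low.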

@wpbonelli
Member

So we don't have to worry about numpy/xarray arrays going through cattrs on their way to the de/serializer? They will not be duplicated by default?

@deltamarnix
Contributor Author

> so we don't have to worry about numpy/xarray arrays going through cattrs to the de/serializer? they will not be duplicated by default?

We will need to create unstructure hooks that simply return the original object, starting at the DataTree level. In a test that I didn't save, I saw that unstructuring gave me back a DataTree with a different id. I will add that test so it's clear what is not possible in the current setup.

This is the simplest way to keep the original DataTree object.
Otherwise unstructuring would return a copy of some sort.
@wpbonelli wpbonelli marked this pull request as ready for review December 13, 2024 14:56
Member

@wpbonelli wpbonelli left a comment


I guess, as we chatted about last week, we have to decide whether to fully adopt xarray for array/list data variables, or keep data variables as top-level attrs and expose xarray views of them. In the former case we would send xarray objects through cattrs, but in the latter case we would not. Is that right?

Are there any more questions we need to ask and try to answer (with research and/or prototyping) to inform the decision? I'll think more on this.

I guess this question is not quite so pressing, since you established that cattrs will not be necessary or advisable for mf6 format IO, where we can send objects directly to the serializer. It may just be useful for cases like dict-based initialization and/or dict-like serialization formats such as JSON/YAML.

@deltamarnix deltamarnix merged commit 489cda9 into modflowpy:develop Jan 3, 2025
11 checks passed
@deltamarnix deltamarnix deleted the xarray-unstructure branch January 3, 2025 14:01