Best Practices for DataTree with Multiscale Data #9577

allen-adastra · 2024-10-04T04:15:50Z

Mostly pinging @TomNicholas, I see DataTree is making it into the public API; exciting!

I was hoping to get your thoughts on best practices, with a concrete fusion example.

Let's consider magnetic data. We have that at three resolutions:

Order 10Hz from pre-shot pulse-design
Order 1kHz from real-time equilibrium reconstruction
Order 10kHz for the raw magnetic data

We also have many shots of data. Shots tend to be between 0.5 seconds and 3 seconds, which isn't a huge range, but a large enough range that we probably shouldn't just nan pad the ends and have "shot" just be a dimension along an array.

What kind of structure would you recommend with DataTree? Should we just have one node for each "shot"? Is there some built-in functionality to "transpose" a variable? (i.e. go from multiple arrays of "signalX" to a single array of "signalX" with a new dimension and nan padding to handle different lengths).

TomNicholas · 2024-10-06T16:51:26Z

Maintainer

> Order 10Hz from pre-shot pulse-design
> Order 1kHz from real-time equilibrium reconstruction
> Order 10kHz for the raw magnetic data

It's very satisfying to me to see people taking these same data management concepts and applying them to fusion 😊

> probably shouldn't just nan pad the ends and have "shot" just be a dimension along an array.

Yeah that's hopefully a pre-datatree anti-pattern now.

> What kind of structure would you recommend with DataTree?

Instead of answering directly yet, I'm going to use you as a guinea pig and point you to some upcoming documentation I wrote - PR #9501, the latest build is here. I'm hoping that reading it helps answer your question; if it doesn't, then please say so.

> Is there some built-in functionality to "transpose" a variable? (i.e. go from multiple arrays of "signalX" to a single array of "signalX" with a new dimension and nan padding to handle different lengths).

Nothing dedicated like that exists for datatree yet, but it could exist. In fact that sounds very similar to calling xr.concat with join='outer' & fill_value=np.nan? (Also this is an example of #9349)

3 replies

allen-adastra Oct 6, 2024
Author

> could

True, we can do this. Something I've noticed is that I can't have time coordinates be:

time (shot, time)

for each shot of xr.Dataset.

However, if I instead make time a data variable and rename the time dimension to time_slice, with an index coordinate that is just np.arange, then I can do the concatenation. Wondering if this is the best practice?
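For concreteness, the workaround described here might look like the sketch below. The shot numbers, start times, and the `make_shot_ds` helper are hypothetical, and whether this is best practice is left open in the thread.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_shot_ds(t0_s, duration_s, rate_hz=10.0):
    """Hypothetical helper: time stored as a data variable over time_slice."""
    n = int(round(duration_s * rate_hz))
    time = t0_s + np.arange(n) / rate_hz
    return xr.Dataset(
        {
            "time": ("time_slice", time),
            "signalX": ("time_slice", np.sin(time)),
        },
        # Plain integer index, so shots align by sample number, not timestamp.
        coords={"time_slice": np.arange(n)},
    )


shots = {101: make_shot_ds(0.0, 0.5), 102: make_shot_ds(0.25, 1.0)}

# Both "time" and "signalX" gain a leading shot dimension; shorter shots are
# padded with NaN up to the longest time_slice index.
combined = xr.concat(
    list(shots.values()),
    dim=pd.Index(list(shots), name="shot"),
    data_vars="all",
    join="outer",
    fill_value=np.nan,
)
```

Because alignment happens on the integer `time_slice` index, the padding always lands at the end of the shorter shots, and the per-shot timestamps survive in the two-dimensional `time` variable.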

allen-adastra Oct 6, 2024
Author

Docs look good, I'll try to get around to playing around with it; when is the release that contains xr.DataTree going to be out? (I can ofc install HEAD of main, but for my more productiony stuff would be good to have a proper release).

TomNicholas Oct 12, 2024
Maintainer

> I can't have time coordinates be:

Not quite sure what you mean - a more explicit example would be helpful.

> when is the release that contains xr.DataTree going to be out?

aiming for the 15th (Tuesday!)
