Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Support plate loading for varying well sizes (ref #240) #241

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

jluethi
Copy link

@jluethi jluethi commented Jan 19, 2023

Addresses #240, see there for a description of the issue

This PR adds a second code path to the get_pyramid_lazy function. Its goal is to load the metadata of each well to be able to check the pyramid sizes. It then handles varying well sizes by padding every well to the max well size.

I kept the old code path in for the moment and added a loading parameter to the get_pyramid_lazy where the user can specify "new" or "default" to make it easier to compare the two workflows. It's set to "new" at the moment, so this PR follows the new behavior by default for testing.

What this PR should achieve:

  1. Handle wells with different xy dimensions
  2. Handle wells with different Z levels
  3. Adds some padding between wells

Handles, but not optimal:

  1. It handles wells with varying channel numbers. That is not really an intended behavior and may be a bit confusing, because it does not know which channel is which and just pads them together if the channel numbers vary. That was a side-effect of writing the axis handling very general, i.e. not making assumptions on the number of dimensions our images would have (besides x & y being the last 2).

Downsides

The initial loading of a plate gets much slower with this PR. It introduces an overhead of ~1s per well in my setup. As such, if the assumptions hold that all wells have the same dimensions, this PR is a downgrade and thus should not be the default way to load plates.
2 thoughts on this:

  1. Can we provide alternate loading modes and expose them to the user somehow e.g. in napari? Any idea how we'd do this?
  2. This use case would argue for an (optional) consolidation of metadata at the plate level to allow for faster loading, see also discussions here: consolidate Plate to preview image ngff#141 If that were possible, the performance hit may become negligible. It should be easy to refactor this PR to load from a different place, as all the well spec loading takes place in get_plate_well_specs, which is only called once.
    What this part would require is access to consolidate info about the image pyramid shapes.

@codecov
Copy link

codecov bot commented Jan 19, 2023

Codecov Report

Base: 84.46% // Head: 82.94% // Decreases project coverage by -1.52% ⚠️

Coverage data is based on head (20c35e6) compared to base (db0f5dc).
Patch coverage: 72.72% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #241      +/-   ##
==========================================
- Coverage   84.46%   82.94%   -1.52%     
==========================================
  Files          13       13              
  Lines        1474     1548      +74     
==========================================
+ Hits         1245     1284      +39     
- Misses        229      264      +35     
Impacted Files Coverage Δ
ome_zarr/reader.py 80.61% <72.72%> (-5.44%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@will-moore
Copy link
Member

Thanks for the PR: not having looked at the code yet, but one thought on how to choose the optimal loading mode...

You could imagine an optional flag in the plate.zarr/.zattrs to indicate "well_dimensions_not_equal": true, and if you find this is true then you use the new loading.
This wouldn't be a breaking change, since the current assumption is that all well dimensions are equal.
cc @sbesson @chris-allan @melissalinkert

@jluethi
Copy link
Author

jluethi commented Jan 23, 2023

Thanks, great idea @will-moore on the "well_dimensions_not_equal": true attribute. If the overall approach is something you'd consider in that way, happy to implement that and clean things up a bit more.

Any idea why I would have tests failing on python 3.8 and 3.9 for parts that are not related to code changes I've made? Do you also see this in other places or should I dig through those tests to understand if there is any dependency (i.e. them using the plate writer somewhere?)

@sbesson
Copy link
Member

sbesson commented Jan 23, 2023

From my side, there is definitely an argument for consolidating some metadata at the top-level especially for complex structures like plates. This echoes what was done for the wells field for instance as a lot of this data could be theoretically recomputed on-the-fly at the expense of several attribute reads.

You could imagine an optional flag in the plate.zarr/.zattrs to indicate "well_dimensions_not_equal": true, and if you find this is true then you use the new loading.

My primary issue is that this name is highly unspecific. Which dimensions are we talking about: all of the dimensions of each multiscale image? XY(Z) only? the channels? the number of fields of view per well?

@jluethi
Copy link
Author

jluethi commented Jan 23, 2023

From my side, there is definitely an argument for consolidating some metadata at the top-level especially for complex structures like plates. This echoes what was done for the wells field for instance as a lot of this data could be theoretically recomputed on-the-fly at the expense of several attribute reads.

Good point. What I currently see is that if we can consolidate the pyramid shape informations on the plate level for each well, then I would not need to loop through all the wells and this implementation should be about as fast as the original one, just more general (i.e. handles the case when any of the dimensions aren't equal by padding).

My primary issue is that this name is highly unspecific. Which dimensions are we talking about: all of the dimensions of each multiscale image? XY(Z) only? the channels? the number of fields of view per well?

Fair point. I'd see this as transitionary until we have the pyramid shape metadata at the plate level and thus this could become default loading behavior. I actually implement the checks for all the dimensions of an image (tczyx). If any well dimension is potentially unequal, one needs to load this metadata beforehand to figure it out. The expensive part is loading the additional metadata, then one can address any dimension that varies with padding. As such, it is actually a somewhat general solution to this.

It doesn't address loading multi-field of view, as this also isn't a part of the current implementation for other reasons (and we switched to using single images for all field of views combined, see discussion here: ome/ngff#137).

@imagesc-bot
Copy link

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/napari-napari-ome-zarr-plugin-is-not-registered/78482/16

@imagesc-bot
Copy link

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/napari-napari-ome-zarr-plugin-is-not-registered/78482/18

@joshmoore joshmoore modified the milestone: 0.8.0 May 3, 2023
@joshmoore
Copy link
Member

@jluethi : What's your plan with this PR? I moved it to milestone 0.8.0 to signify that it wasn't slotted for 0.7.0 but I don't want to put promises into your mouth :)

@jluethi
Copy link
Author

jluethi commented May 16, 2023

Hey @joshmoore
For this to really work well, we'd need to aggregate some metadata on the sizes of individual wells on the plate level. I now tested it with a 384 well plate with irregularly shaped wells and it takes ~15 minutes (in this rough implementation) to collect all the metadata. This clearly isn't an acceptable trade-off. Not sure yet when I'll get to pushing the idea of aggregated metadata on the plate level.

It does actually load the images though, instead of just crashing during loading. I see two options of how a cleaned-up version could actually be proposed to be merged: Either via a environment env flag set or with an optional flag set at the plate level.

What's the rough 0.8.0 timeline anyway?

@joshmoore
Copy link
Member

There's no hard timeline for 0.8.0. It's more of on a need basis. But the metadata aggregation could require a spec change, no?

@jluethi
Copy link
Author

jluethi commented May 16, 2023

But the metadata aggregation could require a spec change, no?

Yes, I assume that would require a spec change. So I think a more performant version of this would only come after such a spec change.
The current, slow loading may still be preferable to just failing, so I could see a point for making this an optional path that can be used, if there's interest for it. And maybe there's a bit of optimization room to make it not quite as slow, not sure.

@will-moore
Copy link
Member

Seems like an environment flag is the easiest option (it doesn't need to be added to the plate data itself - good for old plates). Also it won't clash or be confused with future spec changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants