Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bioformats2raw metadata support #174

Draft
wants to merge 22 commits into
base: master
Choose a base branch
from
Draft

Conversation

joshmoore
Copy link
Member

@joshmoore joshmoore commented Mar 2, 2022

  • Add Implicit spec to loop over metadata-less "collections"
  • Add Leaf & Root specs
  • Support entrypoint-based specs ("ome_zarr.spec")
  • Use entrypoint to adder suport for bioformats2raw.layout ngff#112
  • add tests for SHOULD/MAY portions of the spec

Part of the investigation of metadata in ome/ngff#104. This "implicit" group is the cheapest form of collection imaginable.

Currently, only groups within the given group (and not arrays or explicit files) will be further parsed.

@joshmoore joshmoore mentioned this pull request Mar 2, 2022
2 tasks
@codecov
Copy link

codecov bot commented Mar 2, 2022

Codecov Report

Patch coverage: 77.41% and project coverage change: -0.89 ⚠️

Comparison is base (8964374) 84.79% compared to head (28155d5) 83.90%.

❗ Current head 28155d5 differs from pull request most recent head 836dfd2. Consider uploading reports for the commit 836dfd2 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #174      +/-   ##
==========================================
- Coverage   84.79%   83.90%   -0.89%     
==========================================
  Files          13       14       +1     
  Lines        1473     1591     +118     
==========================================
+ Hits         1249     1335      +86     
- Misses        224      256      +32     
Impacted Files Coverage Δ
ome_zarr/reader.py 83.52% <65.38%> (-3.18%) ⬇️
ome_zarr/bioformats2raw.py 86.11% <86.11%> (ø)

... and 9 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Pointing ome_zarr at a non-root node will now walk-up the
hierarchy until the root node is discovered.
This allows the definition of node specs outside of
this repository. Primary driver is for experimenting
with specs which depend on ome_types, etc. These may
eventually be pulled into the mainline.
@joshmoore joshmoore changed the title Add Implicit spec to loop over metadata-less "collections" New spec parsers for metadata support Mar 3, 2022
@joshmoore
Copy link
Member Author

See https://github.com/ome/ome-zarr-metadata/releases/tag/0.1.0 for an example of an entrypoint. After creating a fake .zgroup under the output of bioformats2raw a.fake /tmp/a.ome.zarr

$ ome_zarr info /tmp/a.ome.zarr/0/test/
/private/tmp/a.ome.zarr/0/test [zgroup]
 - metadata
   - Implicit (1)
   - Leaf (2)
 - data
/private/tmp/a.ome.zarr/0 [zgroup]
 - metadata
   - Multiscales
   - Leaf (2)
 - data
   - (1, 1, 1, 512, 512)
   - (1, 1, 1, 256, 256)
/private/tmp/a.ome.zarr [zgroup]
 - metadata
   - bioformats2raw (3)
   - Root (2)
 - data

Notice:

  1. the Implicit spec scans groups that have no other metadata
  2. Leaf/Root work their way up and back down a hierarchy
  3. bioformats2raw reads OME/METADATA.ome.xml

found.append(Well(self))
self.specs.append(found[-1])

for key, value in entrypoints.get_group_named("ome_zarr.spec").items():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do? I see no file named "ome_zarr.spec" and I've not used entrypoints before. I don't get much idea from https://github.com/takluyver/entrypoints as to what it does, only that "This package is in maintenance-only mode. New code should use the importlib.metadata module".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do?

It's essentially a namespace for the particular entrypoint. It's used at
https://github.com/ome/ome-zarr-metadata/blob/main/setup.cfg#L81

@will-moore
Copy link
Member

For an Image in a Plate (12 Wells A-C, 1-4, all wells with labels), without this PR I get:

$ ome_zarr info 251.zarr/A/1/0/
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/1/0 [zgroup]
 - metadata
   - Multiscales
   - OMERO
 - data
   - (3, 1024, 1344)
   - (3, 512, 672)
   - (3, 256, 336)
   - (3, 128, 168)
   - (3, 64, 84)
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/1/0/labels [zgroup] (hidden)
 - metadata
   - Labels
 - data
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/1/0/labels/0 [zgroup] (hidden)
 - metadata
   - Label
   - Multiscales
 - data
   - (1, 1024, 1344)
   - (1, 512, 672)
   - (1, 256, 336)
   - (1, 128, 168)
   - (1, 64, 84)
   - (1, 32, 42)

and with this PR I get all the sibling A Wells A2, A3, A4, but not B1-B4 or C1-C4. And I don't get labels for those Wells.

$ ome_zarr info 251.zarr/A/1/0/
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/1/0 [zgroup]
 - metadata
   - Multiscales
   - OMERO
   - Leaf
 - data
   - (3, 1024, 1344)
   - (3, 512, 672)
   - (3, 256, 336)
   - (3, 128, 168)
   - (3, 64, 84)
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/1/0/labels [zgroup] (hidden)
 - metadata
   - Labels
   - Leaf
 - data
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/1/0/labels/0 [zgroup] (hidden)
 - metadata
   - Label
   - Multiscales
   - Leaf
 - data
   - (1, 1024, 1344)
   - (1, 512, 672)
   - (1, 256, 336)
   - (1, 128, 168)
   - (1, 64, 84)
   - (1, 32, 42)
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/1 [zgroup]
 - metadata
   - Well
   - Leaf
 - data
   - (3, 1024, 1344)
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A [zgroup]
 - metadata
   - Implicit
   - Leaf
 - data
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/2 [zgroup]
 - metadata
   - Well
   - Leaf
 - data
   - (3, 1024, 1344)
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/3 [zgroup]
 - metadata
   - Well
   - Leaf
 - data
   - (3, 1024, 1344)
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/4 [zgroup]
 - metadata
   - Well
   - Leaf
 - data
   - (3, 1024, 1344)
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr [zgroup]
 - metadata
   - Plate
   - Root
 - data
   - (3, 768, 1344)

Without this PR, napari 251.zarr/A/1/0/ gives me just the 1 image + labels:

Screenshot 2022-03-23 at 10 47 04

With this PR, I get everything as for 'info' above: All the A wells, but only labels for A1:
Screenshot 2022-03-23 at 10 46 07

@joshmoore
Copy link
Member Author

joshmoore commented Apr 13, 2022

and with this PR I get all the sibling A Wells A2, A3, A4, but not B1-B4 or C1-C4. And I don't get labels for those Wells.

@will-moore, you approve of getting the siblings (assuming napari can be fixed)? If so, I'll look why it's only for the one row.

Also, where do you get the labels for this plate? Is this the one you had a script for?

@joshmoore
Copy link
Member Author

@will-moore, I've reverted the upwards parsing. It seemed like a good strategy but there are currently too many edge cases. I don't have labels on plates for testing at the moment, but I think with the current state along with ome/ome-zarr-metadata@08e12f7#diff-0bb17e0ecb4ac83835ee3800a1af71a12f644b0ce782c623ba97f8917916250eR54 all the following should be true:

non-bf2raw bf2raw
HCS unchanged unchanged
non-HCS unchanged now loads all images

The only other change I can think of is if you pass a group that previously did nothing, it will likely try to load the contents.

@joshmoore
Copy link
Member Author

In discussing today with @dgault, @sbesson, @jburel and @melissalinkert, there was a case made for at least adding the flag (Leaf) to make it possible for clients to detect that there is more information that needs loading. Additional methods or parameters should then allow that loading.

@joshmoore
Copy link
Member Author

To improve the codecov results, see https://github.com/zarr-developers/numcodecs/pull/300/files#diff-bc37cd9860eec1facdc18a47798e8a1a2c0ef5dabd999deee049de4a48a5d35fR1 for an option of in-repo testing of entrypoints.

@will-moore
Copy link
Member

@joshmoore To help address the "don't have labels on plates for testing", I created https://gist.github.com/will-moore/0f4cb6b1fdd60a255fcbb956a54a645e which adds labels to a plate (currently assumes images axes are cyx) by segmenting one of the channels.

I don't know if I'm missing something, maybe not using ome_zarr properly, but it feels quite manual to e.g. iterate through Wells on a Plate - manually parsing JSON, joining paths etc and parse_url() for every Well and every Image.

@joshmoore joshmoore force-pushed the implicit branch 2 times, most recently from fd64b1d to 9ad2e5a Compare May 3, 2022 11:01
@joshmoore
Copy link
Member Author

@joshmoore
Copy link
Member Author

Migrated the bf2raw implementation from https://github.com/ome/ome-zarr-metadata :

$ bioformats2raw-0.5.0-SNAPSHOT/bin/bioformats2raw 'my&series=2.fake' test_output
$ ome_zarr info test_output/
/opt/ome-zarr-py/test_output [zgroup]
 - metadata
   - bioformats2raw
 - data
/opt/ome-zarr-py/test_output/0 [zgroup]
 - metadata
   - Multiscales
 - data
   - (1, 1, 1, 512, 512)
   - (1, 1, 1, 256, 256)
/opt/ome-zarr-py/test_output/1 [zgroup]
 - metadata
   - Multiscales
 - data
   - (1, 1, 1, 512, 512)
   - (1, 1, 1, 256, 256)

Initially prepared as a plugin in https://github.com/ome/ome-zarr-metadata
this is being moved to the main repository since it is a part of the
main spec.
_logger.info("found %s", series)
subnode = node.add(series)
if subnode:
subnode.metadata["ome-xml:index"] = idx
Copy link
Member Author

@joshmoore joshmoore Sep 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably the biggest lingering question I have since this basically becomes public API e.g. used in napari-ome-zarr. Thoughts welcome. cc: @will-moore @sbesson et al.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder where this public API is documented?
This new module should be added to https://github.com/ome/ome-zarr-py/blob/master/docs/source/api.rst but I think these changes in parsing also need to be described elsewhere, maybe at https://ome-zarr.readthedocs.io/en/stable/python.html#reading-ome-ngff-images

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big 👍 but first do you think what's there is usable, understandable, extensible, etc.?

subnode = node.add(series)
if subnode:
subnode.metadata["ome-xml:index"] = idx
subnode.metadata["ome-xml:image"] = image
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add the following here, then you don't need ome/napari-ome-zarr#47

subnode.metadata["name"] = image.name

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then plugins would/could overwrite the main metadata, right? Which means it becomes a question of the order that the plugins run in. Would:

subnode.plugins["bioformats2raw"].metadata["name"]

be better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As probably transpired from #140, I am also unclear on the contract and expectation for the node.metadata field. My impression was that this API has been largely driven by the napari use-case.Reading this, it looks like another goal is to introduce some form of metadata consolidation so that a consumer does not have to go up and down the hierarchy to assemble pieces of metadata, is that correct?

If so, I agree what is missing is some form of namespace to differentiate the metadata injected at different levels in the hierarchy. Taking advantage of the fact this is a Python dictonary, another proposal would be to group all the metadata under a top-level key e.g. subnode.metadata["bioformats2raw"] = {"index": idx, "image": image", "name", image.name}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.Reading this, it looks like another goal is to introduce some form of metadata consolidation so that a consumer does not have to go up and down the hierarchy to assemble pieces of metadata, is that correct?

Maybe. But now that we have the json-schema, I could also see just attaching objects at the right node level to prevent re-reading the file. That's essentially what would happen if the ome-types OME object becomes part of the bioformats2raw plugin's public API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind I might expect something like Josh's original idea or subnode.metadata["ome.xml"] = {"index": idx, "image": image", "name", image.name} since this is metadata coming from the ome.xml?

The reason I like subnode.metadata["name"] = image.name is that any consumer doesn't have to know about bioformats2raw or xml etc to have the correct image name.
But if it's just napari-ome-zarr and we already have ome/napari-ome-zarr#47 then 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that any consumer doesn't have to know about bioformats2raw or xml etc to have the correct image name.

This possibly gets back to the question of whether or not this bf2raw parsing is core or not. If it is, then you're probably right that we should just encode the rules for what "wins" directly. If this is a plugin, though, then how would we handle misbehavior in the plugins?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a plugin here? If it's a plugin, then it's not installed by default and you get to choose if you want to add it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This possibly gets back to the question of whether or not this bf2raw parsing is core or not

Trying to answer this, this is where I would draw the line:

  • if a specification is published in https://ngff.openmicroscopy.org/, I consider it as core (even if transitional) and my expectation is that this library should support it
  • a contrario, any specification not defined in the upstream specification should be seen as an extension and should be handled by some plugin/third-party mechanism. This might drive the discussions on the extensibility of this library which is a good thing.

@joshmoore
Copy link
Member Author

Should we also discuss the name of the module itself?

@will-moore
Copy link
Member

We added an omero block of channel & rendering metadata to the multiscale .zattrs (because it came from omero) but we actually want other tools to read and write this metadata, which may be discouraged by the naming.
In the same way, bioformats2raw.layout is a spec that just happens to be produced originally by bioformats2raw, but it's really a spec that ALL tools should read/write.
I don't know if it's too late to think about a different name there, or if the name has already stuck?

@joshmoore
Copy link
Member Author

Other than the string bioformats2raw.layout we're pretty free to change things here. (I'd say we definitely don't want to reproduce what we did with omero and we actually need to think about how to make that "transitional" as well)

@will-moore
Copy link
Member

Ah - yes, too late to change the "bioformats2raw.layout" key because data generated with this already exists.

@imagesc-bot
Copy link

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/intermission-ome-ngff-0-4-1-bioformats2raw-0-5-0-et-al/72214/1

@joshmoore joshmoore changed the title New spec parsers for metadata support bioformats2raw metadata support Oct 7, 2022
joshmoore added a commit to joshmoore/ome-zarr-py that referenced this pull request May 3, 2023
@joshmoore joshmoore mentioned this pull request May 3, 2023
@joshmoore joshmoore added this to the 0.8.0 milestone May 3, 2023
@imagesc-bot
Copy link

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/saving-volumetric-data-with-voxel-size-colormap-annotations/85537/24

@imagesc-bot
Copy link

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/constructing-zarr-for-napari-ome-zarr/65398/4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants