@normanrz
Contributor

This is the work-in-progress draft for RFC-8.

cc @jluethi @lorenzocerrone @tischi @perlman @matthewh-ebi

@github-actions
Contributor

github-actions bot commented Sep 29, 2025

Automated Review URLs

@normanrz normanrz mentioned this pull request Sep 29, 2025
rfc/8/index.md Outdated
#### `Collection` keys

* `"type"` (required). Value must be `"collection"`.
* `"nodes"` (required). Value must be an array of `CollectionNode` or `Collection` objects.
Contributor

since every node has a unique name, why is this an array and not an object?

Contributor Author

Yeah that could also work.

Contributor Author

I wonder if representing an order may be desired, though. For example, https://ngff.openmicroscopy.org/latest/index.html#bf2raw states "Parsers like Bio-Formats define a strict, stable ordering of the images in a single container ...".
If it were an object the ordering would likely get lost in some JSON implementations. It could be represented through sortable node names, but that also seems less convenient.

Contributor

@d-v-b d-v-b Oct 2, 2025

Order might also be useful for collections of layers in the context of an image visualization tool. That said, you can always add an integer "order" field to the elements (sort of the reverse of adding a "name" field that must be unique in the container).
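
To make the array-versus-object trade-off above concrete, here is a minimal Python sketch. It assumes nodes are plain dicts with a unique `"name"` key, as in the draft; `index_nodes` is a hypothetical helper, not part of any OME-Zarr library, and the node names and paths are made up. It keeps `"nodes"` as an array, so the order survives any JSON implementation, and builds a name lookup on load while rejecting duplicates.

```python
import json


def index_nodes(nodes):
    """Build a name -> node lookup from an ordered list of node dicts.

    Hypothetical helper: raises ValueError if two nodes share a name.
    """
    by_name = {}
    for node in nodes:
        name = node["name"]
        if name in by_name:
            raise ValueError(f"duplicate node name: {name!r}")
        by_name[name] = node
    return by_name


# Illustrative collection; names and paths are placeholders.
doc = json.loads("""
{
  "type": "collection",
  "nodes": [
    {"type": "multiscale", "name": "series_1", "path": "./1"},
    {"type": "multiscale", "name": "series_0", "path": "./0"}
  ]
}
""")

lookup = index_nodes(doc["nodes"])               # access by name
order = [node["name"] for node in doc["nodes"]]  # array order is preserved
```

With this shape, a reader gets both ordered iteration and name-based access without relying on JSON object key order or on an extra "order" field.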

rfc/8/index.md Outdated

### Metadata

This RFC defines two main objects for OME-Zarr: `Collection`, `CollectionNode`.
Contributor

A CollectionNode can be a Collection, so it's a bit confusing to say that these are two objects unless you explain that "object" here means something like "interface" or "protocol"

Contributor Author

What would be the best term here? Is it a class?

Contributor

As I understand it, there are currently 3 entities that need to be defined:

  • collection
  • multiscales
  • root

collection and multiscales can be discriminated based on their type field, and collection has attributes that multiscales does not, so regular inheritance from a base class doesn't express their relationship very well.

Maybe defining these as protocols would work? E.g., there's a core Node protocol with the fields {type, name, attributes}, and objects that implement Node can also implement Collection OR Multiscales (but not both, because of the requirement on the type key). Finally, there's a Root protocol, which can only be implemented by a Collection.

Member

Presumably bioformats2raw.layout and plate collections will still be around (not removed with this proposal). So a Node could be Collection or Multiscales or bioformats2raw or plate?

Contributor

@d-v-b d-v-b Oct 2, 2025

actually I was wrong, regular inheritance isn't problematic for Collection and Multiscales -- there's a base Node, Collection and Multiscales (and anything else) inherit from Node (totally fine for them to add new attributes as children).

As for the requirement that there be only one root node, I don't think that can be expressed in a type system easily as long as the root is structurally compatible with a Collection, but it can be added as a regular requirement.

Contributor

If the requirement is just that the root node have version (weaker than requiring that only the root node have version), then this is a bit simpler.
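
The inheritance idea discussed above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class names, the `check_root` helper, the presence of a `name` on the root, and the version string are all hypothetical and not taken from the RFC. Only the `type` values `"collection"` and `"multiscale"` and the shared fields {type, name, attributes} come from the thread. The weaker root requirement (the root must carry a version) is written as a runtime check, since the type system alone does not express it.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Node:
    """Fields shared by every node, per the discussion above."""
    type: str
    name: str
    attributes: dict[str, Any] = field(default_factory=dict)


@dataclass
class Multiscale(Node):
    path: Optional[str] = None


@dataclass
class Collection(Node):
    nodes: list[Node] = field(default_factory=list)
    version: Optional[str] = None  # assumption: required on the root only


def check_root(root: Node) -> None:
    """Express the root constraint as a runtime check rather than a type."""
    if not isinstance(root, Collection):
        raise TypeError("root must be a Collection")
    if root.version is None:
        raise ValueError("root collection must carry a version")


# Placeholder values throughout; "0.x" is not a real version number.
root = Collection(
    type="collection",
    name="root",
    version="0.x",
    nodes=[Multiscale(type="multiscale", name="color", path="./color")],
)
check_root(root)  # passes silently
```

Because `Collection` and `Multiscale` only add fields to `Node`, plain subclassing is enough here; the "exactly one root" rule stays a separate validation step.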

Contributor Author

> Presumably bioformats2raw.layout and plate collections will still be around (not removed with this proposal). So a Node could be Collection or Multiscales or bioformats2raw or plate?

The idea is to remove bioformats2raw.layout and plate as separate entities with this proposal and express the functionality through attributes in the collection nodes. We need to work more on these.

Contributor

Could this work similarly to how I proposed it for the coordinate transforms? In essence, the paths specified in the plate metadata could be allowed to contain a Collection, which would contain the reference to the path.

@d-v-b
Contributor

d-v-b commented Oct 2, 2025

this is looking really cool!

@dstansby
Contributor

Looks nice! As a quick initial comment, it would be super helpful to have a minimal example that demonstrates the new metadata structure being proposed. The webknossos examples are nice, but I'm struggling to distinguish what's required and optional in those files because there are lots of extra (I think?) attributes.

}, {
"name": "..",
"type": "collection",
"path": "./nested_collection.json"
Member

The collection should be a directory that contains a zarr.json, right?
e.g. "path": "./nested_collection.zarr"

Member

Ah, now I see that this standalone JSON file is proposed as part of this RFC. But that isn't covered until much later, under Examples ("Where is this collection metadata stored?"). Maybe that should be moved up above this point?

If an implementation is using e.g. zarr-python or another zarr library to retrieve zarr metadata, then it may be kinda painful to also support fetching of vanilla file.json files using a different mechanism? Don't know about other libs.
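
To illustrate the two storage options mentioned above, here is a minimal sketch of a loader that accepts either a standalone JSON file or a directory containing `zarr.json`. Everything here is an assumption for illustration: `load_collection_metadata` is a hypothetical helper, it only handles local paths, and it returns the parsed document as-is because the thread does not settle where collection metadata would sit inside `zarr.json`.

```python
import json
import tempfile
from pathlib import Path


def load_collection_metadata(path):
    """Read metadata from a standalone JSON file or from a Zarr directory.

    Assumption: a directory keeps its metadata in `zarr.json`. The parsed
    document is returned unchanged; locating the collection inside it is
    left to the caller.
    """
    p = Path(path)
    target = p / "zarr.json" if p.is_dir() else p
    with target.open("r", encoding="utf-8") as f:
        return json.load(f)


# Usage on a throwaway directory; all file names below are illustrative.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    standalone = root / "nested_collection.json"
    standalone.write_text(json.dumps({"type": "collection", "nodes": []}))

    zarr_dir = root / "nested_collection.zarr"
    zarr_dir.mkdir()
    (zarr_dir / "zarr.json").write_text(
        json.dumps({"zarr_format": 3, "node_type": "group"})
    )

    from_file = load_collection_metadata(standalone)
    from_dir = load_collection_metadata(zarr_dir)
```

On a local filesystem the two cases differ by one line. The concern raised above is about remote stores, where a Zarr library's store abstraction and a plain JSON fetch may go through different mechanisms; this sketch does not address that.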

@will-moore will-moore mentioned this pull request Oct 30, 2025
4 tasks
@jo-mueller jo-mueller mentioned this pull request Oct 30, 2025
@will-moore
Member

I started a basic implementation of the Collections spec for the validator at ome/ome-ngff-validator#62.
This should allow you to browse example Collections. There are also a couple of linked test collections there to try out.

| - | - | - | - |
| `"type"` | string | yes | Value must be `"multiscale"`. |
| `"name"` | string | yes | Value must be a non-empty string. It should be a string that matches `[a-zA-Z0-9-_.]+`. Must be unique within one collections JSON file. |
| `"path"` | string | yes | Value must be a string containing a path. [See paths section](#paths) |

Hi @normanrz,

Is the path for a multiscale node really required?

In the example custom-nodes.json:

{
  "type": "multiscale",
  "name": "color",
  "attributes": {
    "webknossos:category": "color",
    "webknossos:bounding_box": {
      "topleft": { "x": 128, "y": 128, "z": 128 },
      "size": { "x": 5445, "y": 8380, "z": 3285 }
    },
    "webknossos:data_type": "uint8"
  },
  "path": "/absolute/path/to/l4dense_motta_et_al_demo/color"
}

Here, the attributes do not contain any OME-NGFF metadata, so (if I understand correctly) the axes, datasets, etc. are expected to be found in
/absolute/path/to/l4dense_motta_et_al_demo/color/zarr.json.

In contrast, in the other example inline-multiscale.json, the OME-NGFF metadata is provided at the top level:

{
  "type": "multiscale",
  "name": "segmentation",
  "attributes": {
    "webknossos:category": "segmentation",
    "webknossos:bounding_box": {
      "topleft": { "x": 0, "y": 0, "z": 0 },
      "size": { "x": 5632, "y": 8704, "z": 3584 }
    },
    "webknossos:data_type": "uint32",
    "webknossos:values": { "max": 100000 }
  },
  "multiscales": [
    {
      "axes": [
        { "name": "c", "type": "channel" },
        { "name": "x", "type": "space", "unit": "nanometer" },
        { "name": "y", "type": "space", "unit": "nanometer" },
        { "name": "z", "type": "space", "unit": "nanometer" }
      ],
      ...
    }
  ]
}

In this second case, there is no path for the node.

(besides the paths in the datasets)

Contributor Author

There is still an open design question, whether we should allow inlining multiscales. If yes, the path will not be required anymore. My example was just an experiment, not normative.
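
The open question can be made concrete with a small validation sketch. It follows the table quoted above (`"type"` must be `"multiscale"`, `"name"` must be non-empty and should match the pattern, `"path"` is required) and adds a hypothetical `allow_inline` switch to show what changes if inlined `"multiscales"` were permitted. `validate_multiscale_node` is an illustrative name, not part of the ome-ngff-validator or any other tool, and the sample nodes are abbreviated versions of the examples in this thread.

```python
import re

# Same character set as the draft's [a-zA-Z0-9-_.]+, with the hyphen moved
# last so it cannot be read as a range.
NAME_PATTERN = re.compile(r"[a-zA-Z0-9_.-]+")


def validate_multiscale_node(node, allow_inline=False):
    """Return (errors, warnings) for one multiscale node dict.

    `allow_inline` is a hypothetical switch for the undecided design
    question: if True, inline "multiscales" may stand in for "path".
    """
    errors, warnings = [], []

    if node.get("type") != "multiscale":
        errors.append('"type" must be "multiscale"')

    name = node.get("name")
    if not isinstance(name, str) or not name:
        errors.append('"name" must be a non-empty string')
    elif not NAME_PATTERN.fullmatch(name):
        # The draft says "should", so this is a warning, not an error.
        warnings.append(f'"name" {name!r} should match [a-zA-Z0-9-_.]+')

    has_path = isinstance(node.get("path"), str)
    has_inline = "multiscales" in node
    if not has_path and not (allow_inline and has_inline):
        errors.append('"path" is required')

    return errors, warnings


by_path = {
    "type": "multiscale",
    "name": "color",
    "path": "/absolute/path/to/l4dense_motta_et_al_demo/color",
}
inline = {
    "type": "multiscale",
    "name": "segmentation",
    "multiscales": [{"axes": []}],  # abbreviated placeholder content
}

strict_result = validate_multiscale_node(inline)
lenient_result = validate_multiscale_node(inline, allow_inline=True)
```

Under the table as written, the inline node fails because it has no `"path"`; with inlining allowed, the same node passes. Uniqueness of names across a collections file would need a separate check over all nodes.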

@jo-mueller jo-mueller added the rfc Status: request for comments label Dec 5, 2025