Commit 0b955ab

Add comment 5 on RFC-9
1 parent d1dff7c commit 0b955ab

1 file changed

+83
-0
lines changed

rfc/9/comments/5/index.md

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
1+
# RFC-9: Comment 5
2+
3+
(rfcs:rfc9:comment5)=
4+
5+
## Comment authors
6+
7+
This comment was written by the ilastik team:
8+
9+
* Anna Kreshuk, https://orcid.org/0000-0003-1334-6388
10+
* Dominik Kutra, https://orcid.org/0000-0003-4202-3908
11+
* Benedikt Best, https://orcid.org/0000-0001-6965-1117
12+
13+
## Conflicts of interest (optional)
14+
15+
None
16+
17+
## Summary
18+
19+
We welcome this addition to the OME-Zarr ecosystem.
20+
We believe RFC-9 will remove one of the largest barriers to adoption that OME-Zarr currently faces, and open it up to the bulk of everyday image handling needs of most life scientists.
21+
22+
## Significant comments and questions
23+
24+
### Specify possible roots
25+
26+
The specification as of version 0.5 contains several similar terms:
27+
OME-Zarr fileset, Zarr hierarchy, OME-Zarr image, and OME-Zarr dataset.
28+
RFC-9 introduces another variant, "OME-Zarr hierarchy".
29+
This adds yet another undefined term to the specification, and once again leaves the possible forms of OME-Zarr "roots" unspecified.
30+
There are currently only two, Plates and Multiscales, but this is only implicit in the specification.
31+
This has led to a proliferation of non-standard forms, such as nesting several multiscale datasets within a higher-level group that is then treated as a root, or interleaving scale arrays from different multiscales within one root group.
32+
The RFC explains that parsing the contents of a ZIP file can be costly or slow.
33+
We therefore consider it important that readers can expect a more constrained set of internal layouts for single-file OME-Zarrs.
34+
35+
We assume the term "OME-Zarr hierarchy" was chosen in anticipation of "collections", which are under ongoing discussion and would add a new kind of OME-Zarr hierarchy.
36+
However, there is not yet a published RFC for collections (RFC-8 is still being drafted), and RFC-9 is not currently facing significant opposition.
37+
The future collections RFC can therefore simply update any phrasing added by RFC-9.
38+
Arguably, how single-file OME-Zarrs should be treated in the context of collections needs to be discussed explicitly, rather than decided implicitly by ambiguous phrasing in RFC-9.
39+
If it has been discussed, the phrasing should be more explicit.
40+
41+
We recommend explicitly specifying the possible roots, since RFC-9 assigns meaning to the "root of the OME-Zarr hierarchy".
42+
43+
For example: "The ZIP file MUST contain exactly one multiscale image (including optionally one labels group), or exactly one high-content screening dataset."
44+
45+
At a minimum, we recommend replacing the word "hierarchy" with the equally broad "dataset" or "fileset" to avoid increasing the number of undefined terms in the specification.
46+
47+
### Avoid appending
48+
49+
The RFC details the technical complications that zip files can present when appending or replacing parts inside them.
50+
These considerations are currently not reflected in the proposed modification to the specification.
51+
We suggest adding an explicit recommendation, such as:
52+
"It is RECOMMENDED that OME-Zarr zip files are treated as read-only objects after the initial write operation. Modifying operations SHOULD be implemented by rewriting the entire file."
53+
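As an illustration of the "rewrite the entire file" approach, the following sketch replaces or adds members by writing a fresh archive and then swapping it into place. It uses only Python's standard `zipfile` module. The helper name `rewrite_zip` is hypothetical, and the sketch does not attempt to enforce any other layout recommendations from RFC-9 beyond keeping the original member order.

```python
import os
import tempfile
import zipfile


def rewrite_zip(zip_path, updates):
    """Replace or add members by writing a completely new archive.

    `updates` maps member names to new bytes. Existing members keep their
    position and compression type; new members are written at the end.
    The original file is replaced only after the new archive is complete.
    """
    directory = os.path.dirname(os.path.abspath(zip_path))
    fd, tmp_path = tempfile.mkstemp(suffix=".zip", dir=directory)
    os.close(fd)
    try:
        remaining = dict(updates)
        with zipfile.ZipFile(zip_path) as src, \
                zipfile.ZipFile(tmp_path, "w") as dst:
            for info in src.infolist():
                if info.filename in remaining:
                    data = remaining.pop(info.filename)
                else:
                    data = src.read(info.filename)
                dst.writestr(info.filename, data,
                             compress_type=info.compress_type)
            for name, data in remaining.items():
                dst.writestr(name, data)
        # Swap the finished archive into place in one step.
        os.replace(tmp_path, zip_path)
    finally:
        # Clean up the temporary file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Because the old file is never opened for writing, a failed update leaves the original archive untouched, and the new archive contains no stale copies of replaced members.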
54+
### Clarify performance expectations
55+
56+
The RFC briefly mentions some external performance evaluations of storing zarr data in zip files.
57+
Referring to external sources for detail is fine in principle, but the references here are links to github.com and pangeo.io.
58+
We consider this prone to link rot.
59+
Even while the links remain viable, the information they point to is subject to change at any time.
60+
Apart from these references, the section currently provides no information, so it loses all value if the references become unavailable or change.
61+
In our opinion, an RFC for a public standard should reflect at least the most relevant information from its external sources in its own text.
62+
63+
We do not think a reader of the RFC can be expected to synthesise a conclusion from these sources with reasonable effort (or, in any case, with less effort than the RFC authors).
64+
If a conclusion can be drawn, the section on performance should be extended with it.
65+
If this is not possible, it might be worth reconsidering what value the references to external sources contribute.
66+
67+
Performance is important, and low-performance OME-Zarr zip files could hinder adoption.
68+
We would welcome it if the proposed change to the specification included a statement on the expected performance.
69+
This could take the form of a prescriptive statement such as:
70+
"Writer implementations SHOULD verify that reading their OME-Zarr zip files is similarly performant as reading from other storage formats."
71+
In this case, it might be necessary to have the specification refer to RFC-9 for details on how to achieve and how to verify good performance.
72+
73+
Alternatively, one could make this clear by adding an observation like the following (assuming that the existing recommendations cover everything):
74+
"When creating OME-Zarr zip files, the following RECOMMENDATIONS ensure that reading from OME-Zarr zip files is similarly performant as reading from other storage formats:"
75+
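To make "verify" more concrete, here is a deliberately crude sketch of the kind of check a writer implementation could run: read every byte once from the ZIP file and once from an equivalent directory, and compare the timings. This is not a benchmark defined by RFC-9. The function names are hypothetical, and the sketch ignores operating-system caching, partial reads, and remote storage, all of which matter in practice.

```python
import os
import time
import zipfile


def read_all_from_zip(zip_path):
    """Read every member once; return (elapsed seconds, bytes read)."""
    start = time.perf_counter()
    total = 0
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if not info.is_dir():
                total += len(zf.read(info.filename))
    return time.perf_counter() - start, total


def read_all_from_directory(root):
    """Read every file under root once; return (elapsed seconds, bytes read)."""
    start = time.perf_counter()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as fh:
                total += len(fh.read())
    return time.perf_counter() - start, total
```

Comparing the two byte counts first confirms that both runs read the same data; only then are the elapsed times worth comparing, ideally over several repetitions.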
76+
## Minor comments and questions
77+
78+
* The proposed new section of the specification uses the term "SHALL", which is not used anywhere else in the specification. According to IETF RFC 2119, SHALL is synonymous with MUST, and MUST is the term used in the rest of the specification, so "SHALL" should be replaced with "MUST".
79+
* Duplication of "the" in "The ZIP file MUST contain the the OME-Zarr’s root-level zarr.json."
80+
81+
## Recommendation
82+
83+
We recommend accepting this RFC.
