Make it possible to specify folders layout to be other than sub-{label}/[ses-{label}/] #54

Open
yarikoptic opened this issue May 31, 2023 · 8 comments
Labels
folder-structure Proposals to reorganize files in the specification. impact: high Estimated high impact change modularity Issues affecting modularity and composition of BIDS datasets

Comments

@yarikoptic
Contributor

yarikoptic commented May 31, 2023

Origin: Originally summarized/presented in bids-standard/bids-specification#751 (comment) (not duplicating here for now) while discussing a possible "stimuli BEP", where it boiled down to having some stim-{label}/ folder structure either at the top level or under stimuli/, for which no structure is currently defined.
Current state: many use cases collected (see e.g. below), design being formalized in

Other relevant issues I found, in bids-2-devel or elsewhere, which would be partially or fully addressed by such an enhancement:

  • Multi-site/center studies #11 : add /site-<site_label> level in favor of encoding it within /ses-{label}

  • Issues with dandi organize dandi/dandi-cli#1302 - in DANDI we support a lightweight "BIDS-inspired" layout (while BEP032 is still being worked on) which has no /ses-{label} subfolder. Such a subfolder makes little sense there: there are lots of sessions with 1 file per session, and file names are possibly already long due to long sub and ses labels.

  • https://bids.neuroimaging.io/bep038 - Atlases BEP... IMHO could have atlas-<label>/ top level structure for the entity atlas

    • gives a use case inspiring the description to allow for such a "leading prefix atlases/" description as well. So we might have something like {'.': ["subject", "[session]", "datatype"], 'atlases': ["atlas"]} to describe that at the top level we separate at the subject level, and under atlases/ at the "atlas" level. A dataset which is purely an "atlas" dataset could use {'.': ["atlas"]}
  • https://bids.neuroimaging.io/bep035 - MEGA (Modular extensions for individual participant data mega-analyses) BEP. Proposes study- entity at the top level and studies.tsv to summarize.

  • would provide a solution for Allow composition of a BIDS dataset (dataset level) from smaller (subj or subj/ses) level #59

    example (prototype since we have not boiled down syntax)

    top level dataset_description.json could have "default" one

    "DatasetLayout": { "." : [{ "entity": "subject", "folder": true }, { "entity": "session", "folder": true }] }

    whenever nested BIDS dataset at sub-XXX/ses-YYY/ level have

    "DatasetLayout": { "." : [{ "entity": "subject", "folder": false }, { "entity": "session", "folder": false }] }

    thus signaling that sub-XXX_ses-YYY_ should still be within the target filename as a prefix but no leading directories should be there.

  • in the scope of stimuli BEP (XXX, google doc), to accommodate large stimuli databases, such as https://cocodataset.org/ with 330K images, it would require some grouping. But we would need to figure out how to group in general, which would require more entities than just stim-

  • some heavy datasets might want even more entities to be used. E.g. in https://dandiarchive.org/dandiset/000026, there are thousands of files for about 50 different _sample-s under sub-I38/ses-SPIM/micr, so it would have been logical to add a sample-<label>/ level and samples.tsv to describe them, making it something like

    "DatasetLayout": { "." : [{ "entity": "subject"}, { "entity": "session" }, { "entity": "sample" }] }
@yarikoptic
Contributor Author

yarikoptic commented Jan 26, 2024

In light of the discussion on the Atlases BEP, I think we should provide some overall formalization behind BIDS files/structure, which could sound like

an overall organizational principle which could describe "how the BIDS file hierarchy is built"; we might be able at some point to state something like

  • a folder entities/ (a plural version of the entity) to make that entity the leading entity to distinguish groups of files
  • subsequent subfolders ent-<label>/ (ent as an abbreviated version of entity) could be provisioned to further group data for the same ent-<label>/.
  • file entities.tsv with a corresponding entities.json to describe columns in the .tsv is recommended to be provided
    • note: we might or might not want to harmonize participants.tsv -> subjects.tsv for unification, despite the suboptimal connotation.
  • another entity, e.g. entity2, could be chosen for the next level of grouping under the first level entity. In such a case the leaf filenames would acquire ent-<label>_ent2-<label>_ prefixes.
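
As a rough illustration of one possible reading of the bullets above, the following Python sketch lists the folder and table names implied for one entity level, and the filename prefix implied by a chain of entities. The helper names `describe_level` and `filename_prefix` are hypothetical, the plural form is passed in explicitly rather than derived, and where the .tsv/.json files would live is deliberately left open.

```python
def describe_level(entity_plural, abbreviation, labels):
    """List the names implied by one entity level.

    entity_plural: plural form used for the folder and tables, e.g. "atlases".
    abbreviation:  the ent in ent-<label>, e.g. "atlas".
    labels:        labels present at this level.
    """
    return {
        "leading_folder": f"{entity_plural}/",
        "subfolders": [f"{abbreviation}-{label}/" for label in labels],
        "tables": [f"{entity_plural}.tsv", f"{entity_plural}.json"],
    }


def filename_prefix(pairs):
    """Build the ent-<label>_ent2-<label>_ prefix from (abbreviation, label) pairs."""
    return "".join(f"{abbr}-{label}_" for abbr, label in pairs)


print(describe_level("atlases", "atlas", ["A", "B"]))
print(filename_prefix([("sub", "01"), ("ses", "02")]))
# sub-01_ses-02_
```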

Such a principle already maps well onto our sub-/ses- hierarchy, with participants.tsv for sub- and sessions.tsv for _ses-; for _desc we have descriptions.tsv. So overall it is "backward compatible" (but see #55, which breaks it in two respects: no ent- prefix and no _ent- portion in the filename prefix).

@arnodelorme

arnodelorme commented Apr 17, 2024

Is there an example of a use case where this would be relevant? @yarikoptic BIDS 2.0 is meant to be more user-friendly. Complexifying an already complex scheme will not make BIDS 2.0 more user-friendly.

Also, about the logic with prefixes, entities, symmetries between all entities, etc.: while it makes a lot of sense to computer scientists, it is completely lost on the common mortal.

@yarikoptic
Contributor Author

Examples are linked in the original description. Added one more for #59 .
Any lack of "symmetry" actually hurts mortals (e.g. the classic "why is it sub-, but then participants.tsv?", which is a separate issue #14 but the most representative of consistency/symmetry here).

yarikoptic added a commit to yarikoptic/bids-specification that referenced this issue Apr 27, 2024
Aims to provide a solution to

- bids-standard/bids-2-devel#54

### Name rationale:

Originally I thought to name it BIDSLayout but that one was/is used as
a class in pybids. On one hand it is great because corresponds in "principles".
But I thought to avoid confusion at least ATM so to make it easier to find
issues/code where such a term is used/mentioned.  So for now decided to go
with BIDSEntitiesLayout but it would be easy to change to anything we want.
yarikoptic added a commit to bids-standard/bids-specification that referenced this issue Jun 22, 2024
Aims to provide a solution to

- bids-standard/bids-2-devel#54

### Name rationale:

Originally I thought to name it BIDSLayout but that one was/is used as
a class in pybids. On one hand it is great because corresponds in "principles".
But I thought to avoid confusion at least ATM so to make it easier to find
issues/code where such a term is used/mentioned.  So for now decided to go
with BIDSEntitiesLayout but it would be easy to change to anything we want.
@mateuszpawlik

I second @arnodelorme here. I've been dealing with BIDS datasets for several years now while building a repository. I can't understand the motivation for and implementation of this issue. Do you expect people and tools to understand a flexible layout like the one explained here? In my opinion, this adds an unnecessary level of indirection. Suddenly, all the tools we're using will have to interpret some directory layout specification. And once flexibility is allowed, you can expect everyone to use it, possibly leading to a different layout for each dataset.

It's complicated enough to deal with optional sessions while implementing a tool. We decided internally, and users didn't object, to make sessions mandatory just not to deal with handling that.

The strength of BIDS is its specificity, fixed directory structure, and file naming convention. I wouldn't go away from that.

@yarikoptic
Contributor Author

@arnodelorme and @mateuszpawlik thank you very much for chiming in! I would be happy to explain more on my motivation beyond the use cases I keep populating in the original description. But maybe we could discuss them "interactively"? Are you planning to attend the upcoming INCF in Austin, TX or SfN in Chicago, IL? If not, we could zoom.

Quick summary answer to @mateuszpawlik : one of the original motivations is that BIDS already covers more than just 'neuroimaging' data (e.g. microscopy), and even more modalities will become supported over time. Not all of them have subject as the most appropriate level of differentiation at the highest level: it could be as large as a "study" or as small as a "slice" (see OP for references). Talking about people: when you come to a new BIDS dataset and see that at the first level you have sample-1/, sample-2/ and so on, you would immediately understand (without even looking anywhere) that it is about different samples (it is a standard BIDS entity). And I do acknowledge that tools would indeed require some development to support the specification, instead of hardcoding a fixed assumption about the hierarchy. But I also hope that common libraries like bidsschematools and pybids could assist in making such transitions easier, while empowering those tools to support a much wider range of cases.
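
To give a feel for what "supporting the specification instead of hardcoding the hierarchy" could mean for a tool, here is a small Python sketch that reads the leading directories of a path according to a declared layout. It is not an existing pybids or bidsschematools API: `parse_dirs` and `ENTITY_KEYS` are hypothetical names, every level is treated as optional, and labels are assumed to be alphanumeric.

```python
import re

# Hypothetical mapping from entity name to its directory key (assumed here).
ENTITY_KEYS = {"subject": "sub", "session": "ses", "sample": "sample"}


def parse_dirs(layout, relpath):
    """Map the leading directories of relpath to entity labels.

    layout: list of {"entity": ..., "folder": bool} dicts; levels with
            "folder" set to False are not expected as directories.
    Levels whose directory is absent are skipped (treated as optional),
    and parsing stops consuming once directories no longer match.
    """
    dirs = relpath.split("/")[:-1]  # drop the filename
    found = {}
    i = 0
    for level in layout:
        if not level.get("folder", True) or i >= len(dirs):
            continue
        key = ENTITY_KEYS[level["entity"]]
        match = re.fullmatch(rf"{key}-([A-Za-z0-9]+)", dirs[i])
        if match:
            found[level["entity"]] = match.group(1)
            i += 1  # this directory is consumed by the current level
    return found


classic = [{"entity": "subject"}, {"entity": "session"}]
by_sample = [{"entity": "sample"}]

print(parse_dirs(classic, "sub-01/anat/sub-01_T1w.nii.gz"))
# {'subject': '01'}
print(parse_dirs(by_sample, "sample-1/sample-1_photo.jpg"))
# {'sample': '1'}
```

The same function handles both the classic sub-/ses- hierarchy and a sample-first dataset; only the declared layout differs.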

@yarikoptic yarikoptic added the modularity Issues affecting modularity and composition of BIDS datasets label Sep 20, 2024
@tgbugs

tgbugs commented Sep 20, 2024

One example use case is that if bids 2 were to include the ability to specify layout and metadata location/binding rules at a meta level, then it would be possible to express those rules for other standards (e.g. SDS). This would allow formats that diverged from bids 1 due to its limitations to reconverge on bids 2.

@yarikoptic
Contributor Author

Thank you @tgbugs for the feedback/support. Please 👍 this issue ;) Do you think you could compile a list of possible steps to converge SDS to BIDS, e.g. like I did for DANDI ?

@tgbugs

tgbugs commented Sep 20, 2024

I will take a shot at a list of possible steps, though I will likely only get to it after Neuroinformatics and SfN.

yarikoptic added a commit to yarikoptic/bids-specification that referenced this issue Nov 11, 2024
Current names duplicate their "domain" (subject_ or session_), inconsistent in
plurality (_id although for all ids), and not clear really what
'phenotype' corresponds to without reading the description.

With proposed change there is no duplication of the domain, consistency
in plurality (albeit 'id' is an abbreviation so 'ids' is a non-word), and
IMHO clearer meaning in `ids_phenotype`.

This would also allow for generalization across other entities with a view to
bids-standard/bids-2-devel#54 - where any entity
with folders for its level could have `dirs`. Also it could come in handy to
determine `ids` for some other entities in tests.

Ref: bids-standard/bids-validator#94 (comment)

TODOs
- [ ] introduce corresponding changes to bids-validator
Projects
Status: In Progress

5 participants