Use-case(s): BIDS-inspired/like standards #62
ping @tgbugs for SPARC info on bids-like |
SPARC Data Structure (SDS)
The SPARC platform has a preprint online that describes a BIDS-inspired data structure: https://www.biorxiv.org/content/10.1101/2021.02.10.430563v2. Rough details are also on their wiki. edit by @yarikoptic: thanks -- added |
after review with @dyf we agreed that it was a little too distant from BIDS, at most indeed just "inspired" ;), so we have it as
|
that's good. although, i'm not quite sure that use-case is too distant. practically speaking, we run into that issue with ukbiobank and any of the large datasets where we need to process a few subjects with bids-apps. we have had to create sub-bids to satisfy bids and hence tools like fmriprep. one could say that bids should not care about that. however, a self-contained single subject/session subset would be a relevant use-case in bids i believe. ps. the challenge has been that bids has the different levels of consolidation of information: grouped (participants, sessions, etc) inheritance (via jsons), and single (the files). this necessitates a connected structure that relies on those pieces of information. the advantage of bids is efficiency (for the grouped files; although longitudinal data is inefficient) and deduplication (for the inheritance), and readability (single path+filename). these different use cases may be good to consider in bids 2. |
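The "sub-bids" workaround mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how any existing tool does it: the function name `make_single_subject_subset` is hypothetical, the dataset is assumed to have no session-level grouped files, and every top-level file is copied wholesale so that inherited JSON metadata stays available.

```python
import csv
import shutil
from pathlib import Path


def make_single_subject_subset(src: Path, dst: Path, subject: str) -> None:
    """Copy one subject plus the top-level files it depends on into dst.

    Hypothetical helper: it covers the three levels named in the comment
    (grouped, inherited, single) in the simplest possible way.
    """
    dst.mkdir(parents=True, exist_ok=True)

    # Inherited metadata: top-level files may apply to any subject,
    # so copy all of them except the grouped participants table.
    for item in src.iterdir():
        if item.is_file() and item.name != "participants.tsv":
            shutil.copy2(item, dst / item.name)

    # Single files: the subject folder itself, copied as-is.
    shutil.copytree(src / subject, dst / subject)

    # Grouped metadata: keep the header and only this subject's row.
    participants = src / "participants.tsv"
    if participants.is_file():
        with participants.open(newline="") as fin, \
                (dst / "participants.tsv").open("w", newline="") as fout:
            reader = csv.reader(fin, delimiter="\t")
            writer = csv.writer(fout, delimiter="\t", lineterminator="\n")
            header = next(reader)
            writer.writerow(header)
            id_col = header.index("participant_id")
            for row in reader:
                if row[id_col] == subject:
                    writer.writerow(row)
```

Calling `make_single_subject_subset(Path("full"), Path("subset"), "sub-01")` would leave a directory that a BIDS app can treat as a dataset in its own right. The grouped table is the only file that has to be rewritten rather than copied, which is the "connected structure" cost the comment describes.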
Will also add Templateflow: https://www.templateflow.org/usage/archive/#acceptable-data-types
|
Our NeuroBlueprint specification is definitely in the BIDS-inspired category. Together with @JoeZiminski we are working on putting together a list of divergences from BIDS and the rationale behind them. |
@satra see
Maybe we should add/promote upvoting via 👍 on the issues, so "go wild" ;) |
I can share some context on Psych-DS if it's helpful, but I'm unsure how to comment on DANDI. Our data standard is pretty explicitly modeled on BIDS, and our validator tool is essentially a very pared-down fork of BIDS' recent deno implementation. Our reasoning for diverging, instead of using BIDS directly or creating some sort of module within it, has to do with complexity. A big part of the ethos of our project is simplicity, since we're trying to bring researchers with little experience of explicit data standards into the fold of producing FAIR datasets. We designed our standard to be the minimal set of conventions for producing consistently structured, machine-readable datasets with linked metadata, and we avoided the impulse to include additional advanced/optional conventions or conventions governing the internal content of datafiles, because we figured that even the presence of this additional material in our documentation could scare off our target audience. On a technical level, I noticed that the BIDS deno-based validator only applied rules to files that actually appeared within datasets, with no functionality to produce errors/issues in cases where certain elements were absent, and since this notion of presence/absence was important in our schema, that was one initial additional impetus for diverging with our own tool. Additionally, with Psych-DS 1.0 at least, we only wanted to validate simple tabular CSV data, and a lot of the structure of the BIDS validator had to do with applying different rules and conventions depending on datatype. We followed BIDS' lead when it came to our usage of linkML for creating a structured model of our schema, and I definitely used BIDS' examples explicitly when developing our stack of tools, which was extremely helpful. These are just a few random thoughts and pieces of context. Feel free to ask me anything specific and I can answer in detail.
Also, I should mention that @mekline is on maternity leave until sometime this June/July, and @ianchandlercampbell is our interim director for the project. |
Those are very valuable insights, @bleonar5, thanks for sharing!
Could you elaborate more on this here (or even in a dedicated issue against bids-validator, which could be closed if it turns out not to be pertinent)? I am not fully grasping it, since as far as I understand bids-validator must error whenever any REQUIRED component (metadata or file) is missing.
also sounds very intriguing and like something that could be generally applicable to BIDS. Could you elaborate more?
do you have a link to linkML models handy? |
@bleonar5 some of the links on the PsychDS README are broken, could you share a
I think that's also part of the ethos of BIDS, perhaps we could look into simplifying BIDS for 2.* as well. What are some of the key complexity concerns which made BIDS 1.* less attractive? |
@yarikoptic My first response was a bit cursory and based on my memory of our initial rationales for diverging, so I'll try to dig in a bit deeper here. I think your summary of our rationale was correct: we wanted to provide similar structures and standards to those that BIDS provides, for researchers that deal with behavioral data rather than complex physiological data. @satra informed us about the BIDS team's development of a structured schema model in linkML, and this satisfied one of our core desiderata for the project, which was to have an externalized, structured schema that we could reference across validator tools in multiple frameworks (node, R, python). So, we used the combination of pruned-down versions of BIDS' in-development Deno validator and linkML schema as (very helpful) jumping-off points for our own development, with proper citations and acknowledgements, of course. @mekline has had a much longer history with the development of Psych-DS as an independent entity, and could possibly speak to our rationales for divergence much better, and she may be able to share more detailed thoughts on her return. One crucial element that I'm remembering now is a technical difference between the structure of most physio data and the behavioral data that we're interested in. Physiological data is so rich, and so often tied to multiple measurements over time, that it seems to be a standard assumption (and I think this is reflected in the BIDS spec) that datafiles will be organized around individual subjects/sessions. In a lot of behavioral datasets, this is not the case, as the whole set of responses for a given subject may be representable in a single row, and one datafile may represent the data gathered from an entire experiment. 
BIDS is complex and my knowledge of/experience with it only extends to the research I did prior to beginning development of the Psych-DS validator, but it seemed to us that following some kind of subject-oriented system of data organization would be necessary for compliance with BIDS, and this was a major rationale for divergence. (@TheChymera, I think this paragraph is the most relevant answer to your second question) As for the matter of presence/absence of files/directories that I mentioned previously, I think this is actually just an issue with the deno-based validator rather than the older, public-facing validator. And the deno validator is still in development, so it may just be that I mistook a bug/unfinished component for an actual aspect of the BIDS spec. Basically, if you provide an otherwise-valid BIDS dataset that is missing an element (such as the dataset_description.json file) to the web validator, it produces an error as expected (DATASET_DESCRIPTION_JSON_MISSING). If you do the same with the deno validator, it outputs a VALID_DATASET result and does not report the absence of the required file. This is because the validator crawls the filetree of the dataset, finds whatever files/directories are present, and runs a series of checks on them based on the rules in the linkML schema. But if a core file is missing, the crawler never encounters the file in question, so the relevant rules that would assert the necessity of the file's presence are never applied. I could certainly create an issue for this if it's helpful, but I was unsure if it's appropriate given the fact that the validator has not been publicly released, and this feature may be scheduled for later in the development plan. Here is a link to our linkML schema model as it currently stands (in development). At the moment it is not really intended to be used with the standard linkML validator library, and is more being used as just a structured, machine-readable implementation of our schema. |
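To make the presence/absence point concrete, here is a minimal sketch of the two strategies. It is not the code of either validator: the rule set, the function names, and the issue codes `EMPTY_FILE` and `MISSING_REQUIRED_FILE` are all hypothetical, chosen only to show why a crawler that applies rules to the files it finds cannot notice a file that is not there.

```python
from pathlib import Path

# Hypothetical rule set; only the file name comes from the comment above.
REQUIRED_FILES = {"dataset_description.json"}


def crawl_only_check(root: Path) -> list[str]:
    """Apply rules only to files the crawler actually encounters.

    A missing required file is never visited, so no rule about it fires.
    """
    issues = []
    for path in root.rglob("*"):
        if path.is_file() and path.stat().st_size == 0:
            issues.append(f"EMPTY_FILE: {path.relative_to(root)}")
    return issues


def presence_check(root: Path) -> list[str]:
    """Start from the list of required files and look each one up."""
    return [
        f"MISSING_REQUIRED_FILE: {name}"
        for name in sorted(REQUIRED_FILES)
        if not (root / name).is_file()
    ]
```

On a dataset that lacks `dataset_description.json`, `crawl_only_check` reports nothing while `presence_check` reports the missing file. A validator would need both passes: one driven by the files that exist and one driven by the schema's list of required elements.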
@TheChymera Here is a minimal Psych-DS file structure, from the Psych-DS spec document, whose contents we are in the process of integrating into a more holistic readthedocs site for the project/schema/validator. Thank you for the heads-up about the dead links; I will do a once-over on our README and take care of those ASAP |
Thank you @bleonar5 !!
similar aspect relevant to phenotype data, per our discussion with @surchs. If I recall correctly, we arrived at (or I forced? ;) ) the conclusion that there could be a "nominal data representation": a per sub/ses representation (even if a single row) + a derived composition somewhere else -- after all, the notion of the "derivative" dataset is steadily becoming less of an ugly duckling in the BIDS world. But also it might relate to the discussion of
Sorry if this feels "too jumpy", but I think there is a common pattern emerging here across different aspects ;)
please do, or let me know if I should do -- since it does sound like a true bug since validator must error out if any of the |
Hey @yarikoptic
Sorry for the delay in replying; we (me and @JoeZiminski) were aiming to write a full post on this, intended for our website, but things have been busier than expected. Please find below a summary of the logic behind NeuroBlueprint, where it diverges from BIDS, in what ways BIDS is not fulfilling our requirements, and how this could be remedied with BIDS 2.0. For context, we recently wrote two blog posts motivating NeuroBlueprint and the related data-management tool datashuttle.
NeuroBlueprint motivation
The main motivation for NeuroBlueprint is to provide a version of folder standardisation with a very low barrier to entry, mostly focused on the data acquisition stage of a project. We found that BIDS, while necessarily detailed with the aim of full standardisation and reproducibility, can be too detailed for researchers who are very busy in the early stages of a project. For our purposes, at this stage we just want to know where researchers' data are in a predictable way, for ingestion into analysis pipelines.
A more minor consideration was that BIDS is somewhat biased towards techniques used in human subjects (MRI, EEG, MEG), while NeuroBlueprint is more geared towards systems neuroscience (animal subjects), similar to NWB. While BIDS is slowly moving towards accommodating such data (most notably with BEP032 for animal ephys), the "human legacy" still informs much of its design and terminology.
The founding idea of NeuroBlueprint is that some standardisation is preferable to no standardisation. Our initial goal was to present systems neuroscientists with a small subset of BIDS requirements (those that are easy to adhere to at the data acquisition stage), which would make it easier for researchers to transition to "full BIDS" (or NWB) later, at the stage of paper publication and data sharing. However, while iterating on the NeuroBlueprint spec we realised that we had to break with BIDS in some areas, even within the subset we mandate. 
Divergences from BIDS
Unlike BIDS, we currently make no requirements on metadata, file names, file format, etc. Essentially the only things we do require at present are a BIDS-style folder hierarchy and naming, specifically:
Datatypes and modalities
Moreover, the existing BIDS datatypes do not map well onto the methods typically used in systems neuroscience labs. For example, BIDS reserves As such, we have taken liberties with datatype names, and at present we only mandate 4 datatypes ( Although this seems like a major divergence, we thought it would be relatively easy to reconcile once a project is complete, with appropriate converters. It would essentially involve moving/renaming datatypes and adding appropriate modality suffixes as needed.
Wish list for BIDS 2.0
As is apparent from above, what we'd love to see in BIDS 2.0 is a re-thinking of the datatype/modality concept, or at least some room for flexibility in defining/naming datatypes. The absolute dream would be to have BIDS 2.0 consist of a set of specific and atomic rules, the same way that linters like
Conclusion
We absolutely love what BIDS has done for the neuroimaging community and we are on board with extending its benefits to other research communities. NeuroBlueprint is still young and many of its points are still amenable to change, as long as we stick to our main design consideration, which is to keep the spec minimal and easy to adopt. Let's keep the conversation going! |
As a placeholder for deeper discussion, here are some links that provide a partial overview of SDS. I have a poster on this at Neuroinformatics, so I can update with that here as well when it is ready. How SDS models ontological participants that was inspired by the discussion in bids-standard/bids-specification#779. Changelog for the latest version of SDS and the actual release. The most important change in this context would be our move to accommodate data management processes where most or all file metadata is stored in a manifest file of some kind, which might look like mapping file names, object ids, checksums, s3 object paths, etc. to metadata records in a separate system, in addition to the traditional folder naming conventions. I think this touches on The manifest changes also relate to how BIDS-like standards can enforce data modality file type requirements (e.g. that mri should have nifti files, ephys should have nwb, microscopy ome-tiff, etc., while allowing png files elsewhere in the dataset) without necessarily having to have folders for each modality, which can result in an n×m-fold increase in directories for n subjects and m modalities. With regard to interoperability between standards, there is discussion in the changelog about one way to make file system conventions (data set standards) nestable using a file called |
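As a rough illustration of the manifest idea in the comment above, the sketch below checks file types against a modality declared per manifest record instead of per folder. Everything here is an assumption made for the example: the record fields `path` and `modality`, the `ALLOWED` mapping, and the function name are hypothetical and are not taken from the SDS manifest format.

```python
from pathlib import PurePosixPath

# Hypothetical modality -> allowed suffixes, following the examples in the
# comment (nifti for mri, nwb for ephys, ome-tiff for microscopy).
ALLOWED = {
    "mri": (".nii", ".nii.gz"),
    "ephys": (".nwb",),
    "microscopy": (".ome.tif", ".ome.tiff"),
}


def check_manifest(manifest: list[dict]) -> list[str]:
    """Return paths whose declared modality does not match their file type."""
    bad = []
    for record in manifest:
        modality = record.get("modality")
        if modality is None:
            # No modality declared, e.g. a png elsewhere in the dataset.
            continue
        name = PurePosixPath(record["path"]).name.lower()
        # An unknown modality has no allowed suffixes, so it is flagged too.
        if not name.endswith(ALLOWED.get(modality, ())):
            bad.append(record["path"])
    return bad
```

Because the modality lives in the record rather than in the directory name, all of a subject's files can sit in one folder and still be checked, which avoids one folder per subject per modality.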
Quite often projects do not adopt BIDS due to its complexity or an imperfect fit, and then establish a new file layout and/or metadata convention/standard while saying they are "BIDS-like". The likeness varies greatly. Quite often it amounts to no more than having folders and file names with some metadata in them; those are not worth mentioning here. But there is a good number of BIDS-like standards (in my words: formalized descriptions adopted by a considerable number of people) which are worth reviewing and analyzing for what could minimize the divergence between them and current BIDS, possibly by introducing missing but reasonable and desired features into BIDS 2.0.
This issue would be used to collect pointers and possibly summarize rationale and major features behind them.
DANDI layout
Established by me and @satra for https://dandiarchive.org, primarily due to the complete lack of a usable standard/layout at that earlier point in time.
dandi organize command on a set of .nwb files.
dandiset.yaml schema is defined within pydantic model in https://github.com/dandi/dandi-schema/blob/master/dandischema/models.py#L1405
Notable divergences:
dataset_description.json - metadata is in "in-house" dandiset.yaml
ses-*/ level subfolder but there is ses- entity in the target filenames (convergence possible through "Make it possible to specify folders layout to be other than sub-{label}/[ses-{label}/]" #54)
.nwb (convergence through https://bids.neuroimaging.io/bep032 for animal ephys data, discussed/not (yet) accepted in BIDS 1.0 for micr/: "Allow for .nwb standard/file format to be used for "micr"" bids-specification#1632)
- and + . (TODO: ref BIDS PR)
+ (no PR yet I think)
+
PsychDS
https://psych-ds.github.io/ https://github.com/psych-ds/psych-DS (attn @mekline and @bleonar5 - would appreciate details/feedback alike for DANDI here or in a dedicated issue/doc)
NeuroBlueprint
https://neuroblueprint.neuroinformatics.dev/specification.html .
Request for summarization of rationale/divergences: neuroinformatics-unit/NeuroBlueprint#51
SPARC Data Structure (SDS)
The SPARC platform has a preprint online that describes a BIDS-inspired data structure: https://www.biorxiv.org/content/10.1101/2021.02.10.430563v2. Rough details are also on their wiki.
TemplateFlow
https://www.templateflow.org/usage/archive/#acceptable-data-types
Brain-Development.org Atlas
https://brain-development.org/brain-atlases/atlases-from-the-dhcp-project/cortical-surface-template/ describes itself as "using BIDS conventions", and proceeds to define custom entities and metadata.
NiPoppy
Study-level description which includes a BIDS dataset and uses some conventions (like derivatives/ subfolder with a more clearly defined naming convention).
CAPS
ClinicA Processed Structure:
TODOs: