Next steps for BIDS-Prov #125

Open
effigies opened this issue Sep 28, 2023 · 13 comments

Comments

@effigies

I'm opening this issue to figure out what a finalized BIDS-Prov would look like. A couple options:

  1. A PR to the specification (like BEP028 - Provenance bids-specification#487), and BIDS-Prov is 100% maintained within BIDS.
  2. A separate repository for the format specification, like BIDS-Stats Models (https://bids-standard.github.io/stats-models/) or BIDS Execution (https://bids-standard.github.io/execution-spec/), with a relatively confined PR to BIDS proper that makes the relationship clear.

Other arrangements could be imagined, but I think the "in BIDS" or "alongside BIDS" distinction is the main one to settle on, and then work out the details.

Other questions:

  1. How much, if any, validation should be done by the BIDS validator? Will there be a 3rd-party validator to call out to, as with HED?
  2. For existing tooling in this repository, would there be a plan to merge it into another library, or keep it as its own thing? Will it be distributed on PyPI?

@bids-standard/steering
@bids-standard/maintainers

@robertoostenveld

In terms of documenting the format specification of BIDS-Prov within the BIDS specification, I can imagine that it would be shortly mentioned in the https://bids-specification.readthedocs.io/en/stable/02-common-principles.html or in the https://bids-specification.readthedocs.io/en/stable/03-modality-agnostic-files.html sections, and then elaborated upon in an appendix.

I can imagine that the BIDS-Prov specific tooling (like the visualizer, but possibly also a non-graphical validator) could continue to live in their own repository under the bids-standard organization.

@sappelhoff
Member

sappelhoff commented Oct 4, 2023

If future BIDS extensions like derivatives for ephys heavily depend on BIDS-Prov (i.e., to specify ephys derivatives one MUST use BIDS-Prov), then I think BIDS-Prov should be maintained 100% within BIDS.

Else, if BIDS-Prov becomes something like a "recommended way to document provenance" but not strictly necessary within BIDS, then I would vote for it living in its own repository (like statsmodels, execution, ... ) with only a short mention in the BIDS spec.


  • re: validator --> I think irrespective of how the above decision turns out, maintaining a separate provenance validator that is "just integrated" into the BIDS validator would be simpler (I may be wrong with this assumption), and with the HED validator we already have a precedent

  • re: General prov tooling --> I agree with Robert in Next steps for BIDS-Prov #125 (comment) --> they should live in their own repos under the bids-standard organization

@effigies
Author

effigies commented Oct 4, 2023

> If future BIDS extensions like derivatives for ephys heavily depend on BIDS-Prov (i.e., to specify ephys derivatives one MUST use BIDS-Prov), then I think BIDS-Prov should be maintained 100% within BIDS.

I think we could have a situation where the BIDS-Prov format definition lives on its own, but some way of defining required entries could still live in BIDS. Similar to how we specify JSON metadata keys/values without defining JSON inside BIDS.

@CPernet

CPernet commented Oct 4, 2023

I do not think any derivatives should rely heavily on (or depend on) BIDS-Prov (jsonld), which is why we introduced descriptions.tsv: it allows specifying all the steps performed. But if one wants to include a full reproducible workflow, then use BIDS-Prov.

From a more general perspective, BIDS is about documenting data, and BIDS-Prov documents a process (which I agree gives rise to the data, still it's not the same as a json next to a file). In that respect it does make sense to live next to the main spec.

@cmaumet
Collaborator

cmaumet commented May 17, 2024

I am inviting @bclenet to the discussion as he will be working on BIDS-Prov from now on.

@yarikoptic
Contributor

yarikoptic commented Nov 21, 2024

I mildly disagree with @CPernet, as most of the metadata in BIDS very much documents a process -- all the MRI acquisition parameters, differences in contrasts/modalities etc. Hence I do think that PROV is "natural" metadata to be added.

And that is how BIDS-Prov is different from the examples in option 2 above -- both BIDS-Stats Models and BIDS Execution are pretty much add-ons which describe processes that would use BIDS data, but do not describe the data itself (raw or derived).

I do feel though that BIDS-Prov is a sufficiently separate "extension" module which is largely independent from the rest of BIDS, and could potentially benefit e.g. from its own versioning and "pluggable" tooling (e.g. to complement bids-validator). Hence in some bright future it could maybe become a "BIDS Extension" (see/comment on bids-standard/bids-2-devel#74 where we already have BIDS-Prov listed).

Furthermore, BIDS-Prov is not per se about a "reproducible workflow" either, although it could potentially lead to one (as well as simple descriptions in descriptions.tsv potentially could ;-) ). The main obstacle IMHO is a "scary" full specification with all the ids etc, and we already had the _desc entity IIRC "ready". That is why descriptions.tsv "won". But IMHO, BIDS-Prov is gravely needed as a part of the standard since we do not yet really have any alternative mechanism besides making it completely "outside".

So I would vote to get back to bids-standard/bids-specification#487 and reincarnate the activities there (I see now that I had given some suggestions there 3 years ago). But I also think it should get tuned up (with example/discussions in #1970 in mind) to make it "more user friendly", but this is not the issue to expand on that.

@yarikoptic
Contributor

But another point I want to reiterate is that there should be active push/activities, who has juice left? ;-) I have created @bids-standard/bep028 team.

@cmaumet
Collaborator

cmaumet commented Nov 21, 2024

We (@bclenet and myself at least ;) ) have juice left!! We'll work on the PR and submit something that everyone (including @satra) can review!

@yarikoptic
Contributor

I have pushed little styling changes to bep-028 in bids-specification but since PR #487 is closed nothing is updated in the diff there.

Overall

  • that branch better be rebased on master (or get master merged)
  • examples should go into Appendix somewhere
    • in common principles it should have only concise description: BIDS-Prov
  • I would like to discuss potential
    • allowance of .jsonld files at any level (not even subject to inheritance principle limitations, since nothing would really need to be inherited -- they are to be composed into the graph):
    • if not specific to any particular file, they could in effect get a _prov suffix, thus mimicking the dataset-level prov.jsonld. So there could be
      • prov.jsonld
      • sub-1/sub-1_prov.jsonld -- subject-wide provenance
      • sub-1/ses-2/sub-1_ses-2_prov.jsonld -- specific for the session
      • sub-1/ses-2/func/sub-1_ses-2_prov.jsonld -- for all the functionals
      • sub-1/ses-2/func/sub-1_ses-2_task-rest_bold.jsonld -- specific to those sub-1/ses-2/func/sub-1_ses-2_task-rest_bold.*
        This way tools would have flexibility to express themselves at the level they are operating at (file level, or group of files etc).
  • dissolving the "Justification for Separating Provenance from file JSON" section by allowing generatedBy to be specified in the corresponding .json file, potentially with further relaxations such as not demanding id (assumed to be unique), and overall having a clear schema which we could encode in our BIDS schema and validate
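
To make the multi-level naming above concrete, here is a rough sketch of how a tool might enumerate the provenance files that could apply to one data file. The function name `candidate_prov_files` is hypothetical, and the naming rules are only the ones floated in this comment, not anything in a released BIDS or BIDS-Prov specification.

```python
from pathlib import PurePosixPath


def candidate_prov_files(data_file: str) -> list[str]:
    """List provenance files that could apply to `data_file`, broadest first.

    Hypothetical helper: it follows the naming proposed in this thread,
    which is not part of any released specification.
    """
    path = PurePosixPath(data_file)
    stem = path.name.split(".", 1)[0]  # drop all extensions, e.g. ".nii.gz"
    candidates = ["prov.jsonld"]  # dataset level
    prefix_entities: list[str] = []
    for depth, part in enumerate(path.parts[:-1], start=1):
        directory = PurePosixPath(*path.parts[:depth])
        if "-" in part:  # entity directory such as "sub-1" or "ses-2"
            prefix_entities.append(part)
        if prefix_entities:
            name = "_".join(prefix_entities) + "_prov.jsonld"
            candidates.append(str(directory / name))
    # File-level provenance shares the data file's full stem.
    candidates.append(str(path.parent / f"{stem}.jsonld"))
    return candidates


for name in candidate_prov_files(
    "sub-1/ses-2/func/sub-1_ses-2_task-rest_bold.nii.gz"
):
    print(name)
```

Since the files are meant to be composed into one graph rather than inherited, a reader would presumably load every candidate that exists instead of stopping at the most specific one.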

WDYT?

@yarikoptic
Contributor

BTW -- is there PROV-specific jsonschema to validate those example records?

This was referenced Nov 22, 2024
@cmaumet
Collaborator

cmaumet commented Nov 22, 2024

> I have pushed little styling changes to bep-028 in bids-specification but since PR #487 is closed nothing is updated in the diff there.

Thanks for this. The branch is totally out of date unfortunately :(

> Overall
>
> * that branch better be rebased on master (or get master merged)

Sure!

> * examples should go into Appendix somewhere

Will do!

>   * in common principles it should have only concise description: BIDS-Prov

Sounds good!

> * I would like to discuss potential

Thanks for those suggestions! Let's have those as separate issues:

>   * allowance of .jsonld files at any level (not even subject to inheritance principle limitations, since nothing would really need to be inherited -- they are to be composed into the graph):

#144

>   * if not specific to any particular file, they could in effect get a `_prov` suffix, thus mimicking the dataset-level `prov.jsonld`. So there could be
>
>     * `prov.jsonld`
>     * `sub-1/sub-1_prov.jsonld` -- subject-wide provenance
>     * `sub-1/ses-2/sub-1_ses-2_prov.jsonld` -- specific for the session
>     * `sub-1/ses-2/func/sub-1_ses-2_prov.jsonld` -- for all the functionals
>     * `sub-1/ses-2/func/sub-1_ses-2_task-rest_bold.jsonld` -- specific to those `sub-1/ses-2/func/sub-1_ses-2_task-rest_bold.*`
>       This way tools would have flexibility to express themselves at the level they are operating at (file level, or group of files etc).

#145

> * dissolving the "Justification for Separating Provenance from file JSON" section by allowing `generatedBy` to be specified in the corresponding `.json` file, potentially with further relaxations such as not demanding `id` (assumed to be unique), and overall having a clear schema which we could encode in our BIDS schema and validate

#146

>   * would not need explicit `@context` (potentially related: [[ENH] Add ContextURI to allow to define the context for the entity values bids-specification#1939](https://github.com/bids-standard/bids-specification/pull/1939))
>   * it would be for the bids-prov tooling to extract those, equip with ids and add to the graph

#147

WDYT?
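
For the last point (tooling that extracts an id-less `generatedBy` from a sidecar, equips it with an id, and adds it to the graph), a minimal sketch could look like the following. Everything here is an assumption for illustration: the function `add_sidecar_provenance`, the `{"Activities": [...], "Entities": [...]}` graph layout, the `label`/`command` keys, and the choice of a name-based UUID as the minted id. The actual BIDS-Prov record structure may differ.

```python
import uuid


def add_sidecar_provenance(graph: dict, data_file: str, sidecar: dict) -> dict:
    """Copy a sidecar's `generatedBy` entry into `graph`, minting an id if absent.

    `graph` uses a hypothetical layout {"Activities": [...], "Entities": [...]};
    it is not taken from the BIDS-Prov specification.
    """
    activity = dict(sidecar["generatedBy"])  # shallow copy; sidecar stays untouched
    if "id" not in activity:
        # Assumption: a name-based UUID keeps the id stable across runs.
        activity["id"] = f"urn:uuid:{uuid.uuid5(uuid.NAMESPACE_URL, data_file)}"
    graph.setdefault("Activities", []).append(activity)
    graph.setdefault("Entities", []).append(
        {"id": data_file, "generatedBy": activity["id"]}
    )
    return graph


# Hypothetical sidecar content: no id and no @context, as proposed above.
sidecar = {
    "generatedBy": {
        "label": "example processing step",
        "command": "example-tool --input bold.nii.gz",
    }
}
graph = add_sidecar_provenance(
    {}, "sub-1/ses-2/func/sub-1_ses-2_task-rest_bold.nii.gz", sidecar
)
```

Deriving the id from the file path is only one option; a random UUID or a content hash would work equally well for the purpose of linking the entity to its activity.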

@cmaumet
Collaborator

cmaumet commented Nov 22, 2024

> BTW -- is there PROV-specific jsonschema to validate those example records?

Not yet. Good point!

Note: Is this listed in what is expected from BEPs already or shall we document somewhere?

@yarikoptic
Contributor

> > BTW -- is there PROV-specific jsonschema to validate those example records?
>
> Not yet. Good point!
>
> Note: Is this listed in what is expected from BEPs already or shall we document somewhere?

no -- I do not think we use jsonschema anywhere, but I thought that if these are "standard PROV" there might be one which we could rely on/offload to without brewing our own. For our "sidecar json" files we do have our own schema defined, but I am not sure how much, if at all, it relates to jsonschema et al. There indeed it would be required to do "our best" to specify the schema, ideally beyond just stating that GeneratedBy is type object ;)
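
As an illustration of going "beyond type object", here is a small sketch of a schema fragment written in JSON Schema vocabulary together with a deliberately tiny checker. The fragment is an assumption loosely modeled on the dataset-level `GeneratedBy` field (an array of objects with a required `Name`); it is not the BIDS schema itself, and the `check` function is hypothetical and handles only `type`, `required`, `properties` and `items`. A real setup would more likely hand such a schema to an existing JSON Schema validator.

```python
# Assumed schema fragment, for illustration only.
GENERATED_BY_SCHEMA = {
    "type": "array",
    "items": {
        "type": "object",
        "required": ["Name"],
        "properties": {
            "Name": {"type": "string"},
            "Version": {"type": "string"},
            "CodeURL": {"type": "string"},
        },
    },
}

_TYPES = {"object": dict, "array": list, "string": str}


def check(value, schema, path="GeneratedBy"):
    """Return a list of error strings for `value` against `schema`.

    Supports only the keywords type, required, properties and items.
    """
    expected = schema.get("type")
    if expected and not isinstance(value, _TYPES[expected]):
        return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required key {key!r}")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors.extend(check(value[key], sub, f"{path}.{key}"))
    elif isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(check(item, schema["items"], f"{path}[{i}]"))
    return errors


# A record with a wrong type and a missing required key yields two errors.
for message in check([{"Version": 1}], GENERATED_BY_SCHEMA):
    print(message)
```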

> > I have pushed little styling changes to bep-028 in bids-specification but since PR #487 is closed nothing is updated in the diff there.
>
> Thanks for this. The branch is totally out of date unfortunately :(

oh. well, lesson learned - next time I shouldn't be that eager to tune closed PRs ;) whenever you point to the new one, I hope to contribute.


6 participants