Allow for GeneratedBy in any sidecar .json file to collect their provenance #1970
Comments
Not to speak for dcm2niix developers, but I am skeptical that a request to replace two JSON keys encoding … I think of prov as the domain of wrappers. Heudiconv could implement the proposal in this PR without any cooperation from dcm2niix. It could also upgrade dcm2niix's fields to BIDS-Prov and skip over this step altogether.
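To illustrate the wrapper-side approach, here is a minimal sketch of how a tool like Heudiconv might post-process a dcm2niix sidecar in memory. It assumes the two keys in question are dcm2niix's `ConversionSoftware` and `ConversionSoftwareVersion` (the comment above is truncated, so this is an assumption) and moves them into the list-of-objects `GeneratedBy` form that BIDS already defines for `dataset_description.json`. The function name is hypothetical.

```python
import copy


def dcm2niix_keys_to_generated_by(sidecar: dict) -> dict:
    """Return a copy of a sidecar with dcm2niix's conversion keys moved into GeneratedBy.

    Assumes the source keys are "ConversionSoftware" and
    "ConversionSoftwareVersion", and that any existing GeneratedBy is
    already a list in the dataset_description.json style.
    """
    out = copy.deepcopy(sidecar)
    if "ConversionSoftware" not in out:
        # Nothing to convert; return the copy unchanged.
        return out
    entry = {"Name": out.pop("ConversionSoftware")}
    version = out.pop("ConversionSoftwareVersion", None)
    if version is not None:
        entry["Version"] = version
    # Append rather than overwrite, so earlier provenance entries survive.
    out.setdefault("GeneratedBy", []).append(entry)
    return out


sidecar = {
    "RepetitionTime": 2.0,
    "ConversionSoftware": "dcm2niix",
    "ConversionSoftwareVersion": "v1.0.20170411 GCC4.4.7",
}
print(dcm2niix_keys_to_generated_by(sidecar))
```

Working on a copy keeps the original sidecar intact, so a wrapper can compare before and after or write the result to a new file.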
One option is to optionally add the provenance keys to any sidecar JSON.
@satra Could you elaborate on how, for example, https://github.com/OpenNeuroDatasets/ds000224/blob/master/task-rest_bold.json would be changed to use the provenance keys?
What provenance keys exactly, @satra and @effigies? In the scope of this issue, I would see that file, instead of …, having … or similar. Ref: https://bids-specification.readthedocs.io/en/stable/glossary.html#generatedby-metadata for the overall specification and other possible fields (…).
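For reference, a rough checker for the glossary's `GeneratedBy` shape, as I read it: an array of objects, each with a required string `Name` and optional `Version`, `Description`, `CodeURL`, and a `Container` object (`Type`, `Tag`, `URI`). This is a hypothetical helper, not the official BIDS validator, and flagging any other key as unexpected is an assumption of this sketch.

```python
OPTIONAL_STRING_FIELDS = {"Version", "Description", "CodeURL"}
CONTAINER_FIELDS = {"Type", "Tag", "URI"}


def check_generated_by(value) -> list:
    """Return a list of problems found in a GeneratedBy value (empty if none)."""
    if not isinstance(value, list):
        return ["GeneratedBy must be an array of objects"]
    problems = []
    for i, entry in enumerate(value):
        if not isinstance(entry, dict):
            problems.append(f"entry {i} is not an object")
            continue
        if not isinstance(entry.get("Name"), str):
            problems.append(f"entry {i} lacks a string Name")
        for key, val in entry.items():
            if key == "Name":
                continue  # already checked above
            if key in OPTIONAL_STRING_FIELDS:
                if not isinstance(val, str):
                    problems.append(f"entry {i}: {key} must be a string")
            elif key == "Container":
                # Container must be an object using only the known sub-keys.
                if not isinstance(val, dict) or not set(val) <= CONTAINER_FIELDS:
                    problems.append(f"entry {i}: malformed Container")
            else:
                problems.append(f"entry {i}: unexpected key {key}")
    return problems


print(check_generated_by([{"Name": "dcm2niix", "Version": "v1.0.20170411 GCC4.4.7"}]))
```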
The Provenance BEP is formalizing the specification of … Similarly to the adoption of …
A neat idea, I like it! We would just need to merge BIDS-Prov first. But the question here was also triggered by …
We wanted to separate it into a different file simply so that a tool could write out provenance without worrying about the BIDS JSON spec for an entity. This issue itself brings these two streams together. There is currently a semantic conflict between the … For keywords that could be brought in from PROV, please see the …
Since we already have GeneratedBy formalized in BIDS, if we decide to rename it or change its format, we would need to slate that for BIDS 2.0 (or whichever major release comes next) and automate the change in a migration/upgrade script. I initiated … so please chime in there, or even better, implement that aspect (e.g. for BIDS 2.0, targeting #1775). This issue is largely orthogonal to that discussion and just expands adoption of GeneratedBy, which is already formalized in BIDS.
@effigies WDYT about the example I have provided above? I have improved it slightly now.
@yarikoptic @effigies @satra @Remi-Gau Sorry for arriving late to the game; I missed this thread (thanks for the ping, Remi!). As we did for … If that's of interest, I am happy to work on this with @bclenet to propose a minimal BIDS-Prov-like example including the info you have in your example above, @yarikoptic.
Yes, I'd like to see what the inline prov option is. I couldn't guess it from the BEP text.
I agree that it would be great to see the BIDS-Prov BEP finalized and proper PROV records appear in BIDS. Examples would also be great, in particular one for the above information on … But I want to reiterate that, regardless of the fate of BIDS-Prov and its records, I am talking about the already existing/formalized GeneratedBy.
In PROV we model the agent/software and the activity/process as separate constructs, as each has its own utility. Here is an example of such a GeneratedBy (more fields exist for processes and software):
"GeneratedBy":
{
"Id": "urn:filter-00f3a18f",
"Type": "Activity",
"Label": "Filter",
"Used": "bids::sub-001/eeg/myfile_desc-filtered_eeg.set",
"AssociatedWith":
{
"Id": "urn:eeglab-4a586b50",
"Type": "Software",
"Label": "EEGLAB",
"Version": "v2023"
}
}
It could also be simplified, but I wanted to give the broader picture of the provenance graph (what was used, how long the process took, which software agent(s) were used).
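One open question further down is how the `urn:...-<nonce>` identifiers would be generated. A minimal sketch, assuming a random `uuid4` hex prefix is acceptable as the nonce (nothing in the thread specifies this); `make_id` and `make_generated_by` are hypothetical helpers that assemble the Activity/Software structure shown above.

```python
import uuid


def make_id(prefix: str) -> str:
    """Build a URN such as "urn:filter-00f3a18f" from a random 8-hex-digit nonce."""
    return f"urn:{prefix}-{uuid.uuid4().hex[:8]}"


def make_generated_by(label: str, used, software: str, version: str) -> dict:
    """Assemble an Activity record associated with one Software agent."""
    return {
        "Id": make_id(label.lower()),
        "Type": "Activity",
        "Label": label,
        "Used": used,
        "AssociatedWith": {
            "Id": make_id(software.lower()),
            "Type": "Software",
            "Label": software,
            "Version": version,
        },
    }


record = make_generated_by(
    "Filter", "bids::sub-001/eeg/myfile_desc-filtered_eeg.set", "EEGLAB", "v2023"
)
print(record)
```

The 8-character nonce only mirrors the shape of the example; using the full `uuid4().hex` would make accidental collisions between records far less likely.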
So Yarik's example would become:
"GeneratedBy": {
"Id": "urn:conversion-<nonce>",
"Type": "Activity",
"Label": "Conversion",
"Used": "bids:sourcedata:somefile.dcm",
"AssociatedWith": {
"Id": "urn:dcm2niix-<nonce>",
"Type": "Software",
"Label": "dcm2niix",
"Version": "v1.0.20170411 GCC4.4.7"
}
}
Is "Used" required? That may be problematic if filenames are not anonymized. Is "Id" required? If so, guidance on how to generate the nonce would probably be needed for users to accept it. And I suppose that if Heudiconv does some kind of additional adjustment, it would look like:
"GeneratedBy": {
"Id": "urn:curation-<nonce>",
"Type": "Activity",
"Label": "Curation",
"Used": {
"Id": "urn:converted-tmp-file-<nonce>",
"GeneratedBy": {
"Id": "urn:conversion-<nonce>",
"Type": "Activity",
"Label": "Conversion",
"Used": "bids:sourcedata:somefile.dcm",
"AssociatedWith": {
"Id": "urn:dcm2niix-<nonce>",
"Type": "Software",
"Label": "dcm2niix",
"Version": "v1.0.20170411 GCC4.4.7"
}
}
},
"AssociatedWith": {
"Id": "urn:heudiconv-<nonce>",
"Type": "Software",
"Label": "heudiconv",
"Version": "1.3.2"
}
}
If we were able to forgo Id and Used:
"GeneratedBy": {
"Type": "Activity",
"Label": "Curation",
"Used": {
"GeneratedBy": {
"Type": "Activity",
"Label": "Conversion",
"AssociatedWith": {
"Type": "Software",
"Label": "dcm2niix",
"Version": "v1.0.20170411 GCC4.4.7"
}
}
},
"AssociatedWith": {
"Type": "Software",
"Label": "heudiconv",
"Version": "1.3.2"
}
}
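The simplified record can be derived mechanically from the fuller one. A minimal sketch, assuming the only differences are dropping every `Id` and every `Used` that is a plain file reference, while keeping nested `Used` objects that carry their own `GeneratedBy`; the helper `strip_ids` is hypothetical.

```python
def strip_ids(node):
    """Return a copy of a GeneratedBy tree without "Id" keys or plain-string "Used" values.

    Nested "Used" objects (an intermediate file that itself has a GeneratedBy)
    are kept, so the chain of activities survives.
    """
    if isinstance(node, list):
        return [strip_ids(item) for item in node]
    if not isinstance(node, dict):
        return node  # strings and numbers are immutable, safe to share
    out = {}
    for key, value in node.items():
        if key == "Id":
            continue
        if key == "Used" and isinstance(value, str):
            continue  # e.g. "bids:sourcedata:somefile.dcm", possibly non-anonymized
        out[key] = strip_ids(value)
    return out


conversion = {
    "Id": "urn:conversion-<nonce>",
    "Type": "Activity",
    "Label": "Conversion",
    "Used": "bids:sourcedata:somefile.dcm",
    "AssociatedWith": {"Id": "urn:dcm2niix-<nonce>", "Type": "Software",
                       "Label": "dcm2niix", "Version": "v1.0.20170411 GCC4.4.7"},
}
curation = {
    "Id": "urn:curation-<nonce>",
    "Type": "Activity",
    "Label": "Curation",
    "Used": {"Id": "urn:converted-tmp-file-<nonce>", "GeneratedBy": conversion},
    "AssociatedWith": {"Id": "urn:heudiconv-<nonce>", "Type": "Software",
                       "Label": "heudiconv", "Version": "1.3.2"},
}
print(strip_ids(curation))
```

The reverse direction is lossy: the dropped file references cannot be recovered, and identifiers would have to be regenerated.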
A couple of responses:
The basic form of that doc has not changed in a while, and from my perspective it's done unless someone has issues. I think @cmaumet was creating more examples of practical use in a repo and was then going to submit them as a PR. @effigies: the Id is graph-specific, and if it is removed, a blank random ID will simply be assumed. Whether anyone can compare such records would depend on the various entities/processes/software. Your example is completely valid as a stripped-down version for JSON, though not for JSON-LD, which would need some kind of Id.
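Conversely, a stripped-down record can be lifted toward JSON-LD by giving every node a local identifier. A minimal sketch, assuming `Id` plays the role of JSON-LD's `@id` and using the `_:` blank node prefix; `add_blank_ids` is hypothetical, and such locally numbered identifiers say nothing about whether two records from different files describe the same software or process.

```python
import itertools


def add_blank_ids(node, counter=None):
    """Return a copy of a GeneratedBy tree where each object lacking "Id" gets "_:bN"."""
    if counter is None:
        counter = itertools.count()
    if isinstance(node, list):
        return [add_blank_ids(item, counter) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    if "Id" not in node:
        # Number nodes in pre-order; labels are only unique within this record.
        out["Id"] = f"_:b{next(counter)}"
    for key, value in node.items():
        out[key] = add_blank_ids(value, counter)
    return out


minimal = {
    "Type": "Activity",
    "Label": "Conversion",
    "AssociatedWith": {"Type": "Software", "Label": "dcm2niix",
                       "Version": "v1.0.20170411 GCC4.4.7"},
}
print(add_blank_ids(minimal))
```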
@yarikoptic: with respect to the timeline, to me the discussion halted in issue bids-standard/BEP028_BIDSprov#125. This was after we (@satra and I) told the steering group that BIDS-Prov was ready for community review. The path forward was not clear to me, but I still believe BIDS-Prov is ready for community review :) and I am more than happy to push that now if someone can clarify how.
Followed up on …
Overall, I think there is a chance to meld what @effigies suggested above (extracting from the larger one) to allow a .json sidecar to include something like:
"GeneratedBy": {
"Type": "Activity",
"Label": "Conversion",
"AssociatedWith": {
"Type": "Software",
"Label": "dcm2niix",
"Version": "v1.0.20170411 GCC4.4.7"
}
}
but maybe even with further defaults (where could I see the schema on, e.g., what other …). My point is that, IMHO, at the level of a BIDS .json file it does not really need to be a "fully fledged" JSON-LD record as long as it is "compatible" (could be converted to one) and not confusingly similar (e.g. re-using attributes for something else of a different type, etc.).
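To make the "further defaults" idea concrete, here is a minimal sketch in which `Type` is inferred from a node's role: the top-level `GeneratedBy` defaults to `Activity` and `AssociatedWith` defaults to `Software`. These defaults are purely hypothetical; nothing in BIDS or BIDS-Prov currently defines them.

```python
import copy

# Hypothetical defaults, keyed by the role a node plays in the record.
DEFAULT_TYPES = {"GeneratedBy": "Activity", "AssociatedWith": "Software"}


def apply_default_types(record: dict, role: str = "GeneratedBy") -> dict:
    """Return a copy of a record with "Type" filled in from its role when absent."""
    out = copy.deepcopy(record)
    out.setdefault("Type", DEFAULT_TYPES[role])
    agent = out.get("AssociatedWith")
    if isinstance(agent, dict):
        out["AssociatedWith"] = apply_default_types(agent, "AssociatedWith")
    used = out.get("Used")
    if isinstance(used, dict) and isinstance(used.get("GeneratedBy"), dict):
        # Follow a nested chain such as curation -> conversion.
        used["GeneratedBy"] = apply_default_types(used["GeneratedBy"], "GeneratedBy")
    return out


terse = {"Label": "Conversion",
         "AssociatedWith": {"Label": "dcm2niix", "Version": "v1.0.20170411 GCC4.4.7"}}
print(apply_default_types(terse))
```

Expanding defaults on read, rather than requiring them on write, keeps sidecars short while still allowing conversion to a full record.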
Your idea … and reflecting on … for GeneratedBy to be universally adopted at the dataset level.
To accommodate provenance "per file", e.g. for dcm2niix and the like, we had better allow GeneratedBy at the individual file level.
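If GeneratedBy were allowed per file, tools would need a rule for which record applies to a given file. A minimal sketch, assuming a hypothetical precedence in which a file-level GeneratedBy wins and otherwise the nearest `dataset_description.json` above the file is consulted; BIDS does not currently specify this rule, and the helper names are illustrative.

```python
import json
from pathlib import Path


def find_dataset_description(path: Path):
    """Walk up from a file's directory to the nearest dataset_description.json."""
    for folder in [path.parent, *path.parent.parents]:
        candidate = folder / "dataset_description.json"
        if candidate.is_file():
            return candidate
    return None


def effective_generated_by(sidecar_path: Path):
    """Return the file-level GeneratedBy if present, else the dataset-level one (or None)."""
    sidecar = json.loads(sidecar_path.read_text())
    if "GeneratedBy" in sidecar:
        return sidecar["GeneratedBy"]
    description = find_dataset_description(sidecar_path)
    if description is None:
        return None
    return json.loads(description.read_text()).get("GeneratedBy")
```

Falling back to the dataset level keeps existing datasets valid: files without their own GeneratedBy behave exactly as they do today.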