Preface
Hey, here are our proposed adjustments to the data model and documentation to enable ISA to thoroughly describe data objects.
For reference, here is the discussion about this topic: ISA-tools/isa-api#484
General Goals
Our goal here is to improve the description of data using the ISA model. Currently, the description given in the ISA model points only to the file, not inside the file. This is not sufficient if the file format is not well understood, or when the actual data object resulting from a measurement or computation is not a full file but rather a value or value set in a file.
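To illustrate what pointing inside a file could look like, the sketch below assumes the data file is JSON and uses an RFC 6901 JSON Pointer to address either a value set or a single value within it. The proposal does not prescribe a pointer format. The JSON Pointer choice, the file contents and the `resolve_json_pointer` helper are all hypothetical.

```python
import json
from typing import Any


def resolve_json_pointer(document: Any, pointer: str) -> Any:
    """Return the value an RFC 6901 JSON Pointer addresses inside `document`.

    Minimal sketch: no validation of array-index syntax (e.g. "-" or leading zeros).
    """
    if pointer == "":
        return document  # the empty pointer addresses the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape per RFC 6901: first "~1" -> "/", then "~0" -> "~"
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


# Hypothetical data file: the measured data object is only part of it.
data_file = json.loads("""
{
  "instrument": "spectrometer-A",
  "runs": [
    {"sample": "s1", "absorbance": [0.12, 0.15, 0.11]},
    {"sample": "s2", "absorbance": [0.31, 0.29, 0.33]}
  ]
}
""")

# A file reference alone identifies the whole document; a pointer narrows it down.
print(resolve_json_pointer(data_file, "/runs/1/absorbance"))    # value set: [0.31, 0.29, 0.33]
print(resolve_json_pointer(data_file, "/runs/1/absorbance/0"))  # single value: 0.31
```

Other formats would need their own pointer syntax. For tabular files, for example, this could be a column name or a cell range. That choice is left open here.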
So we wanted to enhance the data object with two things:
- Pointer, pointing to a specific location in the file
- Dataset description, which gives context to the data objects stored in a data file
Changes made
Data Model
We came up with the following data model:
ISA JSON
This results in the following JSON schema:
ISA Tab
To integrate these model extensions into the ISA Tab format, we propose two adjustments:
To enable processes to point into data files, we propose to add a new column, Data Pointer, to the Assay file. This column should be used to qualify the Data File column when the data object resulting from the process is not the full data file but rather a value or value set in the data file.
Additionally, to give context about the values in the data file, we propose to add a new file to the ISA Tab family, namely the Dataset file, which carries all the other data fields that we added in the Data Model.
Aux
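The following is a minimal sketch of how a tool might read the proposed Data Pointer column alongside Data File. It rests on three assumptions: the assay table is tab-separated, it has exactly one column literally named `Data File` and one named `Data Pointer`, and an empty Data Pointer cell means the data object is the whole file. `DataReference` and `read_data_references` are hypothetical names, not part of the isa-api.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataReference:
    """A data object produced by a process: a whole file, or a location inside it."""
    file_name: str
    pointer: Optional[str] = None  # None means the data object is the full file

    def describes_whole_file(self) -> bool:
        return not self.pointer


def read_data_references(assay_tsv: str) -> List[DataReference]:
    """Collect (Data File, Data Pointer) pairs from a tab-separated assay table."""
    reader = csv.DictReader(io.StringIO(assay_tsv), delimiter="\t")
    references = []
    for row in reader:
        # Short rows yield None for missing cells, so normalise to "" first.
        file_name = (row.get("Data File") or "").strip()
        if not file_name:
            continue  # this row produced no data file
        pointer = (row.get("Data Pointer") or "").strip() or None
        references.append(DataReference(file_name, pointer))
    return references


# Hypothetical assay table fragment; real assay files carry many more columns.
assay = (
    "Sample Name\tAssay Name\tData File\tData Pointer\n"
    "s1\tassay-1\tresults.json\t/runs/0/absorbance\n"
    "s2\tassay-2\tresults.json\t/runs/1/absorbance\n"
    "s3\tassay-3\traw_s3.mzML\t\n"
)

for ref in read_data_references(assay):
    target = "whole file" if ref.describes_whole_file() else ref.pointer
    print(f"{ref.file_name}: {target}")
```

Keying rows by column name keeps the sketch short. However, `csv.DictReader` keeps only the last of any repeated header, so a table with several data-file columns would need positional parsing instead.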
Open Questions