Data object extension #15

HLWeil · 2023-02-28T22:10:28Z

Preface

Hey, here are our proposed adjustments to the datamodel and documentation for enabling ISA to thoroughly describe data objects.
For reference, here is the discussion about this topic: ISA-tools/isa-api#484

General Goals

Our goal here is to improve the description of data using the ISA model.

Currently, the description given in the ISA model points only to the file, not to a location inside it. This is not sufficient if the file format is not well understood, or when the actual data object resulting from a measurement or computation is not a full file but rather a value or set of values in a file.

So we wanted to enhance the data object with two things:

A Pointer pointing to a specific location in the file
A Dataset description, which gives context to the data objects stored in a data file

Changes made

Datamodel

We came up with the following data model:

Property	Datatype	Description
File name	String	A file name or full path referencing a data file produced by the related process that MAY be packaged with, or is accessible via, the ISA reference implementation content.
Pointer	String	A pointer referencing a location inside the data file. This SHOULD always be specified when the data of interest is not the complete file, but a specific part of it.
Generated By	String	A file name, full path or identifier referencing the tool with which this data object was generated.
Explication	Ontology Annotation	An ontology annotation qualifying what the data describes.
Unit	Ontology Annotation	The unit qualifying the value stored in the data object.
Object Type	Ontology Annotation	Specifies the format in which the value in the data object will be stored.

ISA Json

This results in the following JSON schema:

{
    "$schema": "http://json-schema.org/draft-04/schema",
    "title": "ISA data schema",
    "description": "JSON-schema representing a data object in the ISA model",
    "type": "object",
    "properties": {
        "@id": { "type": "string", "format": "uri" },
        "filename": {
            "type": "string"
        },
        "pointer": {
            "type": "string"
        },
        "type": {
            "type": "string",
            "enum": [
                "Raw Data File",
                "Derived Data File",
                "Image File"
            ]
        },
        "generatedBy": {
            "type": "string"
        },
        "explication": {
            "$ref": "ontology_annotation_schema.json#"
        },
        "unit": {
            "$ref": "ontology_annotation_schema.json#"
        },
        "objectType": {
            "$ref": "ontology_annotation_schema.json#"
        },
        "label": {
            "type": "string"
        },
        "comments" : {
            "type": "array",
            "items": {
                "$ref": "comment_schema.json#"
            }
        }
    },
    "additionalProperties": false
}
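
As a rough illustration of what a data object under this schema could look like, here is a minimal Python sketch. The property names and the "type" enum are taken from the schema above; the example values, the pointer syntax, and the helper name check_data_object are hypothetical, and "annotationValue" is assumed to be the value field of the ontology annotation schema. The helper only checks two of the schema's constraints by hand and is not a substitute for a real JSON Schema validator.

```python
import json

# Property names and the "type" enum are copied from the schema above.
ALLOWED_PROPERTIES = {
    "@id", "filename", "pointer", "type", "generatedBy",
    "explication", "unit", "objectType", "label", "comments",
}
ALLOWED_TYPES = {"Raw Data File", "Derived Data File", "Image File"}


def check_data_object(obj):
    """Return a list of problems; an empty list means both checks passed.

    Only "additionalProperties": false and the "type" enum are checked.
    """
    problems = []
    extra = sorted(set(obj) - ALLOWED_PROPERTIES)
    if extra:
        problems.append(f"unexpected properties: {extra}")
    if "type" in obj and obj["type"] not in ALLOWED_TYPES:
        problems.append(f"unknown type: {obj['type']!r}")
    return problems


# Hypothetical instance: one column of a CSV file holding plant heights.
# The pointer syntax "col=2" is an assumption; the proposal does not fix one.
example = json.loads("""
{
    "filename": "measurements.csv",
    "pointer": "col=2",
    "type": "Derived Data File",
    "generatedBy": "analysis.py",
    "explication": { "annotationValue": "plant height" },
    "unit": { "annotationValue": "centimeter" },
    "objectType": { "annotationValue": "float" }
}
""")

print(check_data_object(example))
```

An object that still uses a property outside the list, such as "name", would be reported as an unexpected property, because the schema sets "additionalProperties" to false.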

ISA Tab

To integrate these model extensions into the ISA Tab Format, we propose two adjustments:

To enable processes to point into data files, we propose adding a new column, Data Pointer, to the Assay file. This column should be used to qualify the Data File column when the data object resulting from the process is not the full data file but a value or set of values in the data file.

Additionally, to give context about the values in the data file, we propose adding a new file to the ISA-Tab family, namely the Dataset file, which carries all the other data fields that we added to the data model.
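
To make the Data Pointer column more concrete, here is a minimal Python sketch that reads a small assay table and pairs each data file with its pointer. Only the column name Data Pointer comes from the proposal; the table contents, the use of Raw Data File as the data file column, the pointer syntax, and the function name data_references are assumptions made for illustration.

```python
import csv
import io

# Hypothetical assay excerpt. Only the "Data Pointer" column name comes
# from the proposal; file names and pointer syntax are made up.
ASSAY_TSV = (
    "Sample Name\tRaw Data File\tData Pointer\n"
    "sample1\tmeasurements.csv\tcol=2\n"
    "sample2\tmeasurements.csv\tcol=3\n"
    "sample3\timage3.png\t\n"
)


def data_references(tsv_text):
    """Yield (file name, pointer or None) for each row of the assay table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # An empty pointer cell means the data object is the whole file.
        pointer = row.get("Data Pointer") or None
        yield row["Raw Data File"], pointer


refs = list(data_references(ASSAY_TSV))
for file_name, pointer in refs:
    print(file_name, pointer)
```

In this sketch two samples share one file and differ only in their pointer, which is the case the new column is meant to cover, while the third row leaves the pointer empty and refers to the full file.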

Aux

Small fix to the Sphinx config file, as one function was deprecated.

Open Questions

Should we add the build folder here in this PR?

stain · 2023-03-09T18:04:39Z

See also https://www.w3.org/TR/annotation-model/#selectors on how fragment selectors are different for different media types. You need to indicate the type of pointer, either as a prefix or pointertype. The media type of filename will then also be essential (equivalent to encodingFormat in RO-Crate for IANA Media type) so the client can know how to resolve the pointer.

muehlhaus · 2023-03-09T18:22:01Z

I agree with Stain! We need a pointerType and encodingFormat
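
One way to picture why both fields are needed is a client that picks a resolver from the pair (encodingFormat, pointerType). The Python sketch below is only an illustration under stated assumptions: the pointer type names "csv-cell" and "json-path", their syntaxes, and all function names are hypothetical and not part of the proposal or of any standard.

```python
import csv
import io
import json


def resolve_csv_cell(content, pointer):
    """Resolve a hypothetical 'row,col' pointer (1-based) in CSV text."""
    row, col = (int(part) for part in pointer.split(","))
    rows = list(csv.reader(io.StringIO(content)))
    return rows[row - 1][col - 1]


def resolve_json_path(content, pointer):
    """Resolve a simplified slash-separated path in JSON text (no escapes)."""
    value = json.loads(content)
    for token in pointer.strip("/").split("/"):
        value = value[int(token)] if isinstance(value, list) else value[token]
    return value


# The same pointer string means different things for different media types,
# so the resolver is chosen from both the media type and the pointer type.
RESOLVERS = {
    ("text/csv", "csv-cell"): resolve_csv_cell,
    ("application/json", "json-path"): resolve_json_path,
}


def resolve(content, encoding_format, pointer_type, pointer):
    """Dispatch to the resolver registered for this media and pointer type."""
    try:
        resolver = RESOLVERS[(encoding_format, pointer_type)]
    except KeyError:
        raise ValueError(
            f"no resolver for {encoding_format!r} with pointer type {pointer_type!r}"
        )
    return resolver(content, pointer)


csv_value = resolve("id,height\ns1,12.5\n", "text/csv", "csv-cell", "2,2")
json_value = resolve(
    '{"samples": [{"height": 12.5}]}',
    "application/json", "json-path", "/samples/0/height",
)
print(csv_value, json_value)
```

Without the media type and pointer type, the client in this sketch would have no basis for choosing between the two resolvers.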

HLWeil added 7 commits February 22, 2023 23:27

rework data.json object by adding pointer and descriptor fields

0848968

fix sphinx config

c64fa4b

add data object extensions to model and json

9d186e3

add data object extensions to isatab documentation

e6fdcc4

add dataset example file

4371c44

rename attribute to explication and make object type an ontology term

98cfc84

Merge pull request #1 from HLWeil/dataset

c948c52

Freymaurer mentioned this pull request Jun 6, 2023

[Design] Enable Pointer logic for input/output nfdi4plants/ARC-specification#80

Closed

HLWeil mentioned this pull request Jan 24, 2024

Rework Data Nodes nfdi4plants/ARC-specification#93

Merged

HLWeil mentioned this pull request May 2, 2024

Datamap specification nfdi4plants/ARC-specification#104

Merged
