Schema

The NMDC schema specifies how metadata elements are related. The main elements are:

  • Study: Summarizes the overall goal of a research initiative and outlines the key objective of its underlying projects.
  • Biosample: A material sample. It may be environmental (encompassing many organisms) or isolate or tissue.
    An environmental sample containing genetic material from multiple individuals is commonly referred to as a biosample.
  • Characteristic: A characteristic of a biosample.
    Examples: depth, habitat, material. For NMDC, characteristics SHOULD be mapped to terms within a MIxS template.
  • Omics processing: The methods and processes used to generate omics data from a biosample or organism. Examples of outputs include samples cultivated from another sample or data objects created by instrument runs.
  • Data object: An object that primarily consists of symbols that represent information.
    Files, records, and omics data are examples of data objects.

During the translation process, biosamples are annotated with characteristics that specify such things as where, when, or how the sample was collected.
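
As a rough illustration of how a biosample and its characteristics relate, the following Python sketch models the two elements as dataclasses. The class names, field names, and the `annotate` helper are hypothetical and chosen only for this example; the actual definitions are those in nmdc.yaml.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Characteristic:
    """A named property of a biosample, e.g. depth or habitat (hypothetical shape)."""
    name: str
    value: str


@dataclass
class Biosample:
    """A material sample annotated with zero or more characteristics."""
    id: str
    name: str
    characteristics: List[Characteristic] = field(default_factory=list)

    def annotate(self, name: str, value: str) -> None:
        """Attach one characteristic to this biosample."""
        self.characteristics.append(Characteristic(name, value))


# Illustrative values only; they are not taken from any NMDC dataset.
sample = Biosample(id="sample:0001", name="example soil sample")
sample.annotate("depth", "0.1 m")
sample.annotate("habitat", "forest soil")
```

In this sketch, characteristics are kept as a list on the biosample so that one sample can carry any number of annotations about where, when, or how it was collected.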


Schema management

The NMDC schema is developed using the Biolink modeling language (BiolinkML). BiolinkML is a general-purpose modeling language following object-oriented and ontological principles. Models are authored in YAML, and a variety of artifacts can be generated from the model, such as ShEx, JSON-Schema, OWL, Python dataclasses, UML diagrams, and Markdown pages for deployment in a GitHub Pages site.
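
To give a feel for what "generating an artifact from the model" means, here is a toy Python sketch that turns a small class-and-slot model into a JSON-Schema-like dictionary. It does not use BiolinkML and is not how its generators are implemented; the `MODEL` structure, the `JSON_TYPES` mapping, and the `class_to_json_schema` function are all assumptions made for this example.

```python
import json

# Hypothetical, heavily simplified model; the real schema lives in nmdc.yaml.
MODEL = {
    "classes": {
        "Biosample": {
            "description": "A material sample.",
            "slots": ["id", "name", "depth"],
        },
    },
    "slots": {
        "id": {"range": "string", "required": True},
        "name": {"range": "string"},
        "depth": {"range": "float"},
    },
}

# Assumed mapping from model ranges to JSON Schema primitive types.
JSON_TYPES = {"string": "string", "float": "number", "integer": "integer"}


def class_to_json_schema(model, class_name):
    """Build a JSON-Schema-style dict for one class of the toy model."""
    cls = model["classes"][class_name]
    properties = {}
    required = []
    for slot_name in cls["slots"]:
        slot = model["slots"][slot_name]
        properties[slot_name] = {"type": JSON_TYPES[slot["range"]]}
        if slot.get("required"):
            required.append(slot_name)
    return {
        "title": class_name,
        "description": cls["description"],
        "type": "object",
        "properties": properties,
        "required": required,
    }


schema = class_to_json_schema(MODEL, "Biosample")
print(json.dumps(schema, indent=2))
```

The point of the sketch is the direction of the workflow: the model is the single source of truth, and each artifact is derived from it rather than maintained by hand.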

Using BiolinkML, we define high-level entities to represent the data we are integrating. These entities include biosamples (specific portions of material collected from a site), biosample processing (e.g., sequencing performed on a biosample), data objects (e.g., a fastq file produced from a sequencing run), and annotations that specify characteristics of biosamples (e.g., the temperature and elevation of the site where the sample was collected).

nmdc.yaml is the source file for the NMDC schema.

Documentation

BiolinkML generates a set of markdown files. These markdown files are then deployed as GitHub pages to:

This workflow allows us to easily modify the schema and deploy documentation as part of the build process.