StantonMartin/nmdc-metadata

Metadata management for the National Microbiome Data Collaborative

The purpose of this repository is to manage metadata for the National Microbiome Data Collaborative (NMDC). The NMDC is a multi-organizational effort to integrate microbiome data across diverse areas in medicine, agriculture, bioenergy, and the environment. The resulting integrated platform supports comprehensive discovery of, and access to, multidisciplinary microbiome data, opening new possibilities in microbiome data science.

Tasks managed by the repository are:

Schema

The NMDC schema is used during the translation process to specify how metadata elements are related.
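As a rough illustration of what it means for a schema to specify how metadata elements are related, the minimal Python sketch below models three linked classes. The class names and the `has_input` and `part_of` relationships are a simplified, hypothetical stand-in for illustration, not the actual NMDC schema definition.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical, simplified classes; the real NMDC schema defines its own
# classes, slots, and constraints.
@dataclass
class Biosample:
    id: str
    name: str


@dataclass
class Study:
    id: str
    title: str
    # A study is related to the biosamples collected for it.
    biosamples: List[Biosample] = field(default_factory=list)


@dataclass
class OmicsProcessing:
    id: str
    # Each processing run points back to the biosample it was derived from...
    has_input: Biosample
    # ...and to the study it belongs to.
    part_of: Study


def processing_runs_for_study(study: Study, runs: List[OmicsProcessing]) -> List[OmicsProcessing]:
    """Follow the part_of relationship to find the runs belonging to a study."""
    return [run for run in runs if run.part_of.id == study.id]
```

Encoding relationships as typed references, rather than free-text columns, is what lets a translation step check that every run points to a known sample and study.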

Documentation

Documentation for the NMDC schema can be browsed here:

Standardization of characteristics

Entities in the schema are annotated with characteristics. When possible, we use standard terminologies and ontologies to define these characteristics. These standards include:

We are actively involved in updating the MIxS standards (mixs-ng) and creating an RDF version of MIxS (mixs-rdf).
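A minimal sketch of how a characteristic might carry an ontology term alongside its human-readable value, and how that annotation could be checked for well-formedness. The `Characteristic` class, the `ALLOWED_PREFIXES` set, and the placeholder term IDs are assumptions for illustration and do not reproduce the list of standards NMDC actually uses; `env_broad_scale` is a MIxS field name used here only as an example.

```python
import re
from dataclasses import dataclass

# Hypothetical set of accepted ontology prefixes (illustration only).
ALLOWED_PREFIXES = {"ENVO", "UO", "CHEBI"}

# A CURIE has the form PREFIX:LOCAL_ID; this pattern assumes OBO-style numeric IDs.
CURIE_PATTERN = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):(\d+)$")


@dataclass(frozen=True)
class Characteristic:
    name: str     # field name, e.g. "env_broad_scale"
    value: str    # human-readable label
    term_id: str  # ontology term expressed as a CURIE


def is_standardized(c: Characteristic) -> bool:
    """Return True if term_id is a well-formed CURIE from an allowed ontology."""
    match = CURIE_PATTERN.match(c.term_id)
    return match is not None and match.group(1) in ALLOWED_PREFIXES
```

This only checks the shape and prefix of the identifier; confirming that the term actually exists would require looking it up in the ontology itself.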

Metadata sources

At present, we ingest metadata from the Joint Genome Institute (JGI) and the Environmental Molecular Sciences Lab (EMSL).

The NMDC schema and translation process will be modified as more metadata sources become available.
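One way to picture the translation process is a per-source field mapping into shared schema slots. In the hypothetical sketch below, the JGI and EMSL field names are invented for illustration and are not taken from either facility's real export format.

```python
from typing import Dict, List

# Hypothetical mappings from source-specific field names to common slots.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "jgi": {"sample_id": "id", "sample_name": "name", "ecosystem": "ecosystem"},
    "emsl": {"SampleID": "id", "SampleName": "name"},
}


def translate(source: str, record: Dict[str, str]) -> Dict[str, str]:
    """Rename source fields to common slots, dropping fields with no mapping."""
    mapping = FIELD_MAPS[source]
    out = {slot: record[src] for src, slot in mapping.items() if src in record}
    out["source"] = source  # keep provenance for later integration
    return out


def translate_all(source: str, records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Translate every record from one source."""
    return [translate(source, r) for r in records]
```

Keeping the mappings as data rather than code means supporting an additional metadata source is mostly a matter of adding an entry to `FIELD_MAPS`.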

Metadata integration

We use Jupyter notebooks to integrate the metadata sources. This allows us to iterate quickly in a transparent and interactive manner as new metadata sources become available.

Development of a more comprehensive ETL pipeline will progress as the metadata sources and schema become more concrete.
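The core of an integration step, whether run in a notebook or in a later ETL pipeline, can be sketched as merging records from several sources that share an identifier. This minimal version assumes each source has already been translated into dictionaries with a common `id` key and an optional `source` key; the `integrate` function is hypothetical. It keeps the first value it sees for each field and reports disagreements rather than silently overwriting them.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, str]


def integrate(*sources: List[Record]) -> Tuple[Dict[str, Record], Dict[str, List[str]], List[Tuple[str, str]]]:
    """Merge records sharing an "id"; return merged records, provenance, and conflicts."""
    merged: Dict[str, Record] = {}
    provenance: Dict[str, List[str]] = defaultdict(list)
    conflicts: List[Tuple[str, str]] = []
    for records in sources:
        for record in records:
            rid = record["id"]
            target = merged.setdefault(rid, {})
            if "source" in record:
                provenance[rid].append(record["source"])
            for key, value in record.items():
                if key == "source":
                    continue  # provenance is tracked separately
                if key in target and target[key] != value:
                    # Keep the first value seen and flag the disagreement for review.
                    conflicts.append((rid, key))
                else:
                    target[key] = value
    return merged, dict(provenance), conflicts
```

Surfacing conflicts as a list suits the interactive, notebook-driven workflow described above: a curator can inspect each flagged field before deciding how the schema or translation should change.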

About

Managing metadata and policy around metadata in NMDC
