
Babel output formats

Gaurav Vaidya edited this page Oct 18, 2022 · 1 revision

Compendia

Files in the compendia directory are JSONL files, where each line is a JSON object in the format:

{
  "type": "biolink:Disease",
  "ic": "84.762172188404165",
  "identifiers": [
    {
      "i": "MONDO:0004979",
      "l": "asthma"
    },
    {
      "i": "DOID:2841",
      "l": "asthma"
    },
    {
      "i": "OMIM:600807"
    },
    {
      "i": "EFO:0000270"
    },
    {
      "i": "UMLS:C0004096",
      "l": "Asthma"
    },
    {
      "i": "UMLS:C0085129",
      "l": "Bronchial Hyperreactivity"
    },
    {
      "i": "UMLS:C0340062",
      "l": "Hyperreactive airway disease"
    },
    {
      "i": "UMLS:C1833269",
      "l": "ASTHMA, PROTECTION AGAINST"
    },
    {
      "i": "UMLS:C1833270",
      "l": "ASTHMA, DIMINISHED RESPONSE TO ANTILEUKOTRIENE TREATMENT IN"
    },
    {
      "i": "UMLS:C1869116",
      "l": "ASTHMA, SUSCEPTIBILITY TO (finding)"
    },
    {
      "i": "UMLS:C3714497",
      "l": "Reactive airway disease"
    },
    {
      "i": "MESH:D001249",
      "l": "Asthma"
    },
    {
      "i": "MESH:D016535",
      "l": "Bronchial Hyperreactivity"
    }
  ]
}

(This example has been simplified from the actual entry and pretty-printed; in the compendium file, each entry occupies a single line.)

This entry describes a concept of type biolink:Disease with the preferred identifier MONDO:0004979, the preferred label asthma, and an information content of 84.762172188404165 (on a scale from 0 to 100), followed by several alternate identifiers and their labels. The identifiers, along with their labels, should appear in the identifier prefix order defined for biolink:Disease.
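As a rough illustration of how such a line might be consumed, here is a minimal Python sketch. The function name parse_compendium_line and the keys of the returned dictionary are hypothetical and not part of Babel. It assumes that the first entry in "identifiers" is the preferred identifier, that the preferred label is the first label present, and that "ic" may be stored as a string (as in the example above) or be missing.

```python
import json


def parse_compendium_line(line):
    """Summarize one compendium line (hypothetical helper, not part of Babel)."""
    record = json.loads(line)
    identifiers = record["identifiers"]

    # Assumption: the first identifier in the list is the preferred one.
    preferred_id = identifiers[0]["i"]

    # Assumption: the preferred label is the first label that is present,
    # since some identifiers (e.g. OMIM:600807 above) carry no "l" key.
    preferred_label = next((e["l"] for e in identifiers if "l" in e), None)

    # "ic" appears as a string in the example, so convert it if present.
    ic = record.get("ic")
    return {
        "type": record["type"],
        "ic": float(ic) if ic is not None else None,
        "preferred_id": preferred_id,
        "preferred_label": preferred_label,
        "alternate_ids": [e["i"] for e in identifiers[1:]],
    }
```

Reading a whole compendium file would then amount to calling this function on each non-empty line of the file.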

Conflation

Files in the conflation directory are JSONL files, where each line is a JSON array of identifiers in the format:

["NCBIGene:58061015", "UniProtKB:A0A269XIE9", "UniProtKB:A0A2T4Q0C3", "UniProtKB:A0A854CV56"]

This means that these identifiers can be conflated together in conflation mode (i.e. when the conflation flag to the Node Normalizer is set to true). At the moment, we only have a GeneProtein conflation, although we may add more conflations in the future.
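One way to use a conflation file is to build a lookup from each identifier to the full group it belongs to. The sketch below is only an illustration: the function name build_conflation_index is hypothetical, and it assumes each non-empty line holds exactly one JSON array of identifier strings, as in the example above.

```python
import json


def build_conflation_index(lines):
    """Map each identifier to its conflation group (hypothetical helper)."""
    index = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        group = json.loads(line)  # one JSON array of identifiers per line
        for identifier in group:
            index[identifier] = group
    return index


example = [
    '["NCBIGene:58061015", "UniProtKB:A0A269XIE9", '
    '"UniProtKB:A0A2T4Q0C3", "UniProtKB:A0A854CV56"]'
]
index = build_conflation_index(example)
```

Because the function accepts any iterable of lines, an open file object can be passed in directly. If an identifier were to appear in more than one line, the later group would overwrite the earlier one in this sketch.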

Synonyms

Files in the synonyms directory are tab-delimited files in the format:

UMLS:C5074680	http://www.geneontology.org/formats/oboInOwl#hasExactSynonym	Hibecovirus

This means that "Hibecovirus" is an exact synonym of UMLS:C5074680. In theory, other relationships are supported, although at the moment Babel appears to generate only oboInOwl:hasExactSynonym relationships.
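A synonyms file can be read with ordinary string splitting. The following is a minimal sketch, assuming every line has exactly three tab-separated columns (identifier, relationship, synonym) as in the example above. The function name read_synonyms is hypothetical, and lines that do not match this shape are skipped.

```python
from collections import defaultdict


def read_synonyms(lines):
    """Group synonyms by identifier and relationship (hypothetical helper)."""
    synonyms = defaultdict(lambda: defaultdict(list))
    for line in lines:
        # Assumption: identifier, relationship and synonym, separated by tabs.
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        identifier, relationship, synonym = parts
        synonyms[identifier][relationship].append(synonym)
    return synonyms


EXACT = "http://www.geneontology.org/formats/oboInOwl#hasExactSynonym"
example = ["UMLS:C5074680\t" + EXACT + "\tHibecovirus\n"]
result = read_synonyms(example)
```

Keying on the relationship as well as the identifier means the same code would keep working if relationships other than hasExactSynonym were generated.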
