docm field formatting #166

@colleenXu

When looking at docm data (https://myvariant.info/v1/query?q=_exists_:%22docm.pubmed_id%22&field=docm), the pubmed_id value is sometimes a list of IDs packed into a single string. It appears to be ", "-delimited (both a comma AND a space).

It would be easier to use if it were represented as a list of strings, e.g. [ "12460918", "23833300", "12068308", "12460919", "21483012", "19010912", "22649091", "19238210" ]

One example:
{
  "_id": "chr7:g.140481393T>C",
  "_score": 1,
  "docm": {
    "aa_change": "p.Y472C",
    "all_domains": "pfam_Ser-Thr/Tyr_kinase_cat_dom,pfam_Prot_kinase_dom,pfam_Raf-like_ras-bd,pfam_Prot_Kinase_C-like_PE/DAG-bd,superfamily_Kinase-like_dom,smart_Raf-like_ras-bd,smart_Prot_Kinase_C-like_PE/DAG-bd,smart_Ser/Thr_dual-sp_kinase_dom,smart_Tyr_kinase_cat_dom,pfscan_Raf-like_ras-bd,pfscan_Prot_Kinase_C-like_PE/DAG-bd,pfscan_Prot_kinase_dom,prints_Ser-Thr/Tyr_kinase_cat_dom,prints_DAG/PE-bd",
    "alt": "C",
    "c_position": "c.1415",
    "chrom": 7,
    "default_gene_name": "BRAF",
    "deletion_substructures": "-",
    "disease": "LC",
    "doid": "DOID:1324",
    "domain": "pfam_Ser-Thr/Tyr_kinase_cat_dom,pfam_Prot_kinase_dom,superfamily_Kinase-like_dom,smart_Ser/Thr_dual-sp_kinase_dom,smart_Tyr_kinase_cat_dom,pfscan_Prot_kinase_dom",
    "ensembl_gene_id": "ENSG00000157764",
    "genename": "BRAF",
    "genename_source": "HGNC",
    "hg19": {
      "end": 140481393,
      "start": 140481393
    },
    "primary": 1,
    "pubmed_id": "12460918, 23833300, 12068308, 12460919, 21483012, 19010912, 22649091, 19238210",
    "ref": "T",
    "source": "MyCancerGenome",
    "strand": -1,
    "transcript_error": "no_errors",
    "transcript_name": "ENST00000288602",
    "transcript_source": "ensembl",
    "transcript_species": "human",
    "transcript_status": "known",
    "transcript_version": "74_37",
    "trv_type": "missense",
    "type": "SNP",
    "ucsc_cons": 1,
    "url": "http://www.mycancergenome.org/content/disease/lung-cancer/braf/209"
  }
}
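
For anyone consuming the data in the meantime, here is a rough client-side sketch of the normalization I have in mind. The function name `split_pubmed_ids` is made up for illustration and is not part of any MyVariant client; it assumes the value may arrive as a ", "-delimited string, a single value, a list, or be missing.

```python
def split_pubmed_ids(value):
    """Return docm.pubmed_id as a list of strings, whatever shape it arrives in.

    Hypothetical helper, not part of any MyVariant.info client library.
    """
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        # Already a list: just normalize each item to a stripped string.
        return [str(v).strip() for v in value if str(v).strip()]
    # Split on commas and strip whitespace, so both ", " and "," work.
    return [part.strip() for part in str(value).split(",") if part.strip()]


ids = split_pubmed_ids("12460918, 23833300, 12068308")
print(ids)
```

Splitting on "," and stripping each piece (rather than splitting on the exact ", " sequence) is a deliberate choice so the helper still works if the delimiter is not perfectly consistent across records.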

EDIT: There are other fields that are also a little tricky to parse:

  • docm.source: sometimes the value is "-", and it's unclear what this means. Other times it's null.
  • docm.url: sometimes this field's value is null (rather than the field being omitted when there's no value).
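
A possible client-side workaround for these two fields, as a sketch only. It ASSUMES that "-" is a placeholder for "no value", which is exactly the part that is unclear from the data, so treat that as a guess. The names `PLACEHOLDERS` and `drop_empty_fields` are hypothetical.

```python
# ASSUMPTION: "-" (and the empty string) mean "no value". The data source
# does not confirm this; adjust the set if "-" turns out to be meaningful.
PLACEHOLDERS = {"-", ""}


def drop_empty_fields(docm, keys=("source", "url")):
    """Return a copy of a docm record with null/placeholder fields removed.

    Hypothetical helper: only the listed keys are checked, everything else
    is passed through unchanged.
    """
    cleaned = dict(docm)
    for key in keys:
        value = cleaned.get(key)
        is_placeholder = isinstance(value, str) and value.strip() in PLACEHOLDERS
        if value is None or is_placeholder:
            cleaned.pop(key, None)
    return cleaned


record = {"genename": "BRAF", "source": "-", "url": None}
print(drop_empty_fields(record))
```

With this, a consumer only has to check whether the key exists, instead of handling null, "-", and a missing key as three separate cases.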
