Extract allele frequency data from 1000G VCFs #5

@grosscol

Create a new workflow for allele frequency (AF) information. The current AF data is a by-product of the VEP process, produced by the --af flag.

The allele frequency information in the VEP output appears to be incomplete. For example, 1-55063514-G-A should have AF data, but none appears to be present.

  1. Download VCFs for 1000G on GRCh38 into reference data storage: https://www.internationalgenome.org/data-portal/data-collection/grch38
  2. Extract AF and *_AF fields. SNV IDs are chrom-pos-ref-alt, e.g. 1-55063514-G-A.
  3. Convert to JSON for use with mongoimport. (mongoimport reads JSON, CSV, or TSV; BSON dumps are loaded with mongorestore instead.)
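
Steps 2 and 3 could look roughly like the sketch below. It is a minimal illustration using only the Python standard library, not the project's actual workflow. The function names `af_records` and `write_ndjson` are hypothetical. It assumes that IDs follow the 1-55063514-G-A pattern (chromosome included, any "chr" prefix stripped), that AF values are listed per ALT allele, and that the output is one JSON document per line, since mongoimport reads JSON rather than BSON.

```python
import json


def af_records(vcf_lines):
    """Yield one dict per ALT allele holding its AF and *_AF values.

    Assumes uncompressed, tab-delimited VCF text lines. Names and ID
    layout are illustrative assumptions, not taken from the project.
    """
    for line in vcf_lines:
        # Skip meta/header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts, info = cols[0], cols[1], cols[3], cols[4], cols[7]
        # Assumption: drop a leading "chr" so IDs look like 1-55063514-G-A.
        if chrom.startswith("chr"):
            chrom = chrom[3:]
        # INFO is a ';'-separated list; flag entries without '=' are ignored.
        info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        af_map = {
            key: value.split(",")
            for key, value in info_map.items()
            if key == "AF" or key.endswith("_AF")
        }
        # Assumption: AF-style fields carry one value per ALT allele.
        for i, alt in enumerate(alts.split(",")):
            record = {"_id": f"{chrom}-{pos}-{ref}-{alt}"}
            for key, values in af_map.items():
                if i < len(values) and values[i] != ".":
                    record[key] = float(values[i])
            yield record


def write_ndjson(records, out):
    """Write one JSON document per line to a text file object."""
    for record in records:
        out.write(json.dumps(record) + "\n")
```

With a real file, this might be driven by something like `write_ndjson(af_records(open(path)), open(out_path, "w"))`, where both paths are placeholders. The 1000G VCFs are distributed compressed, so in practice the input would need to be opened with `gzip.open(path, "rt")` or streamed through a tool such as bcftools.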
