MK for variant annotation using VEP
This MK annotates VCF files using the Variant Effect Predictor (VEP): http://www.ensembl.org/info/docs/tools/vep/script/index.html
The VEP databases must be installed before running it. The annotation step generates a TSV file with the following biological information for each variant:
- Uploaded_variation : Identifier of uploaded variant
- Location : Location of variant in standard coordinate format (chr:start or chr:start-end)
- Allele : The variant allele used to calculate the consequence
- Gene : Stable ID of affected gene
- Feature : Stable ID of feature
- Feature_type : Type of feature - Transcript, RegulatoryFeature or MotifFeature
- Consequence : Consequence type
- cDNA_position : Relative position of base pair in cDNA sequence
- CDS_position : Relative position of base pair in coding sequence
- Protein_position : Relative position of amino acid in protein
- Amino_acids : Reference and variant amino acids
- Codons : Reference and variant codon sequence
- Existing_variation : Identifier(s) of co-located known variants
- IMPACT : Subjective impact classification of consequence type
- DISTANCE : Shortest distance from variant to transcript
- STRAND : Strand of the feature (1/-1)
- FLAGS : Transcript quality flags
- PICK : Indicates if this consequence has been picked as the most severe
- SYMBOL : Gene symbol (e.g. HGNC)
- SYMBOL_SOURCE : Source of gene symbol
- HGNC_ID : Stable identifier of HGNC gene symbol
- CANONICAL : Indicates if transcript is canonical for this gene
- SWISSPROT : UniProtKB/Swiss-Prot accession
- TREMBL : UniProtKB/TrEMBL accession
- UNIPARC : UniParc accession
- GENE_PHENO : Indicates if gene is associated with a phenotype, disease or trait
- SIFT : SIFT prediction and/or score
- PolyPhen : PolyPhen prediction and/or score
- EXON : Exon number(s) / total
- INTRON : Intron number(s) / total
- HGVSc : HGVS coding sequence name
- HGVSp : HGVS protein sequence name
- HGVS_OFFSET : Indicates by how many bases the HGVS notations for this variant have been shifted
- AF : Frequency of existing variant in 1000 Genomes combined population
- AFR_AF : Frequency of existing variant in 1000 Genomes combined African population
- AMR_AF : Frequency of existing variant in 1000 Genomes combined American population
- EAS_AF : Frequency of existing variant in 1000 Genomes combined East Asian population
- EUR_AF : Frequency of existing variant in 1000 Genomes combined European population
- SAS_AF : Frequency of existing variant in 1000 Genomes combined South Asian population
- ExAC_AF : Frequency of existing variant in ExAC combined population
- ExAC_Adj_AF : Adjusted frequency of existing variant in ExAC combined population
- ExAC_AFR_AF : Frequency of existing variant in ExAC African/American population
- ExAC_AMR_AF : Frequency of existing variant in ExAC American population
- ExAC_EAS_AF : Frequency of existing variant in ExAC East Asian population
- ExAC_FIN_AF : Frequency of existing variant in ExAC Finnish population
- ExAC_NFE_AF : Frequency of existing variant in ExAC Non-Finnish European population
- ExAC_OTH_AF : Frequency of existing variant in ExAC combined other populations
- ExAC_SAS_AF : Frequency of existing variant in ExAC South Asian population
- CLIN_SIG : ClinVar clinical significance of the dbSNP variant
- SOMATIC : Somatic status of existing variant
- PHENO : Indicates if existing variant(s) is associated with a phenotype, disease or trait; multiple values correspond to multiple variants
- MOTIF_NAME : The source and identifier of a transcription factor binding profile (TFBP) aligned at this position
- MOTIF_POS : The relative position of the variation in the aligned TFBP
- HIGH_INF_POS : A flag indicating if the variant falls in a high information position of the TFBP
- MOTIF_SCORE_CHANGE : The difference in motif score of the reference and variant sequences for the TFBP
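
The fields above can be read programmatically once the TSV exists. The following is a minimal Python sketch, not part of this MK, and it rests on several assumptions: the output is tab-delimited with one column per field, metadata lines start with `##`, the column header line starts with a single `#`, and missing values are written as `-`. The helper names (`read_vep_tsv`, `rare_variants`) and the sample rows are hypothetical and made up for illustration.

```python
import io


def read_vep_tsv(handle):
    """Yield one dict per annotation row from a VEP-style tab-delimited file.

    Assumes '##' metadata lines, a '#'-prefixed column header line,
    and '-' as the placeholder for missing values.
    """
    header = None
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and metadata lines
        if line.startswith("#"):
            header = line.lstrip("#").split("\t")  # column names
            continue
        if header is None:
            raise ValueError("column header line not found before data rows")
        values = line.split("\t")
        yield {k: (None if v == "-" else v) for k, v in zip(header, values)}


def parse_af(value):
    """Return the frequency as a float, or None if missing or not numeric."""
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def rare_variants(rows, max_af=0.01, column="AF"):
    """Keep rows whose frequency is unknown or at most max_af."""
    kept = []
    for row in rows:
        af = parse_af(row.get(column))
        if af is None or af <= max_af:
            kept.append(row)
    return kept


# Made-up sample data using a small subset of the columns listed above.
SAMPLE = (
    "## hypothetical metadata line\n"
    "#Uploaded_variation\tLocation\tAllele\tGene\tConsequence\tAF\n"
    "var1\t1:1000\tA\tGENE_A\tmissense_variant\t0.002\n"
    "var2\t1:2000\tT\tGENE_B\tsynonymous_variant\t0.35\n"
    "var3\t2:3000\tG\tGENE_C\tintron_variant\t-\n"
)

if __name__ == "__main__":
    rows = list(read_vep_tsv(io.StringIO(SAMPLE)))
    for row in rare_variants(rows):
        print(row["Uploaded_variation"], row["Consequence"], row["AF"])
```

To use it on a real file, replace `io.StringIO(SAMPLE)` with `open("annotated.tsv")`, where the file name is a placeholder. Rows with no recorded frequency are kept on purpose, since a variant absent from the 1000 Genomes data may be novel rather than common. Adjust that choice, and the `column` argument (for example `ExAC_AF`), to fit the analysis.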