Support NCBI microbe GTF/GFF with no transcripts (CDS only) #1627

Open
wants to merge 7 commits into
base: postreleasefix/113

nuno-agostinho
Contributor

@nuno-agostinho nuno-agostinho commented Mar 4, 2024

Fixes #1620

Support NCBI GTF/GFF annotation files that contain only CDS lines. In these files, each CDS is a child of a gene ID (rather than a transcript ID, as is usual in Ensembl annotation files) and has no exons as children.

If a CDS is a child of a gene and has no exons of its own, parse the feature as a single-exon transcript with the same strand, start and end as the CDS.
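
For illustration only, the rule above can be sketched in Python. This is not the Perl code changed in this PR: the `Feature` class, the `cds_to_transcripts` function and the shape of the returned dictionaries are all hypothetical names chosen for the example, and the sketch assumes each feature has already been parsed into its type, coordinates, strand, ID and parent ID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feature:
    """Hypothetical, simplified view of one parsed GTF/GFF record."""
    type: str
    start: int
    end: int
    strand: str
    id: Optional[str] = None
    parent: Optional[str] = None


def cds_to_transcripts(features):
    """Build single-exon transcripts from CDS records parented by a gene.

    A CDS qualifies when its parent is a gene record and no exon record
    names that CDS as its parent.
    """
    gene_ids = {f.id for f in features if f.type == "gene"}
    parents_of_exons = {f.parent for f in features if f.type == "exon"}

    transcripts = []
    for f in features:
        if f.type != "CDS" or f.parent not in gene_ids:
            continue  # not a gene-parented CDS
        if f.id in parents_of_exons:
            continue  # the CDS has exons of its own
        transcripts.append({
            "id": f.id,
            "gene": f.parent,
            "strand": f.strand,
            "start": f.start,
            "end": f.end,
            # The single exon spans exactly the CDS coordinates.
            "exons": [(f.start, f.end)],
            "cds": (f.start, f.end),
        })
    return transcripts


example = [
    Feature("gene", 1, 100, "+", id="gene-1"),
    Feature("CDS", 10, 90, "+", id="cds-1", parent="gene-1"),
]
result = cds_to_transcripts(example)
```

In this sketch the synthetic transcript reuses the CDS ID; how the actual parser names the transcript is not stated in this description.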

TODO

  • Support NCBI microbe GTF/GFF annotation
  • Only activate when using option --cds_as_transcript_gxf
  • Fix unit tests
  • Document --cds_as_transcript_gxf in public docs

Testing

Example files for avian paramyxovirus 1

Example VCF

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT iso1
NC_075404.1 980 . T C 12078.64 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=0.924;DP=624;ExcessHet=0.0000;FS=1.120;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=19.87;ReadPosRankSum=0.149;SOR=0.728 GT:AD:DP:GQ:PL 0/1:236,372:608:99:12086,0,6929
NC_075404.1 3666 . C T 15573.64 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=-0.079;DP=770;ExcessHet=0.0000;FS=7.765;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=20.88;ReadPosRankSum=0.795;SOR=0.362 GT:AD:DP:GQ:PL 0/1:235,511:746:99:15581,0,5829
NC_075404.1 3812 . A G 534.64 ReadPosRankSum-8 AC=1;AF=0.500;AN=2;BaseQRankSum=1.096;DP=826;ExcessHet=0.0000;FS=15.515;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=0.66;ReadPosRankSum=-12.298;SOR=2.487 GT:AD:DP:GQ:PL 0/1:722,85:807:99:542,0,23105
NC_075404.1 4631 . T C 1817.64 ReadPosRankSum-8 AC=1;AF=0.500;AN=2;BaseQRankSum=-3.725;DP=846;ExcessHet=0.0000;FS=22.208;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=2.24;ReadPosRankSum=-13.945;SOR=1.685 GT:AD:DP:GQ:PL 0/1:680,133:813:99:1825,0,21905
NC_075404.1 289 . G A 924.64 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=-1.811;DP=720;ExcessHet=0.0000;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=59.97;MQRankSum=0.000;QD=1.50;ReadPosRankSum=-5.861;SOR=0.631 GT:AD:DP:GQ:PL 0/1:531,87:618:99:932,0,16256

Example command

./vep --i sample.vcf \
      --gtf GCF_004786615.1_ASM478661v1_genomic.gtf.gz \
      --fasta GCF_004786615.1_ASM478661v1_genomic.fna.gz

Test conditions

  • Running the command without --cds_as_transcript_gxf should return a warning if the annotation contains CDS records whose parent is a gene record
  • Running the command with --cds_as_transcript_gxf should successfully use the CDS records in the annotation as single-exon transcripts
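
The two test conditions can be modelled with a small Python sketch. It is an assumption-laden illustration rather than the VEP implementation: `select_transcripts`, its `cds_as_transcript` parameter (standing in for `--cds_as_transcript_gxf`), the dictionary-based records and the wording of the warning are all hypothetical.

```python
import warnings


def select_transcripts(records, cds_as_transcript=False):
    """Return single-exon transcripts for gene-parented CDS records.

    With the option off, gene-parented CDS records are skipped and a
    warning is issued; with the option on, each becomes a transcript.
    """
    gene_ids = {r.get("id") for r in records if r["type"] == "gene"}
    gene_cds = [
        r for r in records
        if r["type"] == "CDS" and r.get("parent") in gene_ids
    ]
    if not gene_cds:
        return []

    if not cds_as_transcript:
        # Hypothetical message; the real warning text may differ.
        warnings.warn(
            f"{len(gene_cds)} CDS record(s) have a gene as parent and were "
            "skipped; consider enabling the CDS-as-transcript option"
        )
        return []

    return [
        {
            "id": r["id"],
            "gene": r["parent"],
            "strand": r["strand"],
            "start": r["start"],
            "end": r["end"],
            "exons": [(r["start"], r["end"])],
        }
        for r in gene_cds
    ]


records = [
    {"type": "gene", "id": "gene-1", "start": 1, "end": 100, "strand": "+"},
    {"type": "CDS", "id": "cds-1", "parent": "gene-1",
     "start": 10, "end": 90, "strand": "+"},
]

# Option off: expect a warning and no transcripts.
with warnings.catch_warnings(record=True) as caught_off:
    warnings.simplefilter("always")
    without_flag = select_transcripts(records)

# Option on: expect one single-exon transcript and no warning.
with warnings.catch_warnings(record=True) as caught_on:
    warnings.simplefilter("always")
    with_flag = select_transcripts(records, cds_as_transcript=True)
```

Capturing warnings with `warnings.catch_warnings(record=True)` keeps both conditions checkable in the same run, which mirrors how the two commands would be compared.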

@nuno-agostinho nuno-agostinho marked this pull request as ready for review September 5, 2024 12:49
@nuno-agostinho nuno-agostinho changed the base branch from postreleasefix/112 to postreleasefix/113 September 5, 2024 12:49
@nuno-agostinho nuno-agostinho changed the title from "Support NCBI microbe annotation with no transcripts (CDS only)" to "Support NCBI prokaryotic GTF/GFF with no transcripts (CDS only)" Sep 5, 2024
@nuno-agostinho nuno-agostinho changed the title from "Support NCBI prokaryotic GTF/GFF with no transcripts (CDS only)" to "Support NCBI microbe GTF/GFF with no transcripts (CDS only)" Sep 5, 2024
3 participants