Lotus v3.0 annotations with extra codons #51
Comments
Taking Lj0g3v0021349.2 as an example, the CDS sequence obtained by concatenating the CDS features from the genome does not match the CDS sequence: the stop codons in the genome sequence have been replaced by other codons in the CDS sequence, and the CDS sequence is three base pairs shorter. Again, apologies if I'm doing something wrong at my end.
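For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of how the CDS could be rebuilt from GFF3 coordinates. It assumes the genome is already loaded into a dict of sequence ID to string and that the CDS features of one transcript have been collected as `(seqid, start, end, strand)` tuples; the function names and the data layout are illustrative, not taken from the Lotus Base tooling.

```python
from typing import Dict, List, Tuple

# Translation table for complementing bases (N is left as N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def cds_from_features(
    genome: Dict[str, str],
    features: List[Tuple[str, int, int, str]],
) -> str:
    """Concatenate the CDS features of ONE transcript into a coding sequence.

    Each feature is (seqid, start, end, strand) using GFF3 conventions:
    1-based, inclusive coordinates with start <= end on either strand.
    """
    strand = features[0][3]
    # Join the pieces in ascending genomic order...
    ordered = sorted(features, key=lambda f: f[1])
    joined = "".join(genome[seqid][start - 1:end] for seqid, start, end, _ in ordered)
    # ...then flip the whole thing once for minus-strand transcripts.
    return reverse_complement(joined) if strand == "-" else joined


# Toy data (hypothetical, not real Lotus sequence).
toy_genome = {"chr1": "CCATGAAAGGGTTTTAACC"}
toy_cds = cds_from_features(toy_genome, [("chr1", 3, 8, "+"), ("chr1", 12, 17, "+")])
print(toy_cds, len(toy_cds))
```

The result can then be compared directly against the matching record in the CDS FASTA file, for example `rebuilt == fasta_cds`, along with the difference in length.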
@robsyme Thanks for bringing that to our attention. I can confirm that the GFF3 coordinates are indeed incorrect, as far as our additional internal checks go. We are currently trying to trace the source of the error and will rectify it as soon as possible. Meanwhile, the CDS data in FASTA files (accessible via the SeqRet toolkit and downloadable here (name: Lotus japonicus v3.0 CDS)) are known to be correct, and the few sequences I checked there appear to be correct. You might want to use these sequences instead of those inferred from the GFF3 coordinates for now.
Thanks Terry. For our analyses, we need to know the genomic position of the coding sequence, so we might need to wait until the GFF is fixed. Thanks though! I can supply a list of affected loci if that would be helpful.
@robsyme Hi Robert, if you can provide a list of affected loci, that will be extremely helpful. Can you send it to my work email, at [email protected]? Many thanks. We are currently performing an internal data audit and checking old logs (the file was last generated in 2014) to see what could have gone wrong.
Submitter: Robert Syme
Email: [email protected]
The v3.0 annotations (gff) contain 9835 annotations that have an extra codon included after the stop codon. For example, the protein translated from Lj0g3v0000709.1 is encoded in the gff file like so:
The CDS is separated across two exons (388 bp and 14 bp) for a total of 402 bp, or 134 aa. When translated, the 134 amino acids are:
Is the extra amino acid deliberately included in the CDS feature? Should these 9835 proteins with a similar extra codon be included in comparative analysis?
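A rough sketch of how such annotations could be flagged, assuming the CDS has already been assembled into a single in-frame string and that the standard genetic code applies. The function names are illustrative only.

```python
# Stop codons of the standard genetic code (an assumption for these genes).
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def split_codons(cds: str) -> list:
    """Split a CDS into complete codons, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def has_extra_codon_after_stop(cds: str) -> bool:
    """True when the first in-frame stop is the second-to-last codon,
    i.e. exactly one codon follows the stop."""
    codons = split_codons(cds)
    stop_positions = [i for i, codon in enumerate(codons) if codon in STOP_CODONS]
    return bool(stop_positions) and stop_positions[0] == len(codons) - 2


# Toy sequences (hypothetical, not real Lotus data).
print(has_extra_codon_after_stop("ATGAAATAAGGG"))  # one codon after the stop
print(has_extra_codon_after_stop("ATGAAATAA"))     # ends on the stop
```

For the example above, 388 bp + 14 bp = 402 bp gives 134 codons, so a stop at codon 133 followed by one more codon would be reported by this check.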
Similarly, there seem to be a number of CDS features that contain premature stop codons. For example:
Which translates to the protein:
Are these genes to be translated with an alternative codon table?
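One way to explore this question is to list the in-frame stop codons that occur before the final codon, and see whether they disappear under a different stop set. The sketch below assumes the standard stop codons by default; the alternative set shown (TGA read as a sense codon, as in NCBI translation table 4) is only an example and is not a claim about Lotus japonicus. All names are illustrative.

```python
from collections import Counter

# NCBI translation table 1 (standard code) stop codons.
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})
# Example alternative: table 4 reads TGA as Trp, leaving TAA and TAG as stops.
TABLE4_STOPS = frozenset({"TAA", "TAG"})


def internal_stops(cds: str, stops=STANDARD_STOPS) -> list:
    """Return (codon_index, codon) for every in-frame stop before the last codon.

    codon_index is 0-based; a trailing partial codon is ignored.
    """
    cds = cds.upper()
    n_codons = len(cds) // 3
    found = []
    for i in range(n_codons - 1):  # skip the final codon
        codon = cds[3 * i:3 * i + 3]
        if codon in stops:
            found.append((i, codon))
    return found


# Toy sequence (hypothetical): an internal TGA followed by a terminal TAA.
toy = "ATGTGAAAATAA"
print(internal_stops(toy))
print(internal_stops(toy, stops=TABLE4_STOPS))
print(Counter(codon for _, codon in internal_stops(toy)))
```

If the internal stops across many transcripts are a mix of all three codons, an alternative codon table seems an unlikely explanation, and a frame or coordinate problem in the annotation may be worth checking first.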
Sorry to bother you, and I hope that I've not misunderstood the annotation gff.