Sample data #33
Comments
Hi @spficklin - If you grab the tarball of trees: I think that will give you what you wanted. Note that these sequences are the unaligned versions, but their IDs should correspond to the leaf node labels in the trees (if they don't, let me know; it's possible the tarball hasn't been updated to reflect some fixes in that regard). Regarding the organisms, I'm not sure exactly what you'll need, but we are using the "gensp." prefix to denote the species of origin (e.g. "glyma" => Glycine max, "medtr" => Medicago truncatula, etc.). I can give you a more detailed list if I know how you plan to handle this (in our case, the loader expects that the annotations have already been loaded and just does a lookup for them).
Thanks @adf-ncgr. I've gotten back to this. Do you have a lookup table that maps your organism "gensp" prefix to the taxonomic name? I want to import a FASTA file that I downloaded using the file you mentioned above, but I need to know the species that each sequence belongs to.
Hi @spficklin - there may be a few quirks in the following extraction from our organism table, in particular with some of the non-legume species, but hopefully it will be close enough to give you the relevant info (e.g. you'll probably see easily that Arabidopsis thaliana would be arath in "gensp" representation instead of A. thaliana). Let me know if there's anything in the FASTA you grabbed that you can't glean from this, or if you have other questions. Thanks for moving it along...
------------------------+--------------+--------------------------
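For readers following along, the "gensp" convention described in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule "first three letters of the genus plus first two letters of the species" is inferred from the three examples given here (glyma, medtr, arath), the ID layout "prefix, then a dot" is assumed from the "gensp." wording, and the function names and sample IDs are hypothetical. Given the quirks mentioned for non-legume species, a real lookup table should take precedence over a derived one.

```python
from typing import Dict, Iterable, Optional


def gensp(binomial: str) -> str:
    """Derive a 5-letter prefix from a binomial name (assumed 3+2 rule)."""
    genus, species = binomial.split()[:2]
    return (genus[:3] + species[:2]).lower()


# Only the species named in the thread; a real table would be much longer.
KNOWN_SPECIES = ["Glycine max", "Medicago truncatula", "Arabidopsis thaliana"]
PREFIX_TO_SPECIES: Dict[str, str] = {gensp(name): name for name in KNOWN_SPECIES}


def species_for_id(seq_id: str) -> Optional[str]:
    """Look up the species for an ID, assuming the prefix ends at the first dot."""
    prefix = seq_id.split(".", 1)[0].lower()
    return PREFIX_TO_SPECIES.get(prefix)


def species_by_sequence(fasta_lines: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each FASTA record ID to its species, or None if the prefix is unknown."""
    result: Dict[str, Optional[str]] = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence data, not a header
        parts = line[1:].split()
        if not parts:
            continue  # empty header line
        seq_id = parts[0]
        result[seq_id] = species_for_id(seq_id)
    return result
```

IDs whose prefix is not in the table map to `None`, which makes it easy to list the sequences that still need a species assigned by hand before import.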
This is great. Thanks. I'll let you know how it goes.
In an effort to write unit tests for the Newick file importer that comes with Tripal, do you have a file that could be shared? We would need the tree file in Newick format, a FASTA file containing all of the gene/protein sequences, and the organism to which each of those FASTA sequences belongs.
Thanks much!