Ensembl Tark Data Provider #86
Comments
Working in branch ensembl_tark.
OK, merged into main. Need to start trying it with a test set for a while.
Also need to disable RefSeq due to the gap problem.
Also need a test for
Ran TARK using the benchmark: 50 out of 50 correct.
Performance is 0.478/second vs 3/second with cdot REST, so ~6 times slower.
Still need the RefSeq check.
Done initial implementation.
@holtgrewe - you use Ensembl, right? Maybe you'll be interested in this.
Thanks! CC @tedil
Can use with no arguments for Ensembl (uses the hgvs SeqFetcher for genome sequences), but for RefSeq, or if you want local genome fetching, you need to initialise it with a special seq fetcher initialised with a FASTA file.
Andy Yates suggested https://tark.ensembl.org/
This has Ensembl in a format we can use
However, it doesn't have alignments (CIGAR etc.) for RefSeq, so it doesn't handle gaps. I have raised an issue on the project: Ensembl/tark#81
So I think we should just do Ensembl to start with
Example:
http://tark.ensembl.org/api/transcript/?stable_id=ENST00000256078&stable_id_version=4&expand_all=true
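A minimal sketch of how a lookup like the one above could be built from a versioned transcript accession. The helper names (`split_versioned_accession`, `build_transcript_url`, `fetch_json`) are hypothetical and not part of cdot or TARK; only the URL and its query parameters come from the example. The shape of the JSON response is not assumed here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the example above.
TARK_API = "http://tark.ensembl.org/api"


def split_versioned_accession(tx_ac):
    """Split 'ENST00000256078.4' into ('ENST00000256078', 4)."""
    stable_id, version = tx_ac.split(".")
    return stable_id, int(version)


def build_transcript_url(stable_id, version, expand_all=True):
    """Build the transcript lookup URL for a stable ID and version."""
    params = {
        "stable_id": stable_id,
        "stable_id_version": version,
        "expand_all": "true" if expand_all else "false",
    }
    return f"{TARK_API}/transcript/?{urlencode(params)}"


def fetch_json(url, timeout=30):
    """Fetch a URL and parse the body as JSON (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    stable_id, version = split_versioned_accession("ENST00000256078.4")
    print(build_transcript_url(stable_id, version))
```

Keeping URL construction separate from the network call makes the query logic testable offline.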
We can get sequence out via:
Can get the protein out: `get_pro_ac_for_tx_ac`
Can implement `get_tx_for_gene`:
http://tark.ensembl.org/api/transcript/search/?identifier_field=KRAS&expand=transcript_release_set%2Cgenes
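As a rough sketch, the search URL above could be generated from a gene symbol like this. `build_gene_search_url` is a hypothetical helper name; the endpoint and the `identifier_field` and `expand` parameters are copied from the example URL, and nothing is assumed about the response.

```python
from urllib.parse import urlencode

# Base URL taken from the example above.
TARK_API = "http://tark.ensembl.org/api"


def build_gene_search_url(gene_symbol, expand=("transcript_release_set", "genes")):
    """Build the transcript search URL for a gene symbol.

    The expand fields are joined with a comma, which urlencode
    percent-encodes as %2C, matching the example URL.
    """
    params = {
        "identifier_field": gene_symbol,
        "expand": ",".join(expand),
    }
    return f"{TARK_API}/transcript/search/?{urlencode(params)}"


if __name__ == "__main__":
    print(build_gene_search_url("KRAS"))
```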
Can even implement `get_tx_for_region` via e.g.:
http://tark.ensembl.org/api/transcript/?loc_start=25362365&loc_end=25403737&loc_region=12&expand_all=false
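A small sketch of building that region query, under the assumption that the three `loc_*` parameters from the example are all that is needed. `build_region_url` is a hypothetical name, and the start/end check is an added safeguard rather than something TARK is known to require.

```python
from urllib.parse import urlencode

# Base URL taken from the example above.
TARK_API = "http://tark.ensembl.org/api"


def build_region_url(region, start, end, expand_all=False):
    """Build the transcript query URL for a genomic region."""
    if start > end:
        raise ValueError(f"start ({start}) must not exceed end ({end})")
    params = {
        "loc_start": start,
        "loc_end": end,
        "loc_region": region,
        "expand_all": "true" if expand_all else "false",
    }
    return f"{TARK_API}/transcript/?{urlencode(params)}"


if __name__ == "__main__":
    print(build_region_url("12", 25362365, 25403737))
```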