
Regression-based annotation of protein-coding sequences from ribosome profiling data


JieWu2012/ORF-RATER

 
 


ORF-RATER

ORF-RATER (Open Reading Frame - Regression Algorithm for Translational Evaluation of Ribosome-protected footprints) comprises a series of scripts for coding sequence annotation based on ribosome profiling data.

The software was created in Jonathan Weissman's lab at UCSF and is described in Fields, Rodriguez, et al., "A regression-based analysis of ribosome-profiling data reveals a conserved complexity to mammalian translation", Molecular Cell 60, 816-827 (2015).

Usage information can be found in the Detailed Protocol included in the paper's supplemental materials, or by running each script with the --help/-h flag.
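To read the --help output for every script in one pass, a small wrapper like the one below can help. It is a minimal sketch that is not part of ORF-RATER. It assumes that each script in the directory parses its arguments with argparse (or something similar), so that --help prints usage and exits without doing any work. The function name collect_help and its parameters are hypothetical. Set `python` to whichever interpreter your ORF-RATER environment actually uses.

```python
import subprocess
import sys
from pathlib import Path


def collect_help(script_dir, python=sys.executable, timeout=60):
    """Run every *.py script in script_dir with --help and return {filename: text}.

    Hypothetical helper: assumes each script handles --help by printing usage
    and exiting, as argparse-based scripts do.
    """
    help_texts = {}
    for script in sorted(Path(script_dir).glob("*.py")):
        result = subprocess.run(
            [python, str(script), "--help"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        # Fall back to stderr so that import errors (e.g. a missing package) are visible.
        help_texts[script.name] = result.stdout or result.stderr
    return help_texts
```

For example, `for name, text in collect_help("path/to/ORF-RATER").items(): print(name, text, sep="\n")` prints each script's usage in turn. The path shown is a placeholder for a local clone.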

Required packages include numpy, scipy, pysam, biopython, pandas, tables, scikit-learn, pybedtools, and plastid, all of which are available through PyPI.
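Several of these packages are imported under a different name than the one used to install them from PyPI: biopython is imported as Bio, and scikit-learn as sklearn. The sketch below lists which required packages are not yet importable. It is an illustrative check, not something shipped with ORF-RATER. The REQUIRED mapping and the missing_packages function are hypothetical names. multiisotonic is left out on purpose, because the README says it must be downloaded manually rather than installed from PyPI.

```python
import importlib.util

# PyPI distribution name (as listed in the README) -> top-level module it provides.
REQUIRED = {
    "numpy": "numpy",
    "scipy": "scipy",
    "pysam": "pysam",
    "biopython": "Bio",
    "pandas": "pandas",
    "tables": "tables",
    "scikit-learn": "sklearn",
    "pybedtools": "pybedtools",
    "plastid": "plastid",
}


def missing_packages(required=REQUIRED):
    """Return the PyPI names of packages whose top-level module cannot be found."""
    return [dist for dist, module in required.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing; try: pip install " + " ".join(missing))
else:
    print("All required packages are importable.")
```

find_spec only locates each module and does not import it, so the check stays fast even for heavy packages such as scipy or pysam.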

Some features require the multiisotonic package, which must be downloaded manually. Multiisotonic additionally requires python-igraph.

Transcripts must be provided in UCSC's BED12 format. The most reliable method I've found to convert from GTF to BED12 goes through the intermediate genePred format, using UCSC's gtfToGenePred and genePredToBed utilities, which are available from UCSC. The full command is:

gtfToGenePred INPUT_GTFFILE.gtf stdout | genePredToBed stdin OUTPUT_BEDFILE.bed

Similarly, a BED file can be converted to a GTF with UCSC's bedToGenePred and genePredToGtf utilities:

bedToGenePred INPUT_BEDFILE.bed stdout | genePredToGtf file stdin OUTPUT_GTFFILE.gtf
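After a conversion, it can be worth checking that the output really is well-formed BED12 before handing it to ORF-RATER. The sketch below parses one line against the standard UCSC BED12 column layout (chrom, chromStart, chromEnd, name, score, strand, thickStart, thickEnd, itemRgb, blockCount, blockSizes, blockStarts) and turns the blocks into absolute exon coordinates. It is a minimal, hypothetical validator and is not part of ORF-RATER. The names Bed12 and parse_bed12 are illustrative only.

```python
from typing import List, NamedTuple, Tuple


class Bed12(NamedTuple):
    chrom: str
    start: int        # chromStart, 0-based
    end: int          # chromEnd, exclusive
    name: str
    strand: str
    thick_start: int  # start of the thick (coding) region
    thick_end: int
    exons: List[Tuple[int, int]]  # absolute (start, end) of each block


def parse_bed12(line: str) -> Bed12:
    """Parse one BED12 line and check that its block fields are self-consistent."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 columns, got {len(fields)}")
    chrom, start_s, end_s, name, _score, strand, tstart, tend, _rgb, count_s = fields[:10]
    start, end, count = int(start_s), int(end_s), int(count_s)
    if count < 1:
        raise ValueError("blockCount must be at least 1")
    # blockSizes and blockStarts are comma-separated and often end with a trailing comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    rel_starts = [int(x) for x in fields[11].split(",") if x]
    if not (len(sizes) == len(rel_starts) == count):
        raise ValueError("blockCount does not match blockSizes/blockStarts")
    # blockStarts are offsets from chromStart; blocks must span chromStart..chromEnd.
    exons = [(start + s, start + s + n) for s, n in zip(rel_starts, sizes)]
    if rel_starts[0] != 0 or exons[-1][1] != end:
        raise ValueError("blocks do not span chromStart..chromEnd")
    return Bed12(chrom, start, end, name, strand, int(tstart), int(tend), exons)
```

Running parse_bed12 over every line of OUTPUT_BEDFILE.bed and catching ValueError is a quick way to spot malformed records before they reach the ORF-RATER scripts.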

Contact Alex Fields for further information or assistance.
