CryptKeeper is a computational pipeline designed to predict open reading frames that are burdensome because they sequester substantial numbers of ribosomes on plasmids, particularly virus infectious clones. CryptKeeper also uses other prediction tools, such as promoter prediction and terminator prediction, to provide additional context. Burden from strongly expressed, long open reading frames has been found to render plasmids evolutionarily unstable or unclonable.
In addition to highlighting burden from ribosome sequestration, we've also found that CryptKeeper can display alternative translation initiation sites that lead to protein truncations. This is an important consideration for experiments that rely on protein fusions or tagging.
CryptKeeper provides its output as CSV files (for downstream data processing) and as a Bokeh plot for interactive visualization. The Bokeh plot uses the SVG backend, so it can be saved and used when assembling downstream figures.
Figure: example of a Bokeh plot produced by CryptKeeper. The two outer tracks display predicted RBS strength versus sequence position (the product being 'burden'). The inner tracks display predicted promoters (green), rho-dependent terminators (red), and rho-independent terminators (purple). The innermost track displays annotations extracted from the provided GenBank file.
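The burden idea can be illustrated with a small calculation. The sketch below assumes, as the plot description suggests, that each ORF's burden is its predicted RBS strength multiplied by its length and that the total burden is the sum over all ORFs. The `ExpressedORF` record and its field names are hypothetical, and CryptKeeper's actual scoring may weight these terms differently.

```python
from typing import NamedTuple, Sequence


class ExpressedORF(NamedTuple):
    """Hypothetical record for one ORF; field names are illustrative only."""
    start: int           # 0-based start coordinate
    end: int             # exclusive end coordinate
    rbs_strength: float  # predicted RBS strength (arbitrary units)


def orf_burden(orf: ExpressedORF) -> float:
    # Assumed model: initiation strength times ORF length, so long and
    # strongly initiated ORFs dominate the total.
    return orf.rbs_strength * (orf.end - orf.start)


def total_burden(orfs: Sequence[ExpressedORF]) -> float:
    # Every ORF contributes here, independent of any cutoff used for plotting.
    return sum(orf_burden(o) for o in orfs)


orfs = [ExpressedORF(0, 300, 2.5), ExpressedORF(1000, 1900, 0.5)]
print(total_burden(orfs))  # 2.5 * 300 + 0.5 * 900 = 1200.0
```

Under this assumed model, a weakly initiated but very long ORF can contribute as much burden as a short, strongly initiated one.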
CryptKeeper is a Python module and associated command-line script. We recommend installing CryptKeeper using Bioconda on Linux or macOS, which will automatically install CryptKeeper and all of its dependencies.
From Bioconda (recommended; Linux, macOS):
- Run
conda install -c bioconda cryptkeeper
From Pip (for experts; Linux, macOS, Windows):
- Download and install ViennaRNA, following the instructions here.
- Download and install TransTermHP here, and add the TransTermHP binaries to your PATH.
- Run
pip install cryptkeeper
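After a pip installation it can help to confirm that the external tools are visible before running the pipeline. The sketch below uses only the standard library's `shutil.which`. The executable names are assumptions (`RNAfold` ships with ViennaRNA and `transterm` with TransTermHP), and CryptKeeper may rely on different binaries or on Python bindings instead.

```python
import shutil
from typing import Dict, Iterable

# Assumed executable names; adjust to match your own installation.
DEFAULT_TOOLS = ("RNAfold", "transterm")


def check_tools(tools: Iterable[str] = DEFAULT_TOOLS) -> Dict[str, bool]:
    """Map each executable name to whether it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_tools().items():
        print(f"{tool}: {'found' if found else 'NOT found on PATH'}")
```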
Developers may consider installing CryptKeeper and its other dependencies by forking their respective repositories and installing each one from its repository directory.
cryptkeeper -h
usage: cryptkeeper [-h] -i I [-c] -o O [-j J] [-p] [-n NAME] [--rbs-score-cutoff RBS_SCORE_CUTOFF] [-t TICK_FREQUENCY]
Pipeline for predicting cryptic gene expression
optional arguments:
-h, --help show this help message and exit
-i I, --input I input fasta file
-c, --circular The input file is circular. (Note: Increases runtime)
-o O, --output O output file prefix
-j J, -threads J Number of threads/processes to use
-p, --plot-only plot mode, assumes output files all exist
-n NAME, --name NAME name of sample (if not provided the filename is used)
--rbs-score-cutoff RBS_SCORE_CUTOFF
Minimum score that is graphed and output to final files (all are used in calculating burden)
-t TICK_FREQUENCY Y axis tick frequency (default 1000)
For example:
cd examples/BPMV
cryptkeeper -i pSMART-LCKan-BPMV1.fna -o output/pSMART-LCKan-BPMV1 -j 8 -c
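Plasmids are circular, so an ORF or terminator can span the point where the FASTA record was linearized; this is why the example passes -c. The sketch below illustrates the general idea behind the circular option (extending the sequence by a few bases, then mapping hits back onto the original coordinates). The overhang length and both function names are hypothetical, and CryptKeeper's internal handling may differ.

```python
def extend_circular(seq: str, overhang: int) -> str:
    """Append the first `overhang` bases so features spanning the origin are contiguous."""
    if not 0 <= overhang <= len(seq):
        raise ValueError("overhang must be between 0 and len(seq)")
    return seq + seq[:overhang]


def to_original_coordinate(position: int, seq_length: int) -> int:
    """Map a 0-based position on the extended sequence back onto the circle."""
    return position % seq_length


plasmid = "ATGAAACCCGGGTTT"          # 15 bp toy sequence
extended = extend_circular(plasmid, 5)
print(extended)                        # ATGAAACCCGGGTTTATGAA
print(to_original_coordinate(17, len(plasmid)))  # 2
```

The extra bases lengthen every search, which is consistent with the help text noting that the circular option increases runtime.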
In certain situations, it may be valuable to use CryptKeeper as a Python dependency for another pipeline.
The primary entry point for Python development is cryptkeeper.cryptkeeper(), which has the following arguments:
input_file : str
Path to the input sequence file in FASTA format.
output : str, optional
Path to the output file directory. If not provided, CryptKeeper will write the results to a temporary directory.
circular : bool, optional
If True, CryptKeeper will extend the sequence by a few bases while making predictions. This is necessary for predicting terminators and RBS sites at the beginning and end of the sequence. Defaults to False.
name : str, optional
Name of the sequence. If not provided, CryptKeeper will attempt to extract the name from the GenBank file, if applicable.
threads : int, optional
Number of threads to use for parallel processing. Defaults to 1.
logger : logging.Logger, optional
Logger object to use for logging. If not provided, CryptKeeper will run without logging.
rbs_score_cutoff : float, optional
Minimum score required for an RBS site to be considered expressed. Default is 2.0.
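When embedding CryptKeeper in another pipeline, it can be convenient to collect the arguments above in one place. The sketch below builds keyword arguments using only the parameter names documented above and calls cryptkeeper.cryptkeeper() lazily, so the helper module loads even where CryptKeeper is not installed. The helper names `build_cryptkeeper_kwargs` and `run_cryptkeeper` are hypothetical, and the example path assumes the repository's examples/BPMV directory.

```python
import logging
from typing import Any, Dict, Optional


def build_cryptkeeper_kwargs(
    input_file: str,
    output: Optional[str] = None,
    circular: bool = False,
    name: Optional[str] = None,
    threads: int = 1,
    rbs_score_cutoff: float = 2.0,
    logger: Optional[logging.Logger] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments using the documented parameter names."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    kwargs: Dict[str, Any] = {
        "input_file": input_file,
        "circular": circular,
        "threads": threads,
        "rbs_score_cutoff": rbs_score_cutoff,
    }
    # Pass optional arguments only when supplied, so CryptKeeper's own defaults
    # (temporary output directory, inferred name, no logging) apply otherwise.
    for key, value in (("output", output), ("name", name), ("logger", logger)):
        if value is not None:
            kwargs[key] = value
    return kwargs


def run_cryptkeeper(**kwargs: Any) -> Any:
    # Imported here so this module can be loaded without CryptKeeper installed.
    import cryptkeeper
    return cryptkeeper.cryptkeeper(**kwargs)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    example = build_cryptkeeper_kwargs(
        "examples/BPMV/pSMART-LCKan-BPMV1.fna",
        output="output/pSMART-LCKan-BPMV1",
        circular=True,
        threads=8,
        logger=logging.getLogger("cryptkeeper"),
    )
    print(sorted(example))
```

With CryptKeeper installed, `run_cryptkeeper(**example)` would return the result object whose attributes are listed below, including its total `burden`.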
This function returns an object that contains the predictions made by CryptKeeper, as well as some metadata extracted directly from the input GenBank file if one is provided. The object has the following attributes:
: the name of the sample, as a str
: the input sequence from the original file, as a str
: ORFs with predicted RBSs, as a List[NamedTuple]
: predicted rho-dependent terminators, as a List[NamedTuple]
: predicted intrinsic terminators, as a List[NamedTuple]
: predicted promoters, as a List[NamedTuple]
: annotations from the provided GenBank file, as a List[NamedTuple]
burden: the total predicted burden, as a float
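Because most of these attributes are lists of NamedTuples, they can be exported generically using each tuple's `_fields` as the CSV header. The helper below is a sketch using only the standard library; the `Promoter` record is a hypothetical stand-in, and the real field names come from CryptKeeper's own NamedTuple definitions.

```python
import csv
import io
from typing import NamedTuple, Sequence, TextIO


def namedtuples_to_csv(records: Sequence[NamedTuple], handle: TextIO) -> int:
    """Write NamedTuples as CSV rows with their field names as the header.

    Assumes all records share one NamedTuple type. Returns the number of data rows.
    """
    if not records:
        return 0
    writer = csv.writer(handle)
    writer.writerow(records[0]._fields)  # header taken from the NamedTuple definition
    writer.writerows(records)            # NamedTuples are tuples, so they write directly
    return len(records)


class Promoter(NamedTuple):
    """Hypothetical prediction record; real field names may differ."""
    position: int
    strand: str
    score: float


buffer = io.StringIO()
namedtuples_to_csv([Promoter(120, "+", 0.8), Promoter(2045, "-", 0.6)], buffer)
print(buffer.getvalue())
```

To write a file instead, pass a handle opened with `open(path, "w", newline="")`, as the `csv` module recommends.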