This is a two-stage computational workflow for data-driven virus discovery from sequencing data from the Sequence Read Archive or your own data. Stage 1 (Virushunter) searches the raw reads using profile Hidden Markov Models. Stage 2 (Virusgatherer) perform a seed-based, iterative viral genome assembly that specifically targets the sequences identified in the first stage.
- EMBOSS
- seqtk
- fastp
- NCBI blast
- NCBI SRA toolkit
- HMMer
- Genseed-HMM
- CAP3
- newbler
- Bowtie 2
- snakemake
- vsearch >=2.15.2 <2.20.0
You need to install the following Blast databases and specify their file paths and names in the config.yaml:
- refseq_protein (can be downloaded from https://ftp.ncbi.nlm.nih.gov/blast/db/)
- viral_protein (the sequences can be downloaded from https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/, only accessions are required)
- viral_genomic (the sequences can be downloaded from https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/; use
makeblastdb
command withparse_seqids
to create a BLAST database)
NOTE: to download only RdRp-encoding RNA viruses, the following command can be used: esearch -db nucleotide -query "txid2559587[Organism:exp] AND refseq[filter] NOT txid2732397[Organism:exp]" | efetch -format fasta > riboviria.no_pararnavirae.genomic.fna
- filter (see subfolder 4_databases; use
makeblastdb
command to create a BLAST database)
Note: For detailed instructions on downloading Blast databases, please refer to our GitHub Wiki.
For questions or support, email chris.lauber at twincore.de
Lauber C*, Seitz S*, Mattei S, Suh A, Beck J, Herstein J, Börold J, Salzburger W, Kaderali L, Briggs JAG, Bartenschlager R. Deciphering the Origin and Evolution of Hepatitis B Viruses by Means of a Family of Non-enveloped Fish Viruses. Cell Host Microbe. 2017 Sep 13;22(3):387-399.e6. doi: 10.1016/j.chom.2017.07.019.
Lauber C, Vaas J, Klingler F, Mutz, M, Gorbalenya AE, Bartenschlager F, Seitz S. Deep mining of the Sequence Read Archive reveals bipartite coronavirus genomes and inter-family Spike glycoprotein recombination. bioRxiv. doi: https://doi.org/10.1101/2021.10.20.465146
Lauber C, Chong LC. Viroid-like RNA-dependent RNA polymerase-encoding ambiviruses are abundant in complex fungi. Frontiers Microbiology. 2023 May 12; Volume 14. https://doi.org/10.3389/fmicb.2023.1144003
* equal contribution