Remove chimeric reads
Our script chimeraFilter.pl wraps usearch (v6.1), specifically the uchime algorithm, to remove chimeric reads.
Here is an example command:
chimeraFilter.pl -type 1 -db /usr/local/db/single_strand/Bacteria_RDP_trainset15_092015.udb fasta_files/*
Here, "-type 1" means that reads clearly called as chimeric and reads with an ambiguous call are both filtered out.
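To make the two "-type" modes concrete, here is a small sketch of the decision that the option controls. It assumes each read has already received one of uchime's calls (Y for chimeric, N for non-chimeric, ? for borderline). The function name read_fate is hypothetical and is not part of chimeraFilter.pl.

```shell
# read_fate CALL TYPE
# Prints "kept" or "removed" for a read with the given uchime call
# (Y, N or ?) under the given -type setting (0 or 1).
# Hypothetical helper for illustration only.
read_fate() {
    call="$1"
    filter_type="$2"
    case "$call" in
        N)
            # Clearly non-chimeric: kept under both settings.
            echo "kept" ;;
        '?')
            # Borderline: kept only when -type is 0.
            if [ "$filter_type" -eq 0 ]; then
                echo "kept"
            else
                echo "removed"
            fi ;;
        *)
            # Y (chimeric) or anything unrecognised: removed.
            echo "removed" ;;
    esac
}

# Show the outcome for every combination of setting and call.
for filter_type in 0 1; do
    for call in N '?' Y; do
        echo "-type $filter_type, call $call: $(read_fate "$call" "$filter_type")"
    done
done
```

The only difference between the two settings is how borderline reads are treated.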
Note that a DB file needs to be input as well. If you'd like to use the UDB format rather than FASTA then you'll need to use the "-makeudb_usearch" function of usearch v6.1 (the same usearch version as used for chimera checking).
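If your reference is in FASTA format, the conversion to UDB might look like the sketch below. The filenames are placeholders, and the "-output" option is what we would expect from the usearch documentation, so please check the help for your own usearch v6.1 installation before relying on it.

```shell
# Placeholder filenames (assumptions, not files shipped with Microbiome Helper).
ref_fasta="reference_16S.fasta"
ref_udb="${ref_fasta%.fasta}.udb"

# Print the command first so it can be checked.
echo "usearch -makeudb_usearch $ref_fasta -output $ref_udb"

# Only run it if usearch is on the PATH and the FASTA file exists.
if command -v usearch >/dev/null 2>&1 && [ -f "$ref_fasta" ]; then
    usearch -makeudb_usearch "$ref_fasta" -output "$ref_udb"
fi
```

The resulting UDB file can then be passed to chimeraFilter.pl with "-db".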
Note that the settings of "mindiv" and "minh" (see http://www.drive5.com/usearch/manual/UCHIME_score.html) could have significant effects on results. However, so far we have found that small adjustments to these parameters have only a minor effect on sensitivity and specificity when running chimera checking for 16S sequences.
You can download the DB used in the above example [here](https://www.dropbox.com/s/8qr42doaez48oc3/Bacteria_RDP_trainset15_092015.udb?dl=0) (70 MB). It is originally from the Ribosome Database Project (RDP) and was then parsed to include only bacteria.
Options:
- -h, --help
  Displays the entire help documentation.
- -v, --version
  Displays version number and exits.
- -type <[0|1]>
  Non-chimeric output type, either only sequences that are clearly non-chimeric (1) or all sequences that are not called as chimeric (0 - includes borderline sequences, "?" in uchime output).
- -mindiv
  Min % divergence between query and target sequence (default 1.5, note that this differs from the uchime default of 0.8).
- -minh
  Min score to be called as chimeric (default 0.2, note that this differs from the uchime default of 0.28).
- -o, --out_dir
  Output directory for filtered fastq files. Default is "non_chimeras".
- -thread <# of CPUs>
  Using this option without a value will use all CPUs on machine, while giving it a value will limit to that many CPUs. Without option only one CPU is used.
- -log
  The location to write the log file.
- -db, --database
  Database of 16S sequences to use as a reference (UDB or FASTA file).
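As a rough illustration of how these options combine, the sketch below keeps borderline reads ("-type 0"), states the script's default "mindiv" and "minh" values explicitly, limits the run to 4 CPUs, and writes a log. The output directory and log filename are placeholders chosen for this example.

```shell
# Placeholder paths (assumptions); point these at your own files.
db="/usr/local/db/single_strand/Bacteria_RDP_trainset15_092015.udb"
out_dir="non_chimeras_type0"
log_file="chimeraFilter_log.txt"

# Run only if the script is on the PATH.
if command -v chimeraFilter.pl >/dev/null 2>&1; then
    chimeraFilter.pl -type 0 -mindiv 1.5 -minh 0.2 -thread 4 \
        -log "$log_file" -o "$out_dir" -db "$db" fasta_files/*
else
    echo "chimeraFilter.pl not found on PATH" >&2
fi
```

Writing each run to its own output directory makes it easier to compare settings side by side.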
- Please feel free to post a question on the Microbiome Helper google group if you have any issues.
- General comments or inquiries about Microbiome Helper can be sent to [email protected].