Remove chimeric reads

Our script chimeraFilter.pl wraps usearch (v6.1), specifically the uchime algorithm, to remove chimeric reads.

Here is an example command:

chimeraFilter.pl -type 1 -db /usr/local/db/single_strand/Bacteria_RDP_trainset15_092015.udb fasta_files/*

Here, "-type 1" means that reads clearly called as chimeric and reads with an ambiguous call are both filtered out.
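
To illustrate the difference between the two "-type" settings, the sketch below filters a toy table whose last column holds the uchime call (Y = chimeric, N = non-chimeric, ? = borderline). The table rows and the helper name keep_ids are invented for illustration, and this is not how chimeraFilter.pl is implemented internally; it only mirrors the filtering rule described above.

```shell
#!/bin/sh
# Toy stand-in for uchime tabular output: read ID, then the call in the last column.
# These rows are invented for illustration; real output has many more columns.
calls=$(printf 'read1\tN\nread2\tY\nread3\t?\nread4\tN\n')

# keep_ids TYPE: print the IDs of reads kept under a given -type setting.
#   TYPE=1 keeps only reads called N (clearly non-chimeric).
#   TYPE=0 keeps reads called N or ? (everything not called chimeric).
keep_ids() {
    printf '%s\n' "$calls" | awk -F '\t' -v type="$1" '
        $NF == "N"              { print $1; next }
        $NF == "?" && type == 0 { print $1 }
    '
}

type1_kept=$(keep_ids 1)
type0_kept=$(keep_ids 0)

# Unquoted expansion joins the IDs onto one line for display.
echo "kept with -type 1:" $type1_kept   # read1 read4
echo "kept with -type 0:" $type0_kept   # read1 read3 read4
```

In this toy case the borderline read (read3) is the only one whose fate depends on the "-type" setting.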

Note that a reference DB file must also be supplied with "-db". If you'd like to use the UDB format rather than FASTA, you'll need to build the file with the "-makeudb_usearch" command of usearch v6.1 (the same usearch version used for chimera checking).
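
A minimal sketch of building a UDB file from a FASTA reference is shown below. The binary name usearch61 and the file name ref_16S.fasta are assumptions; substitute whatever your usearch v6.1 executable and reference file are actually called. The script skips the build with a message if either is missing.

```shell
#!/bin/sh
# Assumed names: adjust to your installation and reference file.
usearch_bin="usearch61"            # hypothetical name of the usearch v6.1 binary
ref_fasta="ref_16S.fasta"          # hypothetical FASTA reference of 16S sequences
ref_udb="${ref_fasta%.fasta}.udb"  # same base name with a .udb extension

if command -v "$usearch_bin" >/dev/null 2>&1 && [ -f "$ref_fasta" ]; then
    # Build the UDB file so it can be passed to chimeraFilter.pl via -db.
    "$usearch_bin" -makeudb_usearch "$ref_fasta" -output "$ref_udb"
else
    echo "Skipping: $usearch_bin or $ref_fasta not found; would write $ref_udb" >&2
fi
```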

Note that the settings of "mindiv" and "minh" (see http://www.drive5.com/usearch/manual/UCHIME_score.html) could have significant effects on results. However, so far we have found that small adjustments to these parameters have only a minor effect on sensitivity and specificity when running chimera checking for 16S sequences.
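
One way to check how sensitive your own data are to these settings is to rerun the filter over a small grid of values and compare how many reads survive each run. The sketch below only prints the commands it would run (a dry run). The grid simply pairs this script's defaults with the uchime defaults, and the output directory naming scheme is an illustrative assumption, not a recommendation.

```shell
#!/bin/sh
db="/usr/local/db/single_strand/Bacteria_RDP_trainset15_092015.udb"

n_cmds=0
for mindiv in 0.8 1.5; do
    for minh in 0.2 0.28; do
        # Hypothetical naming scheme: one output directory per parameter pair.
        out_dir="non_chimeras_mindiv${mindiv}_minh${minh}"
        # Dry run: print the command. Remove the leading "echo" to execute it.
        echo chimeraFilter.pl -type 1 -mindiv "$mindiv" -minh "$minh" \
            -db "$db" -o "$out_dir" fasta_files/*
        n_cmds=$((n_cmds + 1))
    done
done

echo "commands printed: $n_cmds"
```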

You can download the DB used in the above example [here](https://www.dropbox.com/s/8qr42doaez48oc3/Bacteria_RDP_trainset15_092015.udb?dl=0) (70 MB). It is originally from the Ribosomal Database Project (RDP) and was then parsed to include only bacteria.

Options:

-h, --help
Displays the entire help documentation.
-v, --version
Displays version number and exits.
-type <[0|1]>
Non-chimeric output type: either only sequences that are clearly non-chimeric (1) or all sequences that are not called as chimeric (0; this includes borderline sequences, marked "?" in uchime output).
-mindiv
Min % divergence between query and target sequence (default 1.5, note that this differs from the uchime default of 0.8).
-minh
Min score to be called as chimeric (default 0.2, note that this differs from the uchime default of 0.28).
-o, --out_dir
Output directory for filtered fastq files. Default is "non_chimeras".
-thread <# of CPUs>
Using this option without a value uses all CPUs on the machine, while giving it a value limits usage to that many CPUs. Without this option only one CPU is used.
-log
The location to write the log file.
-db, --database
Database of 16S sequences to use as a reference (UDB or FASTA file).

Contact

Please feel free to post a question on the Microbiome Helper google group if you have any issues.
General comments or inquiries about Microbiome Helper can be sent to [email protected].
