Creating QIIME 2 Taxonomic Classifiers
We use the commands below to create new QIIME 2 taxonomic classifiers. They are based on this QIIME 2 tutorial and are listed here for convenience.
This page lists the current commands used to create custom classifiers. Custom classifiers were only created for ITS data, because the default QIIME 2 classifier works with both 16S and 18S data. To see the previous commands used to generate primer-specific classifiers, please see here.
First, download the appropriate reference files: the UNITE (ver8_99_s_04.02.2020) ITS database files, both the fungi-only release and the release that includes all eukaryotes.
All of the database files (FASTAs and taxonomy tables) need to be imported as QIIME 2 artifacts.
mkdir imported_files
qiime tools import --type 'FeatureData[Sequence]' \
--input-path sh_qiime_release_s_04.02.2020/sh_refs_qiime_ver8_99_s_04.02.2020.fasta \
--output-path imported_files/sh_refs_qiime_ver8_99_s_04.02.2020_ITS.qza
qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat \
--input-path sh_qiime_release_s_04.02.2020/sh_taxonomy_qiime_ver8_99_s_04.02.2020.txt \
--output-path imported_files/sh_taxonomy_qiime_ver8_99_s_04.02.2020.qza
qiime tools import --type 'FeatureData[Sequence]' \
--input-path sh_qiime_release_s_all_04.02.2020/sh_refs_qiime_ver8_99_s_all_04.02.2020.fasta \
--output-path imported_files/sh_refs_qiime_ver8_99_s_all_04.02.2020_ITS.qza
qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat \
--input-path sh_qiime_release_s_all_04.02.2020/sh_taxonomy_qiime_ver8_99_s_all_04.02.2020.txt \
--output-path imported_files/sh_taxonomy_qiime_ver8_99_s_all_04.02.2020.qza
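Before training, it can be worth confirming that all four artifacts were actually written. The sketch below is not part of the original workflow: check_imported is a hypothetical helper name, and it assumes the output file names used in the import commands above. It flags any artifact that is missing or empty and, when qiime is on the PATH, runs qiime tools validate on the rest.

```shell
# check_imported DIR
# Hypothetical helper: confirm the four imported artifacts exist in DIR.
# Returns 0 only if every file is present, non-empty, and (when qiime is
# available) passes 'qiime tools validate'.
check_imported() {
    ci_dir=$1
    ci_failed=0
    for ci_file in \
        sh_refs_qiime_ver8_99_s_04.02.2020_ITS.qza \
        sh_taxonomy_qiime_ver8_99_s_04.02.2020.qza \
        sh_refs_qiime_ver8_99_s_all_04.02.2020_ITS.qza \
        sh_taxonomy_qiime_ver8_99_s_all_04.02.2020.qza
    do
        if [ ! -s "$ci_dir/$ci_file" ]; then
            echo "missing or empty: $ci_dir/$ci_file" >&2
            ci_failed=1
        elif command -v qiime >/dev/null 2>&1; then
            # Checks that the artifact is intact and matches its declared format.
            qiime tools validate "$ci_dir/$ci_file" || ci_failed=1
        fi
    done
    return "$ci_failed"
}
```

A typical call would be check_imported imported_files, run from the directory where the import commands were executed.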
Now that the data is imported, we can train the classifiers themselves with the commands below. Note the & at the end of each command, which runs it in the background. Also note that the ITS classifiers are based on the entire ITS region and that two different classifiers are created from the UNITE database: one for all eukaryotes (classifier_sh_refs_qiime_ver8_99_s_all_04.02.2020_ITS.qza) and one for fungi only (classifier_sh_refs_qiime_ver8_99_s_04.02.2020_ITS.qza).
mkdir taxa_classifiers
qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads imported_files/sh_refs_qiime_ver8_99_s_04.02.2020_ITS.qza \
--i-reference-taxonomy imported_files/sh_taxonomy_qiime_ver8_99_s_04.02.2020.qza \
--o-classifier taxa_classifiers/classifier_sh_refs_qiime_ver8_99_s_04.02.2020_ITS.qza &
qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads imported_files/sh_refs_qiime_ver8_99_s_all_04.02.2020_ITS.qza \
--i-reference-taxonomy imported_files/sh_taxonomy_qiime_ver8_99_s_all_04.02.2020.qza \
--o-classifier taxa_classifiers/classifier_sh_refs_qiime_ver8_99_s_all_04.02.2020_ITS.qza &
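Because both training commands are sent to the background, the prompt returns immediately and the classifier files will not exist until the jobs finish. One way to block until both are done, and to notice if either failed, is to record each job's PID with $! and pass them to wait. The helper below is a minimal sketch; wait_all and the pid_* variable names are illustrative and not part of the original commands.

```shell
# wait_all PID...
# Hypothetical helper: block until every listed background job has exited.
# Returns 0 only if all of them exited successfully.
wait_all() {
    wa_status=0
    for wa_pid in "$@"; do
        wait "$wa_pid" || wa_status=1
    done
    return "$wa_status"
}

# Usage sketch: capture $! directly after each backgrounded command.
#   qiime feature-classifier fit-classifier-naive-bayes (fungi arguments as above) &
#   pid_fungi=$!
#   qiime feature-classifier fit-classifier-naive-bayes (all-eukaryote arguments as above) &
#   pid_all=$!
#   wait_all "$pid_fungi" "$pid_all" && ls -lh taxa_classifiers/
```

The jobs must be started from the same shell session that calls wait_all, since wait can only collect the exit status of that shell's own child processes.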
The taxonomic classifiers are now prepared. It's important that you now run sanity checks on these classifiers to ensure they were created correctly. This is best done by comparing the taxonomic assignments on test input sequences based on these classifiers to the assignments based on an independent approach. I've written a quick pipeline for running these sanity checks specifically for these amplicon regions, which you can see here.
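As a rough illustration of the first half of such a check (this is not the pipeline linked above), a trained classifier can be applied to a small set of test sequences with qiime feature-classifier classify-sklearn, and the assignments exported to a TSV for comparison against an independent method. The function name classify_test_seqs and the output layout are assumptions made for this sketch; the test sequences must already be imported as a FeatureData[Sequence] artifact.

```shell
# classify_test_seqs CLASSIFIER.qza READS.qza OUTDIR
# Hypothetical helper: classify test sequences with a trained classifier
# and export the resulting taxonomy so it can be compared elsewhere.
classify_test_seqs() {
    cts_classifier=$1
    cts_reads=$2
    cts_outdir=$3
    mkdir -p "$cts_outdir" || return 1

    # Assign taxonomy to each test sequence.
    qiime feature-classifier classify-sklearn \
        --i-classifier "$cts_classifier" \
        --i-reads "$cts_reads" \
        --o-classification "$cts_outdir/taxonomy.qza" || return 1

    # Unpack the taxonomy artifact into a plain-text table under OUTDIR/exported.
    qiime tools export \
        --input-path "$cts_outdir/taxonomy.qza" \
        --output-path "$cts_outdir/exported"
}
```

Running this once per classifier, each with its own output directory, leaves two exported tables that can be compared against each other and against the independent assignments.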
- Please feel free to post a question on the Microbiome Helper google group if you have any issues.
- General comments or inquiries about Microbiome Helper can be sent to [email protected].