Skip to content

Amplicon SOP v2 (qiime2‐2024.5)

Robyn Wright edited this page Dec 2, 2024 · 4 revisions

This standard operating procedure (SOP) is based on the QIIME 2 tutorials and is meant for internal use and users that want to quickly run through their data using our procedure. There are more detailed descriptions and tutorials on the QIIME 2 website, which we recommend you check out! We also have tutorials associated with microbiome analysis workshops that we have run in the side bar on the right hand side of the page.

If you use this workflow make sure to keep track of the commands you use locally as this page will be updated over time (see "revisions" above for earlier versions).

Requirements

This workflow assumes that you have QIIME 2 (version qiime2-amplicon-2024.5) installed in a conda environment and that you also have FASTQC and MultiQC installed in this environment.

This workflow also assumes that the input is raw paired-end MiSeq data in demultiplexed FASTQ format located within a folder called raw_data. The filenames are expected to look like this: 105CHE6WT_S325_L001_R1_001.fastq.gz, where each field separated by an _ character corresponds to:

  1. The sample name (105CHE6WT)
  2. The sample number on the run (S325)
  3. The lane number (L001)
  4. The read number (R1 for forward and R2 for reverse)
  5. The set number (001)

Often you will need to re-name your files to match this syntax. However, QIIME 2 accepts many different formats so if your files are not already in this format (e.g. not demultiplexed) you would just need to use slightly different commands for importing your data.

Virtual Box

We have previously made a Virtual Box for our QIIME2 amplicon SOP. This hasn't been updated for quite some time now (it uses QIIME2 2018.6) and we therefore recommend that you use the QIIME2 virtual machine if this is something that you want to use. You can still find details on using our QIIME2-2018.6 Virtual Box here.

1. First steps

1.1 Format metadata file

You can format the metadata file (which is in a compatible format to the QIIME 1 mapping files) in a spreadsheet and validate the format using a google spreadsheet plugin as described on the QIIME 2 website. The only required column is the sample id column, which should be first. All other columns should correspond to sample metadata. Below, we assume that your metadata filename has been assigned to a bash variable called "$METADATA".

This can be done like so (for example if your file is called metadata.txt and is found in the folder /home/user):

METADATA="/home/user/metadata.txt"

1.2 Set number of cores

Several commands throughout this workflow can run on multiple cores in parallel. How many cores to use in these cases will be saved to the NCORES variable defined below. We set this variable to 1 below, but you can change this to be however many cores you would like to use.

NCORES=1

1.3 Inspect read quality

Visualize sequence quality across raw reads. This is important as a sanity check that your reads are of reasonable quality and to determine how your reads should be trimmed in downstream steps. QIIME 2 comes with a plugin for visualizing read quality, which we will use at a later step. However, when dealing with raw reads the easiest method to use is a combination of FASTQC and MultiQC. Note that these tools are not packaged with QIIME 2 so you will need to install them separately.

This is an important step for identifying outlier samples with especially low quality, read sizes, read depth, and other metrics.

You can run FASTQC with this command (after creating the output directory).

mkdir fastqc_out
fastqc -t $NCORES raw_data/*.fastq.gz -o fastqc_out

If you receive the error Value "FASTQ" invalid for option threads (number expected) (where "FASTQ" is an input filename) then make sure you have defined the NCORES variable correctly and re-run the command.

FASTQC generates a report for each individual file. To aggregate the summary files into a single report we can run MultiQC with these commands:

multiqc fastqc_out --filename multiqc_report.html

The full report is found within multiqc_report.html. You can view this report in a web-browser on your local computer. The most important reason to visualize this report is to ensure that your samples are of high-quality (based largely on whether the per-base quality is >30 across most of the reads) and that there are no outlier samples.

1.4 Activate QIIME 2 conda environment

You should run the rest of the workflow in a conda environment, which makes sure the correct version of the Python packages required by QIIME 2 are being used. You can activate this conda environment with this command (you may need to swap in source for conda if you get an error):

conda activate qiime2-amplicon-2024.5

1.5 Import FASTQs as QIIME 2 artifact

To standardize QIIME 2 analyses and to keep track of provenance (i.e. a list of what commands were previously run to produce a file) a special format is used for all QIIME 2 input and output files called an "artifact" (with the extension QZA). The first step is to import the raw reads as a QZA file.

mkdir reads_qza
    
qiime tools import \
   --type SampleData[PairedEndSequencesWithQuality] \
   --input-path raw_data/ \
   --output-path reads_qza/reads.qza \
   --input-format CasavaOneEightSingleLanePerSampleDirFmt

All of the FASTQs are now in the single artifact file reads_qza/reads.qza. This file format can be a little confusing at first, but it is actually just a zipped folder. You can manipulate and explore these files better with the qiime tools utilities (e.g. peek and view).

1.6 Trim primers with cutadapt

Screen out reads that do not begin with primer sequence and remove primer sequence from reads using the cutadapt QIIME 2 plugin. The below primers correspond to the 16S V6-V8 region (Bacteria-specific primer set).

qiime cutadapt trim-paired \
   --i-demultiplexed-sequences reads_qza/reads.qza \
   --p-cores $NCORES \
   --p-front-f ACGCGHNRAACCTTACC \
   --p-front-r ACGGGCRGTGWGTRCAA \
   --p-discard-untrimmed \
   --p-no-indels \
   --o-trimmed-sequences reads_qza/reads_trimmed.qza

If you received the error --p-cores option requires an argument make sure that you set the NCORES bash variable as described above.

If using 16S V4/V5 primers, use these options in the above command:

   --p-front-f GTGYCAGCMGCCGCGGTAA \
   --p-front-r CCGYCAATTYMTTTRAGTTT \

If using 16S V6/V8 Archaeal primers, use these options in the above command:

   --p-front-f TYAATYGGANTCAACRCC \
   --p-front-r CRGTGWGTRCAAGGRGCA \

If using 18S V4 primers, use these options in the above command:

   --p-front-f CYGCGGTAATTCCAGCTC \
   --p-front-r AYGGTATCTRATCRTCTTYG \

If using Fungal ITS2 primers, use these options in the above command:

   --p-front-f GTGAATCATCGAATCTTTGAA \
   --p-front-r TCCTCCGCTTATTGATATGC \

1.7 Summarize trimmed FASTQs

You can run the demux summarize command after trimming the reads to get a report of the number of reads per sample and quality distribution across the reads. This generates a more basic output compared to FASTQC/MultiQC, but is sufficient for this step.

qiime demux summarize \
   --i-data reads_qza/reads_trimmed.qza \
   --o-visualization reads_qza/reads_trimmed_summary.qzv

Note that we gave the output file above the extension .qzv since this a special type of artifact file - a visualization. You can look at the visualization by uploading the file to the QIIME2 view website and clicking on the Interactive Quality Plot tab at the top of the page

2. Denoising the reads into amplicon sequence variants

At this stage the main 2 pipelines you can use are based on either deblur or DADA2. Below we will describe the commands for running deblur. See here if you are interested in running DADA2.

2.1 Join paired-end reads

Currently deblur doesn't support unjoined paired-end reads. Forward and reverse reads can be joined with VSEARCH as shown below. Note that this command is more stringent than other approaches like PEAR when joining reads. If a low proportion of reads are joined at this step then one solution would be to export the data from QIIME2, run PEAR (see the run_pear.pl script that is part of this repository), and then re-import your data. In order to avoid those extra export/import manipulations, another solution is to restart from the raw reads, replacing the initial steps here until the Deblur/DADA2 step below - see our instructions for Alternative Processing of Reads with PEAR.

qiime vsearch merge-pairs \
   --i-demultiplexed-seqs reads_qza/reads_trimmed.qza \
   --o-merged-sequences reads_qza/reads_trimmed_joined.qza \
   --o-unmerged-sequences reads_qza/reads_trimmed_unjoined.qza

2.2 Filter out low-quality reads

This command will filter out low-quality reads based on the default options.

qiime quality-filter q-score \
   --i-demux reads_qza/reads_trimmed_joined.qza \
   --o-filter-stats filt_stats.qza \
   --o-filtered-sequences reads_qza/reads_trimmed_joined_filt.qza

It is a good idea at this point just to verify that there haven't been any substantial losses of reads, before going through the whole ASV process, at either the joining or quality-filtering steps above:

qiime demux summarize \
   --i-data reads_qza/reads_trimmed_joined_filt.qza \
   --o-visualization reads_qza/reads_trimmed_joined_filt_summary.qzv

2.3 Running deblur

Run the deblur workflow to correct reads and get amplicon sequence variants (ASVs). Note that the below command will retain singletons, which would have been filtered out unless we set --p-min-reads 1, and is for 16S sequences only. For other amplicon regions, you can either use the denoise-other option in the command and specify a reference database of sequences to use for positive filtering (as in the below versions for 18S and ITS) or use DADA2 as described below (the former still being preferred due to speed considerations).

Note that you will need to trim all sequences to the same length with the --p-trim-length option if you get an error. In order to determine the correct length to trim down to, take a look at the summary visualization generated above in order to select a length to trim back to that maintains the largest/acceptable quantity of reads. Finally, input that length in the deblur option --p-trim-length below and rerun the command.

qiime deblur denoise-16S \
   --i-demultiplexed-seqs reads_qza/reads_trimmed_joined_filt.qza \
   --p-trim-length -1 \
   --p-sample-stats \
   --p-jobs-to-start $NCORES \
   --p-min-reads 1 \
   --output-dir deblur_output

When running 18S and ITS data with deblur you will need to specify custom reference files, which can be downloaded here.

For the 18S version (note the first command imports the deblur database and only needs to be run once on your system):

qiime tools import \
   --input-path /home/shared/rRNA_db/gb203_pr2_all_10_28_99p_clean_prob-rm.fasta \
   --output-path /home/shared/rRNA_db/gb203_pr2_all_10_28_99p_clean_prob-rm.qza \
   --type 'FeatureData[Sequence]'

qiime deblur denoise-other \
   --i-demultiplexed-seqs reads_qza/reads_trimmed_joined_filt.qza \
   --i-reference-seqs /home/shared/rRNA_db/gb203_pr2_all_10_28_99p_clean_prob-rm.qza \
   --p-trim-length -1 \
   --p-sample-stats \
   --p-jobs-to-start $NCORES \
   --p-min-reads 1 \
   --output-dir deblur_output

For the ITS version (note the first command imports the deblur database and only needs to be run once on your system):

qiime tools import \
   --input-path /home/shared/rRNA_db/UNITE_sh_refs_qiime_ver8_99_s_all_02.02.2019.fasta \
   --output-path /home/shared/rRNA_db/UNITE_sh_refs_qiime_ver8_99_s_all_02.02.2019.qza \
   --type 'FeatureData[Sequence]'

qiime deblur denoise-other \
   --i-demultiplexed-seqs reads_qza/reads_trimmed_joined_filt.qza \
   --i-reference-seqs /home/shared/rRNA_db/UNITE_sh_refs_qiime_ver8_99_s_all_02.02.2019.qza \
   --p-trim-length -1 \
   --p-sample-stats \
   --p-jobs-to-start $NCORES \
   --p-min-reads 1 \
   --output-dir deblur_output

2.4 Summarizing deblur output

Once a denoising pipeline has been run you can summarize the output table with the below command, which will create a visualization artifact for you to view.

qiime feature-table summarize \
   --i-table deblur_output/table.qza \
   --o-visualization deblur_output/deblur_table_summary.qzv

We will use this visualization later to determine the the cut-offs for filtering the table below, but for now you should mainly take a look at the visualization to ensure that sufficient reads have been retained after running deblur. This denoising tool filters out reads that either do match to known noise or that do not match with low similarity to the expected amplicon region. If your samples have very low depth after running deblur (compared to the input read depth) this could be a red flag that either you ran the tool incorrectly, you have a lot of noise in your data, or that deblur is inappropriate for your dataset.

3. Assign taxonomy to ASVs

You can assign taxonomy to your ASVs using a Naive-Bayes approach implemented in the scikit learn Python library and the SILVA or UNITE databases.

3.1 Build or acquire taxonomic classifier

This approach requires that a classifier be trained in advance on a reference database. We recommend users use a widely used classifier to help ensure there are no unexpected issues with the Naive-Bayes model. We previously maintained primer-specific classifiers, which theoretically can provide more accurate classifications, but we no longer do this due to concerns regarding issues with the trained models that are difficult to catch if only a couple people are running them. No matter what approach you use, it's a good idea to run a few sanity checks on the output to make sure it worked correctly for your data (see below).

The full-length 16S/18S classifier can be downloaded from the QIIME 2 website (silva-138-99-nb-classifier.qza for the latest classifier). Note that the QIIME2 website now provides classifiers for several different databases: SILVA, Genome Taxonomy Database (GTDB) or Greengenes2 being the main ones.

Custom classifiers for the ITS region that we have generated from the UNITE database are available as well (see downloads and commands used to create these files):

  • Full ITS - fungi only (classifier_fungi_ITS_sh_taxonomy_sh_taxonomy_qiime_ver10_99_04.04.2024_dev.qza)
  • Full ITS - all eukaryotes (classifier_alleuk_ITS_sh_refs_qiime_ver10_99_all_04.04.2024_dev.qza)

There are other databases that you may consider using and creating classifiers for, for example, the Human Oral Microbiome Database.

3.2 Run taxonomic classification

You can run the taxonomic classification with this command, which is one of the longest running and most memory-intensive command of the SOP. If you receive an error related to insufficient memory (and if you cannot increase your memory usage) then you can look into the --p-reads-per-batch option and set this to be lower than the default (which is dynamic depending on sample depth and the number of threads) and also try running the command with fewer jobs (e.g. set --p-n-jobs 1).

qiime feature-classifier classify-sklearn \
   --i-reads deblur_output/representative_sequences.qza \
   --i-classifier /home/shared/taxa_classifiers/qiime2-2022.11_classifiers/silva-138-99-nb-classifier.qza \
   --p-n-jobs $NCORES \
   --output-dir taxa

As with all QZA files, you can export the output file to take a look at the classifications and confidence scores - this needs to be done in this case, as we use the exported taxonomy.tsv file later on when exporting all the final analysis files:

qiime tools export \
   --input-path taxa/classification.qza --output-path taxa

3.3 Assess subset of taxonomic assignments with BLAST

The performance of the taxonomic classification is difficult to assess without a gold-standard reference, but nonetheless one basic sanity check is to compare the taxonomic assignments with the top BLASTn hits for certain ASVs.

It is simple to do this with QIIME 2 by running:

qiime feature-table tabulate-seqs --i-data deblur_output/representative_sequences.qza \
                                  --o-visualization deblur_output/representative_sequences.qzv

The file deblur_output/representative_sequences.qzv is a QIIME 2 visualization file that you can open in the QIIME 2 viewer. The format makes it easy to BLAST certain ASVs against the NCBI nt database. By comparing these BLAST hits with the taxonomic assignment of ASVs generated above you can reassure yourself that the taxonomic assignments overall worked correctly. It's a good idea to select ~5 ASVs to BLAST for this validation, which should be from taxonomically different groups, such as different phyla, according to the taxonomic classifier.

4. Filtering resultant table

Filtering the denoised table is an important step of microbiome data analysis. You can see more details on this process in the QIIME 2 filtering tutorial.

4.1 Filter out rare ASVs

Based on the summary visualization created in step 2.4 above you can choose a cut-off for how frequent a variant needs to be (and optionally how many samples need to have the variant) for it to be retained. One possible choice would be to remove all ASVs that have a frequency of less than 0.1% of the mean sample depth. This cut-off excludes ASVs that are likely due to MiSeq bleed-through between runs (reported by Illumina to be 0.1% of reads). To calculate this cut-off you would identify the mean sample depth in the visualization created in step 2.4, multiply it by 0.001, and round to the nearest integer.

Once you've determined how you would like to filter your table you can do so with this command (X is a placeholder for your choice):

qiime feature-table filter-features \
   --i-table deblur_output/table.qza \
   --p-min-frequency X \
   --p-min-samples 1 \
   --o-filtered-table deblur_output/deblur_table_filt.qza

4.2 Filter out contaminant and unclassified ASVs

Now that we have assigned taxonomy to our ASVs we can use that information to remove ASVs which are likely contaminants or noise based on the taxonomic labels. Two common contaminants in 16S sequencing data are mitochondrial and chloroplast 16S sequences, which can be removed by excluding any ASV which contains those terms in its taxonomic label. It can also be sometimes useful to exclude any ASV that is unclassified at the phylum level since these sequences could be noise (e.g. possible chimeric sequences). Note that if your data has not been classified against the default database you may need to change p__ to be a string that enables phylum-level assignments to be identified or simply omit that line.

In general though, it can be very informative if your sequencing reads are coming back with significant amounts of unclassified ASVs as it can indicate upstream analysis problems or indicate you are studying a poorly characterized environment where you have a good chance of identifying a lot of novel phyla. Therefore, our recommendation is to not filter out the unclassified sequences by default:

qiime taxa filter-table \
   --i-table deblur_output/deblur_table_filt.qza \
   --i-taxonomy taxa/classification.qza \
   --p-exclude mitochondria,chloroplast \
   --o-filtered-table deblur_output/deblur_table_filt_contam.qza

If you do want to exclude all the unclassifieds, then run the following instead (with the above caveat that the --p-include line might need to be adapted according to the tax database used):

qiime taxa filter-table \
   --i-table deblur_output/deblur_table_filt.qza \
   --i-taxonomy taxa/classification.qza \
   --p-include p__ \
   --p-exclude mitochondria,chloroplast \
   --o-filtered-table deblur_output/deblur_table_filt_contam.qza

4.3 Exclude low-depth samples

Often certain samples will have quite low depth after these filtering steps, which can be excluded from downstream analyses since they will largely add noise. There is no single cut-off that works best for all datasets, but researchers often use minimum cut-offs within the range of 1000 to 4000 reads. You can also use a cut-off much lower than this if you want to retain all samples except those that failed entirely (e.g. depth < 50 reads). Ideally you would choose this cut-off after visualizing rarefaction curves to determine at what read depth the richness of your samples plateaus and choose a cut-off as close to this plateau as possible while retaining sufficient sample size for your analyses.

To perform this rarefaction curve analysis you would first need to summarize the filtered table we produced in the last step:

qiime feature-table summarize \
   --i-table deblur_output/deblur_table_filt_contam.qza \
   --o-visualization deblur_output/deblur_table_filt_contam_summary.qzv

From this table you need to determine the maximum depth across your samples. You can then generate the rarefaction curves with this command (where X is a placeholder for the max depth across samples).

qiime diversity alpha-rarefaction \
   --i-table deblur_output/deblur_table_filt_contam.qza \
   --p-max-depth X \
   --p-steps 20 \
   --p-metrics 'observed_features' \
   --o-visualization rarefaction_curves_test.qzv

Take a look at these curves to help decide on a minimum depth cut-off for retaining samples. Once you decide on a hard cut-off you can exclude samples below this cut-off with this command (where SET_CUTOFF is a placeholder for the minimum depth you select):

qiime feature-table filter-samples \
   --i-table deblur_output/deblur_table_filt_contam.qza \
   --p-min-frequency SET_CUTOFF \
   --o-filtered-table deblur_output/deblur_table_final.qza

Alternatively, if you do not wish to exclude any samples then you can simply make a copy of the QZA file with the final table filename (i.e. cp deblur_output/deblur_table_filt_contam.qza deblur_output/deblur_table_final.qza), since this is the filename used for the remaining commands below.

4.4 Subset and summarize filtered table

Once we have our final filtered table we will need to subset the QZA of ASV sequences to the same set. You can exclude the low frequency ASVs from the sequence file with this command:

qiime feature-table filter-seqs \
   --i-data deblur_output/representative_sequences.qza \
   --i-table deblur_output/deblur_table_final.qza \
   --o-filtered-data deblur_output/rep_seqs_final.qza

Finally, you can make a new summary of the final filtered abundance table:

qiime feature-table summarize \
   --i-table deblur_output/deblur_table_final.qza \
   --o-visualization deblur_output/deblur_table_final_summary.qzv

5A. Build tree with SEPP QIIME 2 plugin (16S data)

SEPP is one method for placing short sequences into a reference phylogenetic tree. This is a useful way of determining a phylogenetic tree for your ASVs. For 16S data you can do this with q2-fragment-insertion using the below command:

qiime fragment-insertion sepp \
   --i-representative-sequences deblur_output/rep_seqs_final.qza \
   --i-reference-database /home/shared/rRNA_db/16S/sepp-refs-gg-13-8.qza \
   --o-tree asvs-tree.qza \
   --o-placements insertion-placements.qza \
   --p-threads $NCORES

Note that if you do not already have this file locally you will need to download sepp-refs-gg-13-8.qza as specified in the fragment-insertion instructions. You can specify custom reference files to place other amplicons, but the easiest approach for 18S and ITS data is to instead create a de novo tree as outlined below.

5B. QIIME 2 de novo tree creation (18S and ITS data)

Given the lack of a pre-calculated reference tree for 18S and ITS data (unlike the above 16S tree) to do direct placement, we create the tree de novo here below.

Making multiple-sequence alignment

We'll need to make a multiple-sequence alignment of the ASVs before running FastTree. First, we'll make a folder for the output files.

mkdir tree_out

We'll use MAFFT to make a de novo multiple-sequence alignment of the ASVs.

qiime alignment mafft --i-sequences deblur_output/rep_seqs_final.qza \
                      --p-n-threads $NCORES \
                      --o-alignment tree_out/rep_seqs_final_aligned.qza

Filtering multiple-sequence alignment

Variable positions in the alignment need to be masked before FastTree is run, which can be done with this command:

qiime alignment mask --i-alignment tree_out/rep_seqs_final_aligned.qza \
                     --o-masked-alignment tree_out/rep_seqs_final_aligned_masked.qza

Running FastTree

Finally FastTree can be run on this masked multiple-sequence alignment:

qiime phylogeny fasttree --i-alignment tree_out/rep_seqs_final_aligned_masked.qza \
                         --p-n-threads $NCORES \
                         --o-tree tree_out/rep_seqs_final_aligned_masked_tree

Add root to tree

FastTree returns an unrooted tree. One basic way to add a root to a tree is to add it add it at the midpoint of the largest tip-to-tip distance in the tree, which is done with this command:

qiime phylogeny midpoint-root --i-tree tree_out/rep_seqs_final_aligned_masked_tree.qza \
                              --o-rooted-tree tree_out/rep_seqs_final_aligned_masked_tree_rooted.qza

Re-name file

To keep this output filename consistent with the SOP you can simply make a copy of this output tree.

cp tree_out/rep_seqs_final_aligned_masked_tree_rooted.qza asvs-tree.qza

6. Generate rarefaction curves

A key quality control step is to plot rarefaction curves for all of your samples to determine if you performed sufficient sequencing. The below command will generate these plots (X is a placeholder for the maximum depth in your dataset, which you can determine by running the summarize command above). Depending on if you decided to exclude any samples you may not want to re-create rarefaction curves. Note however that the below command will output rarefaction curves for a range of alpha-diversity metrics (including phylogenetic metrics), whereas above the curves were based on richness (referred to as observed_features) only. Also note that the $METADATA bash variable was defined at the first step of this SOP and simply points to the metadata table.

qiime diversity alpha-rarefaction \
   --i-table deblur_output/deblur_table_final.qza \
   --p-max-depth X \
   --p-steps 20 \
   --i-phylogeny asvs-tree.qza \
   --m-metadata-file $METADATA \
   --o-visualization rarefaction_curves.qzv

For some reason, the QIIME 2 default in the above curves with the metadata file (which you can see in the visualization) is to not give you the option of seeing each sample's rarefaction curve individually (even though this is the default later on in stacked barplots!), only the "grouped" curves by each metadata type. As it can be quite important in data QC to see if you have inconsistent samples, we need to rerun the above command, but this time omitting the metadata file (use the same X for the maximum depth, as above).

qiime diversity alpha-rarefaction \
   --i-table deblur_output/deblur_table_final.qza \
   --p-max-depth X \
   --p-steps 20 \
   --i-phylogeny asvs-tree.qza \
   --o-visualization rarefaction_curves_eachsample.qzv

7. Generate stacked barchart of taxa relative abundances

A more useful output is the interactive stacked bar-charts of the taxonomic abundances across samples, which can be output with this command:

qiime taxa barplot \
   --i-table deblur_output/deblur_table_final.qza \
   --i-taxonomy taxa/classification.qza \
   --m-metadata-file $METADATA \
   --o-visualization taxa/taxa_barplot.qzv

Now the QIIME 2 default in the above barplots is to plot each sample individually. However, in the visualizer you can label the samples according to their metadata types, but the plots are not summed together according to their metadata (as we could do in QIIME1 using the summarize_taxa_through_plots.py script), so we have to rerun the above command after running a new command first to group the samples by metadata - this creates a new feature table (you have to do this for each metadata category) which becomes the new input for the above same taxa barplot command (again, run each time for each metadata category). Note that CATEGORY here below is a placeholder for the text label of your category of interest from the metadata file. Also you will need to create a new metadata file (called CATEGORY_METADATA below) to make a barplot of this grouped data.

qiime feature-table group \
   --i-table deblur_output/deblur_table_final.qza \
   --p-axis sample \
   --p-mode sum \
   --m-metadata-file $METADATA \
   --m-metadata-column CATEGORY \
   --o-grouped-table deblur_output/deblur_table_final_CATEGORY.qza

qiime taxa barplot \
   --i-table deblur_output/deblur_table_final_CATEGORY.qza \
   --i-taxonomy taxa/classification.qza \
   --o-visualization taxa/taxa_barplot_CATEGORY.qzv

8. Calculating diversity metrics and generating ordination plots

Common alpha and beta-diversity metrics can be calculated with a single command in QIIME 2. In addition, ordination plots (such as PCoA plots for weighted UniFrac distances) will be generated automatically as well. This command will also rarefy all samples to the sample sequencing depth before calculating these metrics (X is a placeholder for the lowest reasonable sample depth; samples with depth below this cut-off will be excluded).

qiime diversity core-metrics-phylogenetic \
   --i-table deblur_output/deblur_table_final.qza \
   --i-phylogeny asvs-tree.qza \
   --p-sampling-depth X \
   --m-metadata-file $METADATA \
   --p-n-jobs-or-threads $NCORES \
   --output-dir diversity

You can then produce boxplots comparing the different categories in your metadata file. For example, to create boxplots comparing the Shannon alpha-diversity metric you can use this command:

qiime diversity alpha-group-significance \
   --i-alpha-diversity diversity/shannon_vector.qza \
   --m-metadata-file $METADATA \
   --o-visualization diversity/shannon_compare_groups.qzv

Note that you can also export (see below) this or any other diversity metric file (ending in .qza) and analyze them with a different program.

9. Identifying differentially abundant features with ANCOM

ANCOM is one method to test for differences in the relative abundance of features between sample groupings. It is a compositional approach that makes no assumptions about feature distributions. However, it requires that all features have non-zero abundances so a pseudocount first needs to be added (1 is a typical pseudocount choice):

qiime composition add-pseudocount \
   --i-table deblur_output/deblur_table_final.qza \
   --p-pseudocount 1 \
   --o-composition-table deblur_output/deblur_table_final_pseudocount.qza

Then ANCOM can be run with this command; note that CATEGORY is a placeholder for the text label of your category of interest from the metadata file:

qiime composition ancom \
   --i-table deblur_output/deblur_table_final_pseudocount.qza \
   --m-metadata-file $METADATA \
   --m-metadata-column CATEGORY \
   --output-dir ancom_output

10. Exporting the final abundance, profile and sequence files

Lastly, to get the BIOM file (with associated taxonomy) and FASTA file (one per ASV) for your dataset to plug into other programs you can use the commands below.

To export the FASTA:

qiime tools export \
   --input-path deblur_output/rep_seqs_final.qza \
   --output-path deblur_output_exported

To export the BIOM table (with taxonomy added as metadata):

sed -i -e '1 s/Feature/#Feature/' -e '1 s/Taxon/taxonomy/' taxa/taxonomy.tsv

qiime tools export \
   --input-path deblur_output/deblur_table_final.qza \
   --output-path deblur_output_exported

biom add-metadata \
   -i deblur_output_exported/feature-table.biom \
   -o deblur_output_exported/feature-table_w_tax.biom \
   --observation-metadata-fp taxa/taxonomy.tsv \
   --sc-separated taxonomy

biom convert \
   -i deblur_output_exported/feature-table_w_tax.biom \
   -o deblur_output_exported/feature-table_w_tax.txt \
   --to-tsv \
   --header-key taxonomy

To export the tree of your ASVs:

qiime tools export \
  --input-path asvs-tree.qza \
  --output-path .

Other resources

There are many other possible QIIME 2 analyses that we recommend you look into. You may also find these other resources useful:

Changelog

  • v2020.8 (2020/10/30): Changed qiime quality-filter q-score-joined to be qiime quality-filter q-score.
  • v2020.8 (2020/10/30): Updated taxa classifiers, but no longer output classifiers for individual 16S/18S variable regions.
  • v2022.2 (2022/11/09): Changed to state removing unclassified ASVs is not the recommended default when filtering out "contaminants".
  • v2022.11 (2023/01/04): Changed qiime vsearch join-pairs to qiime vsearch merge-pairs, and the option --o-joined-sequences to --o-merged-sequences.
Clone this wiki locally