Demo for the Wells Lab prepared by McKenna Farmer, July 2023
This repo contains data and scripts for introducing 16S rRNA amplicon sequence analysis to the Wells Lab. Throughout this demo, I refer to the data processing steps interchangeably as a "workflow" or "pipeline". Both terms convey that the analysis is performed in multiple steps, with the outputs of each step feeding into later steps.
This demo will cover data processing and simple analysis examples of a wastewater microbiome. The concepts and workflow used here can apply to other study systems.
- Pre work
- QIIME overview and getting set up
- Import data
- Trim reads
- Make amplicon sequence variants
- Classify taxonomy
- Make phylogenetic tree
- Preparing for data analysis
- Example analysis
- Submitting to SRA
Prior to starting this tutorial, you should set up your computer with some key programs to access computing resources and files. I recommend the following:
- A command line terminal to interact with Quest (SSH client)
  - Windows: install PuTTY
  - Mac: use Terminal, which comes preinstalled
- A code editor
  - Install Visual Studio Code (VSCode) for Windows and Mac
- A file explorer (FTP client)
We will use the command line version of QIIME to process our data, running each step from a bash submission script. Some steps run quickly enough to be done in an interactive job, but for the purposes of this demo, we will only use bash scripts so that you keep a copy of the code for every step.
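As a rough sketch of what a bash submission script might look like, here is a minimal skeleton. It assumes that Quest schedules jobs with Slurm; the allocation ID, partition name, resource requests, and file names are hypothetical placeholders rather than values from this repo, and the `check_qiime` helper is only there for illustration.

```shell
#!/bin/bash
# Hypothetical allocation ID and partition name; replace with your own.
#SBATCH --account=p00000
#SBATCH --partition=short
# Wall-clock limit (hh:mm:ss) and resources for one small step.
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=4G
# Job name and log file; stdout and stderr both go to the log by default.
#SBATCH --job-name=qiime_step
#SBATCH --output=qiime_step.log

# Activate your QIIME environment here. How you do this depends on how
# QIIME is installed on your system, so it is left out of this sketch.

# Report whether the qiime command is on the PATH for this job.
check_qiime() {
    if command -v qiime >/dev/null 2>&1; then
        echo "available"
    else
        echo "missing"
    fi
}

echo "qiime command: $(check_qiime)"

# The QIIME command for this pipeline step would go below this line.
```

Under the same Slurm assumption, you would submit the script with `sbatch qiime_step.sh` (a hypothetical file name) and read `qiime_step.log` once the job finishes.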
In this tutorial, I have provided curated sequencing data from a past sequencing project. Feel free to practice the pipeline on other datasets that you find in the literature or through collaborations. If you do, make sure the raw reads are in the fastq.gz format and find the associated metadata so you can perform the data analysis portion.
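If you bring your own dataset, a quick check like the one below can confirm that the raw reads really are gzip-compressed fastq files before you start. This is only a sketch: `check_reads` is a hypothetical helper name, and it tests file names and gzip integrity only, not the contents of the reads or the metadata.

```shell
#!/bin/bash
# check_reads DIR
# Prints the number of files in DIR named *.fastq.gz that pass a gzip
# integrity test. Returns non-zero if any such file fails the test or if
# no valid file is found.
check_reads() {
    local dir="$1"
    local rc=0
    local count=0
    local f
    for f in "$dir"/*.fastq.gz; do
        # If the glob matched nothing, the literal pattern is skipped here.
        [ -e "$f" ] || continue
        if gzip -t "$f" 2>/dev/null; then
            count=$((count + 1))
        else
            echo "not a valid gzip file: $f" >&2
            rc=1
        fi
    done
    if [ "$count" -eq 0 ]; then
        echo "no valid .fastq.gz files found in $dir" >&2
        rc=1
    fi
    echo "$count"
    return "$rc"
}
```

For example, `check_reads raw_reads` (where `raw_reads` is a hypothetical directory name) prints how many read files passed and lists any failures on stderr.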