Sequence QC

We use FastQC to check the quality of the sequenced reads. This check is especially useful after stitching the paired-end reads, because it helps you choose quality cut-offs.

To run FastQC on each of your FASTQ files separately, run:

mkdir fastqc_out
fastqc stitched_reads/*.assembled.fastq -o fastqc_out
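
After the run, it can be worth confirming that every input file produced a report. The following is a minimal sketch, not part of the original workflow: `missing_reports` is a hypothetical helper name, and it assumes FastQC's usual output naming, where `sample.assembled.fastq` yields `sample.assembled_fastqc.html` in the output directory.

```shell
# Hypothetical helper: print each stitched FASTQ that has no FastQC HTML report.
# Assumption: FastQC names its report <input name without .fastq>_fastqc.html.
missing_reports() {
    # $1: directory holding *.assembled.fastq files
    # $2: FastQC output directory
    for fq in "$1"/*.assembled.fastq; do
        [ -e "$fq" ] || continue              # the glob matched nothing
        base=$(basename "$fq" .fastq)         # e.g. sample.assembled
        [ -e "$2/${base}_fastqc.html" ] || printf '%s\n' "$fq"
    done
}
```

Calling `missing_reports stitched_reads fastqc_out` prints nothing when every file has a report.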

Alternatively, to look at the QC metrics for all of your FASTQ files combined, run:

mkdir fastqc_out_combined
cat stitched_reads/*.assembled.fastq | fastqc stdin -o fastqc_out_combined

See example output here.

Note that all of the reads should begin with the same forward primer sequence, which explains why the "Per base sequence content" plot has peaks of 100% base content in the first few positions.
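
If you want to confirm the shared primer directly rather than from the plot, one option is to tabulate the first few bases of every read. This is a rough sketch under stated assumptions: `prefix_counts` is a hypothetical helper, the default prefix length of 10 is arbitrary, and the FASTQ is assumed to use standard four-line records with no wrapped sequence lines.

```shell
# Hypothetical helper: count how often each K-base prefix starts a read.
# Assumption: standard 4-line FASTQ records (sequence is line 2 of each record).
prefix_counts() {
    # $1: FASTQ file, $2: prefix length K (defaults to 10)
    awk -v k="${2:-10}" 'NR % 4 == 2 { print substr($0, 1, k) }' "$1" \
        | sort | uniq -c | sort -rn          # most common prefix first
}
```

For example, `prefix_counts stitched_reads/sample.assembled.fastq 15 | head` (with `sample` standing in for one of your file names) lists the most common 15-base prefixes. If the primer is intact, a single prefix should account for nearly all reads.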

Also note that several metrics ("Sequence Duplication Levels", "Overrepresented sequences" and "Kmer Content") should not be used to evaluate data quality: we are sequencing only a single gene, so an excess of highly similar sequences is expected.
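
To scan many reports at once while skipping those three metrics, you could filter FastQC's `summary.txt` files. The sketch below is illustrative only: `flagged_modules` is a hypothetical helper, and it assumes each `summary.txt` line holds three tab-separated fields (status, module name, file name).

```shell
# Hypothetical helper: list WARN/FAIL modules from FastQC summary.txt files,
# ignoring the modules that are expected to flag for single-gene amplicon data.
# Assumption: each line is "<status>\t<module>\t<file name>".
flagged_modules() {
    # "$@": one or more summary.txt files
    awk -F'\t' '
        $2 == "Sequence Duplication Levels" { next }
        $2 == "Overrepresented sequences"   { next }
        $2 == "Kmer Content"                { next }
        $1 == "WARN" || $1 == "FAIL"        { print $3 "\t" $2 "\t" $1 }
    ' "$@"
}
```

The `summary.txt` files sit inside the `_fastqc.zip` archives, so you would first need to unzip them or add `--extract` to the `fastqc` call, then run something like `flagged_modules fastqc_out/*_fastqc/summary.txt`.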

Contact

Please feel free to post a question on the Microbiome Helper Google group if you have any issues.
General comments or inquiries about Microbiome Helper can be sent to [email protected].

