-
Notifications
You must be signed in to change notification settings - Fork 204
Subsampling Metagenomic FASTQs
Rarefaction is a commonly used method in amplicon sequencing studies to ensure that all samples have the same read depth (by randomly subsampling reads). This method is controversial as it requires that many reads are discarded. Shotgun metagenomics reads are not usually rarified in practice although the same arguments can be made as for amplicon data for why this should be the case.
If you want to rarify your data you can use seqtk
to randomly subsample a set number of reads from a FASTQ.
If you have paired-end data it is important that you make sure your read-pairs are in the same order in each FASTQ and use the same random seed when subsampling so that the reads from the same pairs will be subsampled correctly.
If your paired-end read pairs are not in the same order you can use the below commands to sort the FASTQs (note this assumes that every read in the forward FASTQ has a corresponding pair in the reverse FASTQ, and vice versa). This command was taken from the Edwards lab website.
mkdir sorted_fastqs
parallel -j 20 'cat {} | paste - - - - | sort -k1,1 -t " " | tr "\t" "\n" > sorted_fastqs/{/.}.sorted.fastq' ::: /path/to/fastqs/*.fastq
Once your paired-end reads are in the same order you can use the sample
command to subsample the reads. The input parameters are the input FASTQ and the number of reads to subsample ("DEPTH" in the below commands). It is also important to set the random seed (especially for paired-end data) so that reads from the same pairs are subsampled in the forward and reverse FASTQs. This option is set as "SEED" in the below commands and can be any number as long as it is the same for the forward and reverse FASTQs.
mkdir rarified_fastqs
parallel -j 4 'seqtk sample -s SEED {} DEPTH > rarified_fastqs/{/.}_subsampled.fastq' ::: path/to/fastqs/*_R1_*_paired.sorted.fastq
parallel -j 4 'seqtk sample -s SEED {} DEPTH > rarified_fastqs/{/.}_subsampled.fastq' ::: path/to/fastqs/*_R2_*_paired.sorted.fastq
- Please feel free to post a question on the Microbiome Helper google group if you have any issues.
- General comments or inquires about Microbiome Helper can be sent to [email protected].