Download From SRA

Step 1:

Obtain a list of run ids (SRR) from the SRA and place them one per line in a text file.

This is done through NCBI (example: https://www.ncbi.nlm.nih.gov/sra?term=SRA045646) by selecting "Send to" -> "File" and choosing "Accession List" as the format.

More information about this is here: https://www.ncbi.nlm.nih.gov/books/NBK158899/
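
Before moving on, it can help to check that the list really contains one SRR id per line. The following is a minimal sketch, not part of the original workflow: the function name clean_acc_list and the output file name in the usage note are hypothetical, and it assumes every id has the form SRR followed by digits.

```shell
# Hypothetical helper: print only the lines that look like SRR ids.
# Windows carriage returns are stripped first; blank lines and anything
# that is not "SRR" followed by digits are dropped.
clean_acc_list() {
    tr -d '\r' < "$1" | grep -E '^SRR[0-9]+$'
}
```

For example, clean_acc_list SraAccList.txt > SraAccList.clean.txt would write a cleaned copy that can be compared against the original.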

Step 2:

Download each file using wget.

You can form the URL for each file like so (note that the first six characters of the identifier, i.e. "SRR" plus the first three digits, are used as a subdirectory): "wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR304/SRR304976/SRR304976.sra"
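
The URL layout can be sketched as a small shell function. This is only an illustration: the name sra_url is hypothetical, and the base URL is copied from the example above, so it assumes that server layout is still what you are downloading from.

```shell
# Hypothetical helper: build the download URL for one SRR id.
# Path layout: <base>/<first 3 chars>/<first 6 chars>/<id>/<id>.sra
sra_url() {
    acc=$1
    base="ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra"
    prefix3=$(printf '%s' "$acc" | cut -c1-3)   # e.g. SRR
    prefix6=$(printf '%s' "$acc" | cut -c1-6)   # e.g. SRR304
    printf '%s/%s/%s/%s/%s.sra\n' "$base" "$prefix3" "$prefix6" "$acc" "$acc"
}
```

Calling sra_url SRR304976 prints the same URL as the wget example above.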

Since we don't want to do that manually for each file, we can use GNU parallel to help:

cat SraAccList.txt | parallel -j 1 wget -P sra ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/{='$_=substr($_,0,3)'=}/{='$_=substr($_,0,6)'=}/{}/{}.sra

The cat SraAccList.txt pipes the contents of SraAccList.txt to the parallel command. This assumes that the SRR ids are all in the file SraAccList.txt that was downloaded in Step 1.
The -j 1 specifies the number of jobs to run at once. Using 1 limits parallel to downloading one file at a time (simultaneous downloads shouldn't be faster).
The -P sra specifies that the downloaded files should be placed in the directory 'sra'.
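
Downloads can fail partway through a long list, so it may be worth checking which ids did not arrive. The sketch below is an assumption-laden helper rather than part of the original workflow: the name missing_sra is hypothetical, and it assumes each file was saved as sra/<id>.sra, which is what wget -P sra does with the URLs above.

```shell
# Hypothetical helper: print every id from the list that has no
# non-empty <dir>/<id>.sra file yet (dir defaults to 'sra').
missing_sra() {
    list=$1
    dir=${2:-sra}
    while IFS= read -r acc || [ -n "$acc" ]; do
        [ -z "$acc" ] && continue                      # skip blank lines
        [ -s "$dir/$acc.sra" ] || printf '%s\n' "$acc" # missing or empty file
    done < "$list"
}
```

For example, missing_sra SraAccList.txt sra > retry.txt writes the ids that still need downloading, and retry.txt can then be used in place of SraAccList.txt in the parallel command.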

Step 3:

Convert from SRA to FASTQ:

parallel -j 1 fastq-dump --skip-technical -F --split-files -O fastq {} ::: sra/*

::: sra/* passes every file in the sra directory (the .sra files downloaded in Step 2) to parallel as arguments, one per fastq-dump job
-j 1 specifies the number of jobs to run at once. You can increase this number to convert several files in parallel.
-F specifies that the original ids be used (instead of those changed by the SRA)
--skip-technical skips technical reads. Some sequencing technologies produce other reads besides the forward and reverse reads; this option leaves those out.
--split-files will split the files into forward and reverse reads
-O fastq specifies the directory to place the converted fastq files
--gzip can be added as an option if you would like the fastq files to be gzipped (this saves space, but takes much longer to do the conversion).
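
After the conversion, a quick sanity check is to confirm that the forward and reverse files of each run contain the same number of reads. The following is a minimal sketch under stated assumptions: the function names count_reads and check_pairs are hypothetical, the files are uncompressed (no --gzip) with four lines per read, and they follow the <id>_1.fastq / <id>_2.fastq naming that --split-files produces.

```shell
# Hypothetical helper: number of reads in an uncompressed FASTQ file,
# assuming exactly four lines per read.
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Hypothetical helper: compare read counts of <id>_1.fastq and <id>_2.fastq
# in a directory (default 'fastq'). Returns non-zero if any pair disagrees
# or a reverse file is absent (which is expected for single-end runs).
check_pairs() {
    dir=${1:-fastq}
    status=0
    for fwd in "$dir"/*_1.fastq; do
        [ -e "$fwd" ] || continue              # no matching files at all
        rev="${fwd%_1.fastq}_2.fastq"
        if [ ! -e "$rev" ]; then
            echo "no reverse file for $fwd"
            status=1
            continue
        fi
        n1=$(count_reads "$fwd")
        n2=$(count_reads "$rev")
        if [ "$n1" -ne "$n2" ]; then
            echo "read count mismatch: $fwd ($n1) vs $rev ($n2)"
            status=1
        fi
    done
    return $status
}
```

Running check_pairs fastq after Step 3 prints nothing and returns 0 when every pair matches.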

A nice explanation of other fastq-dump options is provided by Rob Edwards' group: https://edwards.sdsu.edu/research/fastq-dump/

Contact

Please feel free to post a question on the Microbiome Helper google group if you have any issues.
General comments or inquiries about Microbiome Helper can be sent to [email protected].