Download From SRA

Step 1:

Using an SRA id, obtain a list of all SRR ids associated with the project in a single plain-text file, with one SRR id per line. To do this, look up the project on NCBI (example: https://www.ncbi.nlm.nih.gov/sra?term=SRA045646), select "Send to" -> "File", and choose "Accession List" as the format. More information is available here: https://www.ncbi.nlm.nih.gov/books/NBK158899/
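Before downloading, it can help to check that the accession list really contains one id per line. The function below is a minimal sketch, not part of Microbiome Helper: the name clean_acc_list is hypothetical, and it assumes every id has the form SRR followed by digits. It strips Windows line endings and surrounding whitespace and drops any line that does not look like an SRR id.

```shell
# Hypothetical helper (an assumption, not from Microbiome Helper):
# print only the lines of an accession list that look like SRR ids.
clean_acc_list() {
    # $1: path to the accession list downloaded in Step 1
    tr -d '\r' < "$1" \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | grep -E '^SRR[0-9]+$'
}
```

For example, "clean_acc_list SraAccList.txt > SraAccList.clean.txt" would write the cleaned list to a new file (the output file name is only an illustration).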

Step 2:

Download each file using wget. You can form the URL for each file as shown below. Note that the first three characters of the identifier (SRR) and the first six characters (SRR304) are each used as a subdirectory: "wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR304/SRR304976/SRR304976.sra"
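The URL pattern above can be sketched as a small shell function. This is an illustration only: the function name srr_url and its variable names are hypothetical, and the base URL is copied from the example above.

```shell
# Hypothetical helper: build the download URL for one run id, following
# the directory layout of the example URL in this step.
srr_url() {
    # $1: run id such as SRR304976
    base="ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra"
    prefix3=$(printf '%.3s' "$1")   # first three characters, e.g. SRR
    prefix6=$(printf '%.6s' "$1")   # first six characters, e.g. SRR304
    printf '%s/%s/%s/%s/%s.sra\n' "$base" "$prefix3" "$prefix6" "$1" "$1"
}

srr_url SRR304976
```

Calling it with SRR304976 prints the same URL as the wget example above.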

Since we don't want to do that manually for each file, we can use GNU parallel to help:

cat SraAccList.txt | parallel -j 1 wget -P sra ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/{='$_=substr($_,0,3)'=}/{='$_=substr($_,0,6)'=}/{}/{}.sra

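The cat SraAccList.txt part pipes the contents of SraAccList.txt to the parallel command. This assumes that the SRR ids are all in the file SraAccList.txt that was downloaded in Step 1.
The -j 1 option limits parallel to downloading one file at a time (simultaneous downloads are unlikely to be faster).
The -P sra option tells wget to place the downloaded files in the directory 'sra'.
The {='$_=substr($_,0,3)'=} and {='$_=substr($_,0,6)'=} parts are Perl expressions that parallel evaluates for each id, inserting its first three and first six characters respectively; {} inserts the id itself.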
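If a run of Step 2 is interrupted, it may be useful to know which ids still need downloading. The sketch below is an assumption-laden illustration, not part of the original workflow: the function name missing_ids is hypothetical, and it only checks that a non-empty file exists for each id, so it cannot detect a file that was partially downloaded.

```shell
# Hypothetical helper: print the ids from an accession list that do not yet
# have a non-empty .sra file in the download directory.
missing_ids() {
    # $1: accession list (one id per line), $2: download directory
    while IFS= read -r id || [ -n "$id" ]; do
        [ -n "$id" ] || continue                  # skip blank lines
        [ -s "$2/$id.sra" ] || printf '%s\n' "$id"
    done < "$1"
}
```

For example, "missing_ids SraAccList.txt sra > todo.txt" would write the outstanding ids to todo.txt (an illustrative file name), which could then be piped to the parallel command in place of SraAccList.txt.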

Step 3:

Convert from SRA to FASTQ:

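parallel -j 1 fastq-dump -F --split-files -O fastq {} ::: sra/*.sra

This assumes the command is run from the directory that contains the 'sra' directory created in Step 2.
The -F option keeps the original read names in the FASTQ headers.
The --split-files option writes each read of a spot (for example, the forward and reverse reads of a pair) to a separate file.
The -O fastq option specifies that the output files should be placed in the directory 'fastq'.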
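After conversion, a quick check can confirm that every .sra file produced FASTQ output. This is a minimal sketch under stated assumptions: the function name unconverted_runs is hypothetical, and it assumes that fastq-dump with --split-files names the first read file <id>_1.fastq in the output directory.

```shell
# Hypothetical helper: print the ids of .sra files that have no non-empty
# <id>_1.fastq file in the fastq-dump output directory.
unconverted_runs() {
    # $1: directory holding the .sra files, $2: fastq-dump output directory
    for sra_file in "$1"/*.sra; do
        [ -e "$sra_file" ] || continue            # no .sra files matched
        id=$(basename "$sra_file" .sra)
        [ -s "$2/${id}_1.fastq" ] || printf '%s\n' "$id"
    done
}
```

For example, "unconverted_runs sra fastq" prints nothing when every run has been converted.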

Contact

Please feel free to post a question on the Microbiome Helper Google group if you have any issues.
General comments or inquiries about Microbiome Helper can be sent to [email protected].