
Download_From_SRA

Morgan Langille edited this page Jul 20, 2017 · 22 revisions

Download From SRA

Step 1:

Obtain the list of all run (SRR) ids associated with the project, in a single plain text file with one SRR id per line. Starting from an SRA id, look up the project at NCBI (example: https://www.ncbi.nlm.nih.gov/sra?term=SRA045646), then select "Send to" -> File and choose "Accession List" as the format. The commands below assume the downloaded file is named SraAccList.txt. (More information about this is here: https://www.ncbi.nlm.nih.gov/books/NBK158899/)

Step 2:

Download each file using wget. The URL for each file is built from its SRR id: the first three characters (SRR) and then the first six characters (SRR304) are used as nested subdirectories, followed by the full id. For example:

wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR304/SRR304976/SRR304976.sra
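The URL pattern above can be sketched as a small shell function. This is only an illustration: the function name sra_url is hypothetical, and it assumes the FTP layout shown in the example URL is still the one the server uses.

```shell
# Build the download URL for one run id (e.g. SRR304976).
# sra_url is a hypothetical helper name; the base URL is the one used on this page.
sra_url() (
    acc="$1"
    base="ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra"
    # First three characters (e.g. SRR) and first six characters (e.g. SRR304)
    p3=$(printf '%s' "$acc" | cut -c1-3)
    p6=$(printf '%s' "$acc" | cut -c1-6)
    printf '%s/%s/%s/%s/%s.sra\n' "$base" "$p3" "$p6" "$acc" "$acc"
)

# Prints the same URL as the wget example above
sra_url SRR304976
```

The output can be passed to wget directly, for example: wget "$(sra_url SRR304976)"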

Since we don't want to do that manually for each file, we can use GNU parallel to build the URLs and run wget for us:

cat SraAccList.txt | parallel -j 1 wget -P sra ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/{='$_=substr($_,0,3)'=}/{='$_=substr($_,0,6)'=}/{}/{}.sra
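  • The cat SraAccList.txt pipes the contents of SraAccList.txt to the parallel command. This assumes that the SRR ids are all in the file SraAccList.txt that was downloaded in Step 1.
  • The -j 1 specifies the number of jobs to run at once. Using 1 limits this to downloading one file at a time (simultaneous downloads shouldn't be faster).
  • The -P sra specifies that the downloaded files should be placed in the directory 'sra'.
  • Each {= ... =} block is a small Perl expression that parallel applies to the current SRR id: substr($_,0,3) gives the first three characters and substr($_,0,6) gives the first six. A plain {} is replaced with the full SRR id.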
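After the downloads finish, it can be useful to confirm that every id in the list has a file. The following is a minimal sketch, not part of the original workflow: missing_sra is a hypothetical helper that takes the accession list and the download directory and prints each id with no non-empty .sra file.

```shell
# Print every id in the accession list that has no non-empty <id>.sra file.
# missing_sra is a hypothetical helper; arguments: accession list, download directory.
missing_sra() (
    list="$1"
    dir="$2"
    while IFS= read -r acc || [ -n "$acc" ]; do
        # Strip a trailing carriage return (Windows line endings) and skip blank lines
        acc=$(printf '%s' "$acc" | tr -d '\r')
        [ -z "$acc" ] && continue
        # -s is true only if the file exists and is not empty
        [ -s "$dir/$acc.sra" ] || printf '%s\n' "$acc"
    done < "$list"
)

# Usage, assuming the file and directory names from Step 2:
# missing_sra SraAccList.txt sra > missing.txt
```

If missing.txt is not empty, it could be used in place of SraAccList.txt to retry only the failed downloads.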

Step 3:

Convert from SRA to FASTQ:

parallel -j 1 fastq-dump --skip-technical -F --split-files -O fastq {} ::: sra/*.sra
  • ::: sra/*.sra passes the sra files downloaded in Step 2 to parallel as arguments; each file name replaces {} in turn
  • -j 1 specifies the number of jobs to run at once. Increase this number to convert several files in parallel.
  • -F specifies that the original read ids be used (instead of those changed by the SRA)
  • --skip-technical some sequencing technologies produce other reads besides forward and reverse. This skips those.
  • --split-files writes each read of a spot to its own file, so forward and reverse reads end up in separate files (suffixes _1 and _2)
  • -O fastq specifies the directory in which to place the converted fastq files
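A quick sanity check on the converted files is to compare read counts between the forward and reverse files. The sketch below is not part of the original workflow: count_reads and compare_pair are hypothetical helpers, and they assume four lines per FASTQ record and the <id>_1.fastq / <id>_2.fastq names that --split-files produces.

```shell
# Count reads in a FASTQ file, assuming four lines per record.
count_reads() (
    lines=$(wc -l < "$1" | tr -d ' ')
    echo $((lines / 4))
)

# Print: id, forward count, reverse count, OK or MISMATCH (tab separated).
# Arguments: output directory, SRR id.
compare_pair() (
    dir="$1"
    acc="$2"
    fwd=$(count_reads "$dir/${acc}_1.fastq")
    rev=$(count_reads "$dir/${acc}_2.fastq")
    if [ "$fwd" -eq "$rev" ]; then status=OK; else status=MISMATCH; fi
    printf '%s\t%s\t%s\t%s\n' "$acc" "$fwd" "$rev" "$status"
)

# Usage, assuming the output directory from Step 3:
# compare_pair fastq SRR304976
```

A MISMATCH does not necessarily mean the conversion failed; it may simply mean some spots in the run have only one read.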

A nice explanation of other fastq-dump options is provided by Rob Edwards' group: https://edwards.sdsu.edu/research/fastq-dump/
