Download_From_SRA
Obtain a list of sample ids (SRR) from the SRA and place them one per line in a text file.
This is done through NCBI (example: https://www.ncbi.nlm.nih.gov/sra?term=SRA045646) by selecting "Send to" -> "File" and choosing "Accession List" as the format.
More information about this is here: https://www.ncbi.nlm.nih.gov/books/NBK158899/
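Before downloading anything, it can help to confirm that the accession list really contains one run id per line. The following is a minimal sketch, not part of Microbiome Helper: the function name `check_acc_list` is hypothetical, and it assumes run accessions look like SRR, ERR or DRR followed by digits.

```shell
# Print every line of an accession list that does not look like a run accession.
# Assumption: valid ids are SRR, ERR or DRR followed only by digits.
check_acc_list() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1" >&2
        return 2
    fi
    # -v inverts the match, -n prefixes each printed line with its line number,
    # -E enables extended regular expressions.
    if grep -vnE '^[SED]RR[0-9]+$' "$1"; then
        return 1   # at least one malformed line was printed
    fi
    return 0       # every line looks like a run accession
}
```

Running `check_acc_list SraAccList.txt` prints nothing and returns 0 when the file is clean. Blank lines or stray whitespace are reported with their line numbers.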
Download each file using wget.
You can form the URL for each file like so (note that the first three characters of the identifier, SRR, and then the first six, SRR304, are used as subdirectories): "wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR304/SRR304976/SRR304976.sra"
Since we don't want to do that manually for each file, we can use GNU parallel to help:
cat SraAccList.txt | parallel -j 1 wget -P sra ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/{='$_=substr($_,0,3)'=}/{='$_=substr($_,0,6)'=}/{}/{}.sra
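- `cat SraAccList.txt` pipes the contents of SraAccList.txt to the parallel command. This assumes that the SRR ids are all in the file SraAccList.txt that was downloaded in Step 1.
- `-j 1` specifies the number of jobs to run at once. Using 1 limits parallel to downloading one file at a time (simultaneous downloads shouldn't be faster).
- `-P sra` specifies that the downloaded files should be placed in the directory 'sra'.
- `{='$_=substr($_,0,3)'=}` and `{='$_=substr($_,0,6)'=}` are Perl expressions that parallel evaluates on each id to give its first three and first six characters, which form the subdirectories in the URL. `{}` is replaced by the id itself.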
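The URL layout used above can also be written out as a small shell function. This makes it easy to check the URLs before starting any downloads. This is only a sketch of the same path scheme: the function name `sra_url` is hypothetical, and the base URL is copied from the wget example.

```shell
# Build the download URL for one run accession, following the
# <first 3 chars>/<first 6 chars>/<accession>/<accession>.sra layout.
# The body runs in a subshell, so its variables do not leak into the caller.
sra_url() (
    acc=$1
    base="ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra"
    p3=$(printf '%s' "$acc" | cut -c1-3)   # e.g. SRR
    p6=$(printf '%s' "$acc" | cut -c1-6)   # e.g. SRR304
    printf '%s/%s/%s/%s/%s.sra\n' "$base" "$p3" "$p6" "$acc" "$acc"
)

# Example: print the URL for every accession without downloading anything.
# while IFS= read -r acc; do sra_url "$acc"; done < SraAccList.txt
```

For SRR304976 this prints the same URL as the wget example in step 2.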
Convert from SRA to FASTQ:
parallel -j 1 fastq-dump --skip-technical -F --split-files -O fastq {} ::: sra/*
- `::: sra/*` passes the files downloaded in step 2 to parallel as arguments; each one replaces `{}` in the command.
- `-j 1` specifies the number of jobs to run at once. Increase this number to convert several files in parallel.
- `-F` specifies that the original read ids be used (instead of those changed by the SRA).
- `--skip-technical` skips technical reads. Some sequencing technologies produce other reads besides forward and reverse.
- `--split-files` splits the output into separate files for forward and reverse reads.
- `-O fastq` specifies the directory in which to place the converted fastq files.
- `--gzip` can be added as an option if you would like the fastq files to be gzipped (this saves space, but the conversion takes much longer).
A nice explanation of other fastq-dump options is provided by Rob Edwards' group: https://edwards.sdsu.edu/research/fastq-dump/
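After the conversion finishes, a quick check that every accession produced output can catch failed or interrupted runs. The sketch below is not part of Microbiome Helper: the function name `missing_fastq` is hypothetical, and it assumes that `--split-files` names the forward-read file `<accession>_1.fastq` (or `<accession>_1.fastq.gz` when `--gzip` is used).

```shell
# List accessions from the accession list that have no forward-read FASTQ file.
# Assumption: output files are named <accession>_1.fastq or <accession>_1.fastq.gz.
# Usage: missing_fastq SraAccList.txt [output_dir]   (output_dir defaults to fastq)
missing_fastq() (
    list=$1
    outdir=${2:-fastq}
    status=0
    while IFS= read -r acc; do
        [ -n "$acc" ] || continue          # ignore blank lines
        if [ ! -e "$outdir/${acc}_1.fastq" ] && [ ! -e "$outdir/${acc}_1.fastq.gz" ]; then
            printf '%s\n' "$acc"           # report the accession with no output
            status=1
        fi
    done < "$list"
    exit "$status"                         # non-zero if anything was missing
)
```

Running `missing_fastq SraAccList.txt fastq` prints the ids that still need converting and returns non-zero if there are any.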
- Please feel free to post a question on the Microbiome Helper google group if you have any issues.
- General comments or inquiries about Microbiome Helper can be sent to [email protected].