This is a version of the wf-single-cell workflow provided by Oxford Nanopore's EPI2ME Labs at https://github.com/epi2me-labs/wf-single-cell, adapted to run easily on Biowulf with long-read input data. If you have any problems running it, consult Datatecnica or Biowulf. More specific information on the use and output of this workflow can be found in the epi2me-labs GitHub repository.
Choose a good location on Biowulf to run this workflow, and make sure it has plenty of free space. You can check your usage on the Biowulf Dashboard at hpc.nih.gov. You will need a copy of this repo for each file you wish to process, and each copy must be placed in a separate folder. Alternatively, you can copy just the nextflow.config and runscript.sh files into each new folder.
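As a quick command-line complement to the Dashboard, free space can be checked roughly as sketched below. The function name `free_gb` is hypothetical, and `df` reports the filesystem's free space rather than any per-user quota, so treat the number as an upper bound.

```shell
# Print the free space, in whole GB, on the filesystem that holds a directory.
# free_gb is a hypothetical helper name, not part of this repo.
free_gb() {
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Usage: report free space for the current directory.
echo "Free space here: $(free_gb .) GB"
```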
git clone git@github.com:NIH-CARD/wf_single_cell_longread.git
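The one-folder-per-file layout described above could be set up along these lines. This is a minimal sketch: `make_run_dirs`, the sample names, and the `/data/$USER/runs` path in the usage comment are all hypothetical, and it takes the lighter option of copying only nextflow.config and runscript.sh from a cloned copy of the repo.

```shell
# Create one processing folder per fastq file, each with its own copy of
# nextflow.config and runscript.sh.
# Arguments: <cloned_repo_dir> <parent_dir_for_runs> <sample_name>...
make_run_dirs() {
    repo_dir=$1
    work_dir=$2
    shift 2
    for sample in "$@"; do
        mkdir -p "$work_dir/$sample" || return 1
        cp "$repo_dir/nextflow.config" "$repo_dir/runscript.sh" \
            "$work_dir/$sample/" || return 1
    done
}

# Usage (hypothetical paths and sample names):
#   make_run_dirs wf_single_cell_longread "/data/$USER/runs" sampleA sampleB
```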
In each of your processing folders, create a directory to hold your fastq file or the symlink to it.
mkdir fastq
cd fastq
Inside this directory, either make a copy of the fastq file you would like to process or create a symlink to it. A symlink saves space because no full copy is made. However, if you use a symlink you will also have to do the optional step 4 (editing singularity.runOptions in nextflow.config). The symlink method has not yet been carried out successfully and may not work.
To create a copy:
cp /data/path/of/your/fastq/file.fastq.gz ./
Or, you can create a symlink:
ln -s /data/path/of/your/fastq/file.fastq.gz ./
Then cd back out:
cd ../
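Before moving on, it may be worth confirming that the fastq directory really contains a readable file, since a mistyped `ln -s` target produces a dangling symlink without any error. The sketch below assumes the file name ends in `.fastq.gz`, as in the examples above; `check_fastq_dir` is a hypothetical helper name.

```shell
# Fail if a directory holds no .fastq.gz entries or holds a dangling symlink.
# Argument: the fastq directory (defaults to ./fastq).
check_fastq_dir() {
    dir=${1:-fastq}
    found=0
    for f in "$dir"/*.fastq.gz; do
        # If the glob matched nothing, $f is the literal pattern: skip it.
        [ -e "$f" ] || [ -L "$f" ] || continue
        found=1
        # -L is true for any symlink; -e follows it and fails if the target is gone.
        if [ -L "$f" ] && [ ! -e "$f" ]; then
            echo "broken symlink: $f" >&2
            return 1
        fi
    done
    if [ "$found" -ne 1 ]; then
        echo "no .fastq.gz files found in $dir" >&2
        return 1
    fi
}

# Usage, from the processing folder:
#   check_fastq_dir fastq && echo "fastq directory looks OK"
```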
Any questions about the following settings that are not answered here are covered at https://github.com/epi2me-labs/wf-single-cell.
- First, set the fastq directory (--fastq) to /data/directory/containing/.fastq.gz/file/
- In general, it is best to not use a sample_sheet unless you already have one written.
- Instead of a sample sheet, use --kit_name, --kit_version, --expected_cells, etc., which are already in the runscript in this repo.
- Be sure to set the ref_genome_dir appropriately for the kit that was used. If the one indicated in this version of the sample_sheet is not appropriate, the appropriate one can likely be found in the cellranger files corresponding to the kit used for this experiment.
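To make the settings above concrete, here is a sketch of how they might fit together on one command line. It is not the contents of runscript.sh, which may be organised differently. The `nextflow run epi2me-labs/wf-single-cell` invocation follows the upstream project's usage, every value is a placeholder assumption, and the accepted values for each option should be checked against the upstream documentation for the version you are running. The command is only printed, not executed.

```shell
# Placeholder values (assumptions); replace them with those for your experiment.
FASTQ_DIR=/data/directory/containing/.fastq.gz/file/
KIT_NAME=3prime                           # assumed example value
KIT_VERSION=v3                            # assumed example value
EXPECTED_CELLS=500                        # assumed example value
REF_GENOME_DIR=/path/to/reference/dir     # hypothetical path

# Print the command for inspection instead of running it.
# print_nextflow_cmd is a hypothetical helper name.
print_nextflow_cmd() {
    echo nextflow run epi2me-labs/wf-single-cell \
        --fastq "$FASTQ_DIR" \
        --kit_name "$KIT_NAME" \
        --kit_version "$KIT_VERSION" \
        --expected_cells "$EXPECTED_CELLS" \
        --ref_genome_dir "$REF_GENOME_DIR"
}

print_nextflow_cmd
```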
If you choose to use the symlink method, you must edit singularity.runOptions in the nextflow.config file to be "-B /data/folder/containing/everything/above/your/fastq/". Set the bound folder as high above your fastq file(s) as you can. It will likely still not work the first time, so be sure to contact Biowulf for advice.
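One way to find the folder to bind is to resolve the symlink to the real file and look at its directory. The sketch below assumes GNU `readlink -f` (standard on Linux) and uses a hypothetical helper name, `suggest_bind`. It prints the directory that directly contains the real file. Following the advice above, you may want to bind a folder higher up that path instead.

```shell
# Print a candidate singularity.runOptions line for a symlinked fastq file.
# Argument: path to the symlink inside your fastq directory.
suggest_bind() {
    # readlink -f resolves every symlink in the path to an absolute real path.
    target=$(readlink -f "$1") || return 1
    # The parent directory of the real file is the lowest folder that must be
    # visible inside the container; a higher-level folder also works.
    printf 'singularity.runOptions = "-B %s"\n' "$(dirname "$target")"
}

# Usage:
#   suggest_bind fastq/file.fastq.gz
```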
cd back to the directory containing runscript.sh if you are not already there.
To run the workflow:
sbatch runscript.sh
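Because the run can take many hours, it can help to keep the job ID so you can check on it later. The sketch below relies on standard Slurm commands (`sbatch --parsable`, `squeue`, `sacct`); `submit_job` is a hypothetical helper name.

```shell
# Submit a script and print only its Slurm job ID.
# With --parsable, sbatch prints "jobid" or "jobid;cluster" and nothing else.
submit_job() {
    _out=$(sbatch --parsable "$1") || return 1
    # Strip an optional ";cluster" suffix.
    printf '%s\n' "${_out%%;*}"
}

# Usage, on a system with Slurm:
#   jobid=$(submit_job runscript.sh)
#   squeue -j "$jobid"                               # pending or running?
#   sacct -j "$jobid" --format=JobID,State,Elapsed   # state once it has ended
```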
That's it! The output files should appear in the fastq directory between 3 and 24 hours after you initiate the workflow.