TheJacksonLaboratory/quilt-nf

quilt-nf: A Nextflow pipeline for haplotype reconstruction using low-coverage whole-genome sequencing data

These workflows support the service offered by JAX Genome Technologies for haplotype reconstruction with low-pass WGS. Please contact Sam Widmayer or Dan Gatti for more information.

JAX users are required to have access to the Sumner cluster, and to have Nextflow installed in their home directory. Any setup for external users will require additional support, and those wishing to share these workflows are encouraged to contact the maintainers of this repository.

This pipeline is implemented in Nextflow, a scalable and increasingly common language for developing and maintaining reproducible bioinformatics workflows. Each step of the workflow runs in a software container (for example, Docker or Singularity) that holds all the software that step requires. Because each container specifies exact combinations and versions of software, analyses remain reproducible over time as long as the source data are unchanged.

Overview

flowchart TD
    p0((Sample))
    p1((R/qtl2 Covariate File))
    p2((Reference Haplotypes))
    p3((quilt bin shuffle values file))
    p4((downsampled coverage values file))

    p5[FASTP]:::process
    p6[FASTQC]:::process
    p7[READ GROUPS]:::process
    p8[bwa-mem]:::process
    p9[PICARD SortSam]:::process
    p10[PICARD MarkDuplicates]:::process
    p11[PICARD CollectAlignmentSummaryMetrics]:::process
    p12[PICARD CollectWgsMetrics]:::process
    p13[MULTIQC]:::process
    p14[samtools depth + coverage]:::process

    note1{--align_only}
    note2{Run QUILT}

    p15[DOWNSAMPLE BAM]:::process
    p16[CREATE BAMLIST]:::process
    p17[RUN QUILT]:::process
    p18[QUILT TO R/QTL2 FILES]:::process
    p19[GENOPROBS]:::process

    o1((MULTIQC report)):::output
    o2((PICARD CollectWgsMetrics file)):::output
    o3((PICARD CollectAlignmentSummaryMetrics file)):::output
    o4((samtools depth coverage file)):::output

    o5((quilt VCF file)):::output
    o6((quilt variant filtering summary file)):::output
    o7((R/qtl2 sample genotypes)):::output
    o8((R/qtl2 founder genotypes)):::output
    o9((R/qtl2 genetic and physical maps)):::output
    o10((R/qtl2 Cross Object)):::output
    o11((R/qtl2 36-state/Genotype Probabilities)):::output
    o12((R/qtl2 8-state/Allele Probabilities)):::output


    p0 --> p5
    p1 --> p17
    p2 --> p17
    p3 --> p17
    p4 --> p15
    p5 --> p6
    p6 --> p7
    p6 --> p13
    p7 --> p8
    p8 --> p9
    p9 --> p10
    p10 --> p11
    p10 --> p12
    p11 --> p13
    p12 --> p13
    p10 --> p14
    
    p13 -..-> note1
    p14 -..-> note1
    subgraph align_only [  ]
        note1 --> o1
        note1 --> o2
        note1 --> o3
        note1 --> o4
    end

    p13 -..-> note2
    p14 -..-> note2
    note2 --> p15
    p15 --> p16
    p16 --> p17
    p17 --> o5
    p17 --> p18
    p18 --> o6
    p18 --> o7
    p18 --> o8
    p18 --> o9
    p18 --> p19
    p19 --> o10
    p19 --> o11
    p19 --> o12

classDef output fill:#99e4ff,stroke:#000000,stroke-width:5px,color:#000000
classDef process fill:#00A2DC,stroke:#000000,stroke-width:2px,color:#000000

Execution:

On the JAX HPC, from within the quilt-nf directory:

sbatch run_scripts/quilt_DO.sh [run name]
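
The run name given on the `sbatch` line reaches the script as its first positional argument (`$1`), which the template below forwards to `--run_name`. If it is left off, the run name arrives empty or missing. A minimal sketch of a guard that a run script could use is shown here; `require_run_name` is a hypothetical helper and is not part of quilt-nf.

```shell
#!/bin/bash
# Hypothetical helper; not part of quilt-nf.
# Prints the run name if one was supplied; otherwise prints a usage
# message to stderr and returns a non-zero status.
require_run_name() {
    if [ -z "${1:-}" ]; then
        echo "usage: sbatch run_scripts/quilt_DO.sh [run name]" >&2
        return 1
    fi
    printf '%s\n' "$1"
}

# Example use near the top of a run script:
#   run_name="$(require_run_name "$@")" || exit 1
```

With a guard like this, a submission without a run name fails immediately with a usage message instead of starting the pipeline.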

A prospective user can write their own run script using the following template:

#!/bin/bash
#SBATCH --mail-user={USER.EMAIL}
#SBATCH --job-name=QUILT-NF
#SBATCH --mail-type=END,FAIL
#SBATCH -p compute
#SBATCH -q batch
#SBATCH -t 72:00:00
#SBATCH --mem=1G
#SBATCH --ntasks=1

cd "$SLURM_SUBMIT_DIR"

# LOAD NEXTFLOW
module use --append /projects/omics_share/meta/modules
module load nextflow

# RUN PIPELINE
# Notes on the options below:
#   --extension, --pattern, --library_type: typical values are shown; see
#     run_scripts/quilt_DO_ddRADseq.sh for an alternative example.
#   -w: on JAX, use /flashscratch/{USER} or /flashscratch/{OTHER}.
nextflow main.nf \
--workflow quilt \
-profile sumner2 \
--sample_folder '{PATH TO DIRECTORY CONTAINING FASTQ FILES}' \
--gen_org mouse \
--pubdir '{PATH TO DESIRED RESULTS DIRECTORY}' \
--extension 'fastq.gz' \
--pattern="*_R{1,2}*" \
--library_type "seqwell" \
--run_name "$1" \
-w '{PATH TO DESIRED NEXTFLOW WORK DIRECTORY}' \
--downsample_to_cov '{PATH TO .CSV WITH COVERAGE VALUES TO DOWNSAMPLE TO}' \
--bin_shuffling_file '{PATH TO .CSV WITH QUILT BIN SHUFFLE RADIUS VALUES}' \
--cross_type 'do' \
--ref_file_dir '{PATH TO DIRECTORY WITH REFERENCE HAPLOTYPE FILES}' \
--covar_file '{PATH TO R/QTL2 COVAR FILE}' \
--comment "This script will run haplotype inference on DO lcWGS data" \
-resume
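
Because the template takes several filesystem paths, it can help to check them before submitting a 72-hour job. The sketch below is one way to do that under stated assumptions: `count_reads` and `check_files` are hypothetical helpers (not part of quilt-nf), and the FASTQ files are assumed to sit directly inside the sample folder rather than in subdirectories.

```shell
#!/bin/bash
# Hypothetical pre-flight helpers; not part of quilt-nf.

# Print the number of files (or symlinks) in a directory whose names end
# in the given extension, e.g. 'fastq.gz'. Only the top level of the
# directory is searched (an assumption about how the reads are laid out).
count_reads() {
    local dir="$1" ext="$2"
    find "$dir" -maxdepth 1 \( -type f -o -type l \) -name "*.${ext}" \
        | wc -l | tr -d '[:space:]'
}

# Return 0 if every path given is a readable, non-empty file; otherwise
# report the first offending path on stderr and return 1.
check_files() {
    local f
    for f in "$@"; do
        if [ ! -s "$f" ] || [ ! -r "$f" ]; then
            echo "missing, empty, or unreadable file: $f" >&2
            return 1
        fi
    done
}
```

A run script could, for example, stop early when `count_reads` prints `0` for the sample folder, or when `check_files` fails on the paths passed to `--downsample_to_cov`, `--bin_shuffling_file`, and `--covar_file`.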