Skip to content

nf-core/longraredisease

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

nf-core/longraredisease

GitHub Actions CI Status GitHub Actions Linting Status AWS CI Cite with Zenodo nf-test

Nextflow run with conda run with docker run with singularity Launch on Seqera Platform

Get help on Slack Follow on Twitter Follow on Mastodon Watch on YouTube


Introduction

longraredisease is a comprehensive Nextflow pipeline for Oxford Nanopore sequencing analysis, designed for rare disease research and diagnostics. It delivers high-confidence variant discovery by integrating multiple state-of-the-art tools. longraredisease performs multi-caller structural variant (SV) detection, single nucleotide variant (SNV) calling, copy number variant (CNV) analysis, short tandem repeat (STR) detection, and phasing analysis in a reproducible, modular workflow.

Pipeline Overview

  • Structural Variants (SVs): Sniffles, CuteSV, SVIM, with SURVIVOR merging
  • Single Nucleotide Variants (SNVs): Clair3, DeepVariant
  • Copy Number Variants (CNVs): Spectre, QDNAseq
  • Short Tandem Repeats (STRs): STRaglr
  • Phasing: LongPhase
  • Quality Control: Coverage analysis with mosdepth

Requirements

Software:

  • Nextflow (≥22.10.0)
  • Docker or Singularity/Apptainer

Hardware: Will be updated later in the project


Quick Start

1. Clone the Repository

git clone https://github.com/nourmahfel/nf-core-longraredisease.git
cd nf-core-longraredisease

2. Test Installation

nextflow run main.nf -profile test,docker

3. Run with Your Data

nextflow run main.nf     --bam_dir /path/to/bam/files      --outdir results     -profile docker

Input Data Requirements

Parameter Description Format Required
--bam_dir Directory containing BAM files Directory path
--fasta_file Reference genome FASTA .fasta/.fa
--outdir Output directory Directory path
--str_bed_file STR regions for analysis .bed
--bed_file Target regions BED file .bed Optional
--chrom_sizes Chromosome sizes file .txt Optional

Configuration Parameters

Core Analysis Options

--snv true/false              # SNV calling (default: true)
--cnv true/false              # CNV calling (default: true)
--str true/false              # STR analysis (default: true)
--phase true/false            # Phasing analysis (default: true)
--phase_with_sv true/false    # Include SVs in phasing (default: true)

SV Calling Parameters

--filter_sv_calls true/false           # Apply coverage-based filtering (default: true)
--min_read_support auto/integer        # Minimum read support (default: auto)
--min_read_support_limit integer       # Minimum support limit (default: 3)
--merge_sv_calls true/false            # Merge calls from multiple callers (default: true)
--max_distance_breakpoints integer     # Max distance for merging (default: 1000)
--min_supporting_callers integer       # Min callers supporting variant (default: 2)
--min_sv_size integer                  # Minimum SV size (default: 30)

SNV Calling Parameters

--clair3_model string                  # Model name (default: r1041_e82_400bps_sup_v500)
--clair3_platform ont/pacbio           # Sequencing platform (default: ont)
--use_deepvariant true/false           # Run DeepVariant alongside Clair3 (default: true)

CNV Calling Parameters

--use_qdnaseq true/false               # Use QDNAseq instead of Spectre (default: false)
--genome_build hg38/hg19               # Genome build (default: hg38)
--qdnaseq_bin_size integer             # Bin size in kb (default: 1000)
--cutoff float                         # CNV calling cutoff (default: 0.5)
--spectre_fasta_file path              # Full genome FASTA for Spectre
--spectre_mosdepth path                # Mosdepth regions file
--spectre_snv_vcf path                 # SNV VCF for Spectre

Usage Examples

Basic Run

nextflow run main.nf     --bam_dir /data/bam_files     --fasta_file /ref/genome.fasta     --outdir results     -profile docker

SV-Only Analysis

nextflow run main.nf     --bam_dir /data/bam_files     --fasta_file /ref/genome.fasta     --snv false     --cnv false     --str false     --phase false     --outdir sv_results     -profile docker

Targeted Analysis with BED File

nextflow run main.nf     --bam_dir /data/bam_files     --fasta_file /ref/genome.fasta     --bed_file /targets/exome.bed     --use_qdnaseq true     --outdir targeted_results     -profile docker

High-Sensitivity SV Calling

nextflow run main.nf     --bam_dir /data/bam_files     --fasta_file /ref/genome.fasta     --min_supporting_callers 1     --min_sv_size 20     --filter_sv_calls false     --outdir sensitive_sv     -profile docker

Custom Resource Limits

nextflow run main.nf     --bam_dir /data/bam_files     --fasta_file /ref/genome.fasta     --outdir results     -profile docker     --max_cpus 32     --max_memory 128.GB

Output Structure

results/
├── minimap2/           # Aligned BAM files
├── mosdepth/           # Coverage analysis
├── sniffles/           # Sniffles SV calls
├── cutesv/             # CuteSV SV calls
├── svim/               # SVIM SV calls
├── survivor/           # Merged SV calls
├── clair3/             # Clair3 SNV calls
├── deepvariant/        # DeepVariant SNV calls (if enabled)
├── snv_combined/       # Combined SNV calls
├── longphase/          # Phasing results (if enabled)
├── spectre/            # Spectre CNV calls (if enabled)
├── runqdnaseq/         # QDNAseq CNV calls (if enabled)
├── straglr/            # STR analysis (if enabled)
└── pipeline_info/      # Execution reports

Configuration Profiles

Available Profiles:

  • test: Minimal test dataset
  • docker: Use Docker containers
  • singularity: Use Singularity containers

Custom Configuration

// custom.config
params {
    max_cpus = 16
    max_memory = '64.GB'
    outdir = '/scratch/results'
}

process {
    withName: 'CLAIR3' {
        cpus = 8
        memory = '32.GB'
    }
}

Run with:

nextflow run main.nf -c custom.config -profile docker

Test Data

The pipeline includes test data for validation:

  • Location: assets/test_data/
  • Genome: Chromosome 22 subset
  • Samples: Simulated nanopore data
  • Runtime: ~10-15 minutes

Performance Optimization

For Large Datasets

  • Increase resource limits: --max_cpus 64 --max_memory 256.GB
  • Use faster storage: --outdir /fast_storage/results
  • Enable process caching: -resume

For Limited Resources

  • Reduce parallel processes: --max_cpus 4 --max_memory 16.GB
  • Disable resource-intensive analyses: --use_deepvariant false --cnv false

Troubleshooting

Common Issues

  • Out of Memory Errors: Increase memory limits, e.g. --max_memory 64.GB
  • File Not Found Errors: Check file paths and permissions (ls -la /path/to/input/files)
  • Container Issues: Try different container engine (-profile singularity)
  • SURVIVOR Filename Collisions: Ensure BAM files have unique prefixes Check that filter_sv_calls is properly configured

Getting Help

  • Check the .nextflow.log file for detailed error messages
  • Use -resume to restart from the last successful step
  • Enable debug mode:
nextflow run main.nf -profile test,docker --debug

Citation

Citation paper to be added soon as writing one.


Contributing

We welcome contributions! Please see our Contributing Guidelines for details.


License

This project is licensed under the MIT License – see the LICENSE file for details.


This pipeline integrates several tools for variant calling: Sniffles, CuteSV, SVIM, SURVIVOR, Clair3, DeepVariant, LongPhase, Spectre, STRaglr

About

Long read sequencing pipeline to identify variants in patients with neurodevelopmental disorders

Topics

Resources

License

Code of conduct

Contributing

Stars

Watchers

Forks

Languages