Peakachu Cohort Analysis Toolkit

This project extends the capabilities of the Peakachu loop caller (Salameh et al., Nat Commun 11, 3428 (2020)) to enable scalable analysis of chromatin loops across cohorts of samples.

Core Features

The toolkit automates the analysis of multiple Hi-C datasets (.hic/.cool files), identifies differential looping patterns between sample groups, and prepares the results for interactive visualization.

Key functionalities include:

Batch Loop Calling:
- Automates Peakachu's train and score_genome functions across multiple samples and resolutions (e.g., 5kb, 10kb).
- Parallelizes analysis for efficient processing of large cohorts.
Intensity Extraction:
- Retrieves raw and CLR-normalized contact counts for all predicted loops.
- Provides quantitative data essential for downstream differential analysis.
CTCF Overlap Annotation:
- Annotates loops by intersecting anchor regions with provided CTCF ChIP-seq peak files (BED format).
- Helps prioritize biologically relevant, CTCF-mediated loops.
Differential Comparison:
- Performs statistical comparisons (e.g., fold-change, Wilcoxon test) of loop intensities between defined groups (e.g., mutant vs. wild-type).
- Identifies loops with significant changes associated with experimental conditions.
HiGlass Integration:
- Generates configuration files to visualize predicted loops and intensity tracks within the HiGlass interactive genome browser.
- Packages outputs for easy loading and exploration.
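
To make the CTCF overlap annotation above concrete, here is a minimal pure-Python sketch. It assumes loops are BEDPE-style tuples (chrom1, start1, end1, chrom2, start2, end2) and peaks are BED-style (chrom, start, end) half-open intervals. The function names and the toy coordinates are illustrative assumptions, not the toolkit's actual API; a production implementation would more likely rely on a dedicated interval library.

```python
def build_peak_index(peaks):
    """Group BED-style (chrom, start, end) peaks by chromosome."""
    index = {}
    for chrom, start, end in peaks:
        index.setdefault(chrom, []).append((start, end))
    return index


def anchor_has_peak(index, chrom, start, end):
    """Return True if the half-open anchor [start, end) overlaps any peak."""
    # Linear scan per anchor: simple and correct, but slow for large peak sets.
    return any(p_start < end and start < p_end
               for p_start, p_end in index.get(chrom, ()))


def annotate_loops(loops, peaks):
    """Count, for each loop, how many of its two anchors overlap a peak (0-2)."""
    index = build_peak_index(peaks)
    annotated = []
    for chrom1, start1, end1, chrom2, start2, end2 in loops:
        n_hits = (anchor_has_peak(index, chrom1, start1, end1)
                  + anchor_has_peak(index, chrom2, start2, end2))
        annotated.append({
            "loop": (chrom1, start1, end1, chrom2, start2, end2),
            "ctcf_anchors": n_hits,
        })
    return annotated


# Toy coordinates, for illustration only.
peaks = [("chr1", 1000, 1200), ("chr1", 50500, 50700)]
loops = [
    ("chr1", 0, 5000, "chr1", 50000, 55000),      # both anchors hit a peak
    ("chr1", 10000, 15000, "chr1", 50000, 55000),  # only the second anchor
    ("chr2", 0, 5000, "chr2", 20000, 25000),       # no peaks on chr2
]
for row in annotate_loops(loops, peaks):
    print(row["loop"], row["ctcf_anchors"])
```

Counting overlapping anchors (0, 1, or 2) rather than returning a single flag keeps the choice between "either anchor" and "both anchors" open for downstream filtering.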

Getting Started

Installation

This toolkit is designed as a Python package. You can install it directly from this repository using pip:

pip install git+https://github.com/your-username/peakachu-cohort-analysis.git
# Or, after cloning the repository:
# cd peakachu-cohort-analysis
# pip install .

We recommend using a dedicated virtual environment (e.g., conda or venv). Ensure you have Python 3.8 or higher.

Configuration

The main workflow is driven by a configuration file, typically named config.yaml. This file specifies input data locations, analysis parameters, and group definitions for comparisons.

Here's an example structure:

# config.yaml example
output_dir: ./results/cohort_analysis
resolutions: [5000, 10000] # Resolutions in bp (e.g., 5kb, 10kb)

# --- Input Data ---
hic_files: # List of .hic or .cool files
  - /path/to/sample1.hic
  - /path/to/sample2.mcool::/resolutions/5000 # Specify resolution for multi-res coolers
  - /path/to/sample3.hic
  # ... more samples

ctcf_peaks: # Optional: BED file with CTCF peaks for annotation
  - /path/to/ctcf_peaks.bed

# --- Peakachu Parameters ---
peakachu_model: /path/to/pretrained/peakachu_model.pkl # Optional: Use a pre-trained model
peakachu_params: # Parameters passed to Peakachu score_genome
  min_dist: 10000
  max_dist: 3000000
  # ... other peakachu parameters

# --- Cohort & Group Definitions ---
samples: # Define metadata and group assignment for each sample
  sample1:
    group: 'wildtype'
    # Add other metadata if needed
  sample2:
    group: 'mutant'
  sample3:
    group: 'wildtype'
  # ... map sample names (from hic_files base names) to groups

groups: # Define the groups for comparison
  - wildtype
  - mutant

# --- Differential Analysis ---
differential_params:
  method: 'wilcoxon' # 'foldchange' or 'wilcoxon'
  pseudocount: 1 # For fold-change calculation
  fdr_threshold: 0.05 # Significance threshold

# --- HiGlass Configuration ---
higlass_options:
  server: 'http://localhost:8888/api/v1' # Your HiGlass server API endpoint
  track_color_range: ['#FFFFFF', '#FF0000'] # Color range for intensity tracks

Adjust the paths and parameters according to your specific dataset and analysis goals.
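
Since the samples section is keyed by the base names of the hic_files entries, a quick consistency check before a long run can catch mismatches early. The sketch below assumes the configuration has already been parsed into a dictionary (for example with PyYAML's yaml.safe_load) and that a cooler URI such as file.mcool::/resolutions/5000 contributes only its file part to the sample name. The helper names and that naming rule are assumptions for illustration, not documented toolkit behavior.

```python
from pathlib import PurePosixPath


def sample_name(hic_entry):
    """Derive a sample name from a .hic/.cool path or a cooler URI (path::group)."""
    file_part = hic_entry.split("::", 1)[0]
    # Drop the directory and the final extension: /a/b/sample2.mcool -> sample2
    return PurePosixPath(file_part).stem


def group_samples(config):
    """Map each declared group to its sample names, raising on mismatches."""
    by_group = {group: [] for group in config["groups"]}
    for entry in config["hic_files"]:
        name = sample_name(entry)
        meta = config["samples"].get(name)
        if meta is None:
            raise KeyError(f"no 'samples' entry for {name!r} (from {entry!r})")
        group = meta["group"]
        if group not in by_group:
            raise ValueError(f"sample {name!r} uses undeclared group {group!r}")
        by_group[group].append(name)
    return by_group


# Dictionary mirroring the relevant parts of the example config.yaml.
config = {
    "hic_files": [
        "/path/to/sample1.hic",
        "/path/to/sample2.mcool::/resolutions/5000",
        "/path/to/sample3.hic",
    ],
    "samples": {
        "sample1": {"group": "wildtype"},
        "sample2": {"group": "mutant"},
        "sample3": {"group": "wildtype"},
    },
    "groups": ["wildtype", "mutant"],
}
print(group_samples(config))
```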

Basic Usage

Run the analysis through the main script (e.g., run_cohort_analysis.py), passing the configuration file with --config:

python run_cohort_analysis.py --config config.yaml

This command will execute the following steps based on the configuration:

1. Run Peakachu score_genome for each sample and resolution.
2. Extract loop intensities (raw and normalized).
3. Annotate loops with CTCF overlap (if CTCF peak files are provided).
4. Perform differential analysis between the specified groups.
5. Generate HiGlass configuration files for visualization.
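
The first step above implies one Peakachu run per (sample, resolution) pair. A rough sketch of how such a job grid could be enumerated and dispatched in parallel is shown below. The output file layout, the function names, and the placeholder run_job are all hypothetical; the real script may organize jobs and outputs differently.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from pathlib import PurePosixPath


def plan_jobs(hic_files, resolutions, output_dir):
    """Enumerate one job per (input file, resolution) pair."""
    jobs = []
    for hic, res in product(hic_files, resolutions):
        name = PurePosixPath(hic.split("::", 1)[0]).stem
        # Hypothetical layout: <output_dir>/<sample>/<resolution>bp.loops.bedpe
        out = f"{output_dir}/{name}/{res}bp.loops.bedpe"
        jobs.append({"input": hic, "resolution": res, "output": out})
    return jobs


def run_job(job):
    """Placeholder for the real work, e.g. invoking Peakachu's score_genome."""
    return job["output"]


def run_all(jobs, max_workers=4):
    """Run all jobs concurrently and return their results in job order."""
    # Threads are enough when each job mostly waits on an external process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, jobs))


jobs = plan_jobs(
    ["/path/to/sample1.hic", "/path/to/sample3.hic"],
    [5000, 10000],
    "./results/cohort_analysis",
)
for path in run_all(jobs):
    print(path)
```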

Results will be saved in the directory specified by output_dir in the config.yaml file.
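
For the differential analysis step, the pseudocount and fdr_threshold settings can be illustrated with a small stdlib-only sketch: a log2 fold change between group means, and a Benjamini-Hochberg adjustment of per-loop p-values. The README does not say which multiple-testing correction or summary statistic the toolkit uses, so both are assumptions here. The p-values are made-up inputs standing in for the output of a rank-based test such as scipy.stats.mannwhitneyu.

```python
import math
from statistics import mean


def log2_fold_change(reference, test, pseudocount=1.0):
    """log2 ratio of group means (test over reference), stabilized by a pseudocount."""
    return math.log2((mean(test) + pseudocount) / (mean(reference) + pseudocount))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy intensities for one loop: wild-type (reference) vs. mutant (test).
print(log2_fold_change([3, 3], [7, 7], pseudocount=1))

# Made-up p-values for four loops, filtered at fdr_threshold = 0.05.
pvalues = [0.01, 0.04, 0.03, 0.5]
qvalues = benjamini_hochberg(pvalues)
significant = [i for i, q in enumerate(qvalues) if q <= 0.05]
print(qvalues, significant)
```

The pseudocount keeps the ratio finite when a loop has zero counts in one group, at the cost of shrinking fold changes for low-intensity loops.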

Development Roadmap

See scripts/prd.txt for details on the development plan, including MVP requirements and future enhancements.
