
Bisulfite workflow

Chris Miller edited this page May 24, 2019 · 3 revisions

Purpose

Alignment and QC of bisulfite-treated DNA, where unmethylated Cs are converted to Ts.
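
To make the conversion concrete, here is a minimal sketch of what bisulfite treatment does to a single forward-strand sequence. The function name `bisulfite_convert` and the `methylated_positions` argument are hypothetical and purely illustrative; this is not part of the workflow or of biscuit.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate bisulfite treatment of one forward-strand sequence.

    Unmethylated C is read as T; a C at a position listed in
    methylated_positions (0-based, hypothetical input) is protected
    and stays C.
    """
    converted = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            converted.append("T")  # unmethylated C -> T
        else:
            converted.append(base)  # methylated C and other bases unchanged
    return "".join(converted)


# The C at index 1 is methylated and survives; the Cs at 4 and 5 do not.
print(bisulfite_convert("ACGTCCGA", methylated_positions={1}))  # ACGTTTGA
```

Because converted reads no longer match the reference at unmethylated Cs, alignment needs a bisulfite-aware aligner and index, which is why the workflow takes a biscuit-indexed reference.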

Location

Inputs

  • reference_index: a biscuit-indexed genome fasta file
  • reference_sizes: a tsv list of chromosome names and lengths (often stored next to the fasta as 'all_sequences.genome')
  • instrument_data_bams: array of unaligned bam files
  • read_group_id: array of read group (@RG) header lines, one for each bam in instrument_data_bams, in matching order
  • sample_name:
  • trimming_adapters: Standard trimming parameters - see the Trimming page for a detailed description
  • trimming_adapter_trim_end:
  • trimming_adapter_min_overlap:
  • trimming_max_uncalled:
  • trimming_min_readlength:
  • QCannotation: Files needed for the biscuit bisulfite QC scripts. (add details, how to create these for new builds)
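
For a new build, the reference_sizes file could be derived from the fasta itself. The sketch below assumes only what is stated above (one chromosome name and one length per line, tab-separated); the helper names `fasta_sizes` and `sizes_to_tsv` are hypothetical, and this is not necessarily how the existing 'all_sequences.genome' files were produced. The first two columns of a `samtools faidx` index carry the same information.

```python
def fasta_sizes(lines):
    """Return [(name, length), ...] for each record in fasta-formatted lines."""
    sizes = []
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                sizes.append((name, length))
            # Keep only the first word of the header as the sequence name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            length = 0
        elif name is not None:
            length += len(line)
    if name is not None:
        sizes.append((name, length))
    return sizes


def sizes_to_tsv(sizes):
    """Format (name, length) pairs as tab-separated lines."""
    return "".join(f"{name}\t{length}\n" for name, length in sizes)


example = [">chr1 test sequence", "ACGTACGT", "ACG", ">chr2", "TTTT"]
print(sizes_to_tsv(fasta_sizes(example)), end="")
```

For a real genome, pass an open file handle instead of the example list, for instance `fasta_sizes(open(path))`, where `path` is whatever fasta you are indexing.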

Outputs

  • cram (final.cram): alignments in CRAM format
  • vcf (pileup.vcf.gz): pileup output that contains all CpG sites in the genome with read counts, methylation ratios, context information, etc.
  • cpgs (cpgs.bed.gz): a 6-column bed file derived from the pileup (chr, start, stop, methratio, depth, details)
  • cpg_bigwig (cpgs.bw): a bigwig file for visualizing the methylation ratio
  • qc_directory (...): details of contents
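
As an illustration of how the cpgs bed output might be consumed, the sketch below computes a depth-weighted mean methylation ratio. It relies on the column order listed above (chr, start, stop, methratio, depth, details) and additionally assumes the file is tab-separated, that methratio is a fraction between 0 and 1, and that depth is an integer; check these against your own output. The function name `mean_methylation` and its `min_depth` parameter are hypothetical.

```python
def mean_methylation(bed_lines, min_depth=1):
    """Depth-weighted mean of the methratio column (4th) using depth (5th).

    Returns None when no site passes the min_depth filter.
    """
    weighted_sum = 0.0
    total_depth = 0
    for line in bed_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        ratio = float(fields[3])
        depth = int(fields[4])
        if depth < min_depth:
            continue
        weighted_sum += ratio * depth
        total_depth += depth
    if total_depth == 0:
        return None
    return weighted_sum / total_depth


# Made-up rows for illustration only; not real pipeline output.
rows = [
    "chr1\t100\t101\t0.5\t10\texample",
    "chr1\t200\t201\t1.0\t30\texample",
]
print(mean_methylation(rows))
```

To read the compressed file directly, the rows can come from `gzip.open("cpgs.bed.gz", "rt")` in the Python standard library.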

Example YAML file for b38

---
instrument_data_bams:
- class: File
  path: /gscmnt/gc2687/aadel/instrument_data/2900854195/gerald_HJ7VFDSXX_3_CGGCTATG-CAGGACGT.bam
- class: File
  path: /gscmnt/gc2687/aadel/instrument_data/2900917514/gerald_HFLG2DSXX_4_CGGCTATG-CAGGACGT.bam
read_group_id:
- "@RG\tID:2900854195\tPU:HJ7VFDSXX.3.CGGCTATG-CAGGACGT\tSM:TWCY-1313-1313-CD8-Tumor\tLB:TWCY-1313-1313-CD8-Tumor\tPL:Illumina\tCN:WUGSC"
- "@RG\tID:2900917514\tPU:HFLG2DSXX.4.CGGCTATG-CAGGACGT\tSM:TWCY-1313-1313-CD8-Tumor\tLB:TWCY-1313-1313-CD8-Tumor\tPL:Illumina\tCN:WUGSC"
reference_index: /gscmnt/gc2560/core/model_data/2887491634/build21f22873ebe0486c8e6f69c15435aa96/biscuit.0.2.2.20170522/all_sequences.fa
reference_sizes:
  class: File
  path: /gscmnt/gc2560/core/model_data/2887491634/build21f22873ebe0486c8e6f69c15435aa96/biscuit.0.2.2.20170522/all_sequences.genome
sample_name: TWCY-1313-1313-CD8-Tumor
trimming_adapter_min_overlap: 7
trimming_adapter_trim_end: RIGHT
trimming_adapters:
  class: File
  path: /gscmnt/gc2560/core/illumina_adapters/illumina_multiplex.fa
trimming_max_uncalled: 300
trimming_min_readlength: 25
QCannotation: /gscmnt/gc2560/core/model_data/2887491634/build21f22873ebe0486c8e6f69c15435aa96/biscuit.0.2.2.20170522/qc_assets/
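
Before launching a run, it can help to sanity-check that the inputs line up. The sketch below is a hypothetical pre-flight check, not part of the workflow: it parses each @RG line into its tags and confirms there is one read group per bam. The check that the SM tag equals sample_name is an assumption based on the example above, where the two match; drop it if your samples are named differently.

```python
def parse_read_group(rg_line):
    """Split a tab-separated @RG header line into a {tag: value} dict."""
    fields = rg_line.split("\t")
    if fields[0] != "@RG":
        raise ValueError(f"not a read group line: {rg_line!r}")
    tags = {}
    for field in fields[1:]:
        key, _, value = field.partition(":")
        tags[key] = value
    return tags


def check_inputs(bam_paths, read_groups, sample_name):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if len(bam_paths) != len(read_groups):
        problems.append(
            f"{len(bam_paths)} bams but {len(read_groups)} read groups"
        )
    for rg_line in read_groups:
        tags = parse_read_group(rg_line)
        # Assumption: SM should equal sample_name, as in the example yaml.
        if tags.get("SM") != sample_name:
            problems.append(
                f"read group {tags.get('ID')} has SM={tags.get('SM')!r}"
            )
    return problems


rg = ("@RG\tID:2900854195\tPU:HJ7VFDSXX.3.CGGCTATG-CAGGACGT"
      "\tSM:TWCY-1313-1313-CD8-Tumor\tLB:TWCY-1313-1313-CD8-Tumor"
      "\tPL:Illumina\tCN:WUGSC")
print(check_inputs(["first.bam"], [rg], "TWCY-1313-1313-CD8-Tumor"))
```

Note that in the YAML the read groups are double-quoted, so each `\t` is read as a real tab character, which is what the parser above splits on.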