ATACme-seq is a high-throughput sequencing technique that simultaneously measures chromatin accessibility and DNA methylation. This project contains an ATACme-seq analysis pipeline built on bioinformatics tools such as FastQC, Cutadapt, and bwa-meth.
- Introduction
- Setting up your workspace
- Basic quality checking with FastQC
- Trimming adapter sequence from reads
- Aligning the trimmed reads to a reference genome
- Calling peaks on the aligned reads
- Methylation calling on the aligned reads
We have 12 experimental setups for the GM12878 cell line, defined by Tn5 dose (1/5x, 1x, 5x), sonication (sonicated, not sonicated), and bisulfite conversion (converted, not converted). We focus on the six converted conditions: 1/5x_sonicated_converted, 1/5x_not_sonicated_converted, 1x_sonicated_converted, 1x_not_sonicated_converted, 5x_sonicated_converted, and 5x_not_sonicated_converted. Each condition has a single replicate. The 5x_sonicated_converted dataset is used as the example throughout.
We start from FASTQ files, which look like this:
@D00442:207:C97BPANXX:7:1109:1242:2112 1:N:0:AGCGATAG+GAATGGTCCC
GGGTATTTTTAATTTTAGAGTGAAAGTTTATAATTATTGTGTTTGTTTATAAGCCCCTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACAGCGATAGATCTCGTATGCCGCCTTCTGCTTGAA
+
B?AB@GEFGGBGGGGGGGGGFCGG>FGGGGGEG<FGGGGGGFEGFG>GGGGGE>C0=EB@F:>0C<GBGEG0@=>>FGGFGGGGG0:FGGFGF>GG/[email protected]:
@D00442:207:C97BPANXX:7:1109:1194:2120 1:N:0:AGCGATAG+GAATGGTCCC
ATATATAGTAGGTGGTATGAATTTAGGTATTTATATTTATATAATGATATTTTGTGATTTAGATTTAAAAAAGGTTTAATATTTTTAGTATTATTGGAAAATTTTAAATTTTTTGAAATNTTTTTT
+
...
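The first read in the excerpt above runs into the sequence AGATCGGAAGAGC, which matches the start of the Illumina TruSeq adapter, so it can help to gauge how many reads carry adapter before trimming. The following is a minimal sketch, not part of the original pipeline: the function names `count_reads` and `count_adapter_reads` are made up for illustration, and it assumes a gzip whose `-f` flag passes uncompressed input through unchanged (as GNU gzip does).

```shell
# Hypothetical helpers (names are illustrative, not from the pipeline).

# Count reads in a FASTQ file, gzipped or plain: one record is 4 lines.
count_reads() {
    gzip -cdf "$1" | awk 'END { print NR / 4 }'
}

# Count reads whose sequence line (line 2 of each record) contains
# the given adapter string.
count_adapter_reads() {
    gzip -cdf "$1" | awk -v a="$2" '
        NR % 4 == 2 && index($0, a) { n++ }
        END { print n + 0 }'
}

# Example usage, assuming LAB_DATA is set as in the workspace section:
# for fq in "${LAB_DATA}"/*.fastq.gz; do
#     echo "$fq: $(count_adapter_reads "$fq" AGATCGGAAGAGC) of $(count_reads "$fq") reads"
# done
```

A high proportion of adapter-containing reads is a sign that inserts are often shorter than the read length, which is why the trimming step matters.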
Create a working directory and set a few environment variables to reduce typing later:
export LAB_DIR=/nfs/kitzman3/users/shuzwang/atacme_analysis
mkdir -p "$LAB_DIR"
cd "$LAB_DIR"
export LAB_DATA="/nfs/kitzman1/in_house_seq_runs/2016/core_hiseq/2016-06-01_PE125-10-10_Run_1563_PARKER/parker/Sample_63387_12"
export REF_DIR="/nfs/kitzman2/lab_common/alignment_index/bwameth-0.7/human/hs37d5"
Load the FastQC module and run FastQC for a basic quality check. FastQC does not create its output directory, so make it first.
module load fastqc/0.11.5
mkdir -p ${LAB_DIR}/fastqc
fastqc -o ${LAB_DIR}/fastqc -f fastq ${LAB_DATA}/*.fastq.gz
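Each FastQC archive contains a tab-separated `summary.txt` with one line per module (status, module name, input file). As a rough sketch for scanning many samples at once, the helper below lists every module that did not pass. The function name `fastqc_flagged` is hypothetical, and the commented usage assumes `unzip` is available.

```shell
# Hypothetical helper: print file, status, and module for every FastQC
# module whose status is not PASS. Reads summary.txt content from the
# files given as arguments, or from stdin when none are given.
fastqc_flagged() {
    awk -F'\t' '$1 != "PASS" { print $3 "\t" $1 "\t" $2 }' "$@"
}

# Example usage, assuming FastQC wrote its *_fastqc.zip archives to
# ${LAB_DIR}/fastqc:
# for z in "${LAB_DIR}"/fastqc/*_fastqc.zip; do
#     unzip -p "$z" '*/summary.txt'
# done | fastqc_flagged
```

For bisulfite-converted libraries, a WARN or FAIL on "Per base sequence content" is expected, because conversion depletes cytosines from the reads.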
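Merge the trimmed and untrimmed alignments for each experimental condition with samtools, then mark duplicates with Picard. As written, MarkDuplicates only flags duplicate reads in the output BAM; adding REMOVE_DUPLICATES=true would drop them instead.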
samtools merge GM12878__5x__SONICATED__CONVERTED.bam /nfs/kitzman2/jacob/proj/matac/round3core_160607/data/bybc/bwameth_hs37d5/s/GM12878__5x__SONICATED__CONVERTED_TRIMMED.bam /nfs/kitzman2/jacob/proj/matac/round3core_160607/data/bybc/bwameth_hs37d5/s/GM12878__5x__SONICATED__CONVERTED_UNTRIMMED.bam
java -jar /nfs/kitzman2/lab_software/linux_x86-64/picard-tools-1.141/picard.jar MarkDuplicates METRICS_FILE=GM12878__5x__SONICATED__CONVERTED_dup.txt INPUT=/nfs/kitzman3/users/shuzwang/atacme/merged_atacme/GM12878__5x__SONICATED__CONVERTED.bam OUTPUT=/nfs/kitzman3/users/shuzwang/final/final_atacme/GM12878__5x__SONICATED__CONVERTED_md.bam ASSUME_SORTED=true CREATE_INDEX=true
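The metrics file written by MarkDuplicates is a tab-separated table that follows a `## METRICS CLASS` line. The sketch below pulls the duplication rate per library out of it by looking up the `PERCENT_DUPLICATION` column by name. The function name `picard_dup_rate` is made up for illustration and is not part of Picard.

```shell
# Hypothetical helper: print "<library><TAB><PERCENT_DUPLICATION>" for each
# row of the metrics table in a Picard MarkDuplicates metrics file.
picard_dup_rate() {
    awk -F'\t' '
        /^## METRICS CLASS/ { want_header = 1; next }  # table header is next
        want_header {
            for (i = 1; i <= NF; i++)
                if ($i == "PERCENT_DUPLICATION") col = i
            want_header = 0; in_table = 1; next
        }
        in_table && NF == 0 { in_table = 0 }           # blank line ends table
        in_table && col     { print $1 "\t" $col }
    ' "$1"
}

# Example usage with the metrics file named in the command above:
# picard_dup_rate GM12878__5x__SONICATED__CONVERTED_dup.txt
```

Despite its name, Picard reports `PERCENT_DUPLICATION` as a fraction between 0 and 1.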
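Use MethylDackel to extract methylation calls at CpG sites. The options below require a minimum mapping quality of 30 (-q), a minimum base quality of 30 (-p), and a minimum depth of 10 (-d), run on 8 threads (-@), and set the output file prefix (-o).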
MethylDackel extract -q 30 -p 30 -d 10 -@ 8 -o GM12878__5x__SONICATED__CONVERTED ~/reference_index/hs37d5.fa /nfs/kitzman3/users/shuzwang/final/md_atacme/GM12878__5x__SONICATED__CONVERTED_md.bam
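With the `-o` prefix above, MethylDackel writes a file ending in `_CpG.bedGraph` whose columns are chromosome, start, end, methylation percentage, methylated count, and unmethylated count, preceded by a `track` line. As a quick check on the result, the sketch below reports the number of CpG sites and the coverage-weighted mean methylation. The function name `mean_cpg_methylation` is hypothetical.

```shell
# Hypothetical helper: print "<number of sites><TAB><mean methylation %>",
# where the mean is weighted by coverage:
# total methylated counts / (methylated + unmethylated counts).
mean_cpg_methylation() {
    awk -F'\t' '
        /^track/ { next }                     # skip the bedGraph track line
        { meth += $5; unmeth += $6; sites++ }
        END {
            if (meth + unmeth > 0)
                printf "%d\t%.2f\n", sites, 100 * meth / (meth + unmeth)
            else
                print "0\tNA"
        }
    ' "$1"
}

# Example usage with the output prefix from the command above:
# mean_cpg_methylation GM12878__5x__SONICATED__CONVERTED_CpG.bedGraph
```

Weighting by counts rather than averaging the percentage column keeps low-coverage sites from dominating the summary.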