Rc peakcall 2797 #1478
Open: rsc3 wants to merge 81 commits into develop from rc_peakcall_2797 (base: develop)
Changes from all commits (81 commits):
ce90d53 add peak calling (aawdeh)
82c5c45 added output for peaks (aawdeh)
6e833d2 fix (aawdeh)
76286b4 fix (aawdeh)
90805ca test (aawdeh)
b27011c remove ls (aawdeh)
c1f0e57 test (aawdeh)
90b193c Merge branch 'develop' into aa-peakcall (aawdeh)
779fe9b test (aawdeh)
6d52ad2 Merge branch 'aa-peakcall' of https://github.com/broadinstitute/warp … (aawdeh)
5743eff test (aawdeh)
d3749c8 test (aawdeh)
1de1c85 test (aawdeh)
2d18756 test (aawdeh)
13a44f9 test (aawdeh)
3f12b9e made another task (aawdeh)
08b2f24 test (aawdeh)
80c1416 test (aawdeh)
b5191e5 test (aawdeh)
4725080 test (aawdeh)
f139f84 test (aawdeh)
9a0ea9b test (aawdeh)
04cf0d2 test (aawdeh)
f9bc871 change a line (aawdeh)
93d3360 test (aawdeh)
256c4e5 test (aawdeh)
cd849b5 test (aawdeh)
7c62fd1 test (aawdeh)
b15b146 test (aawdeh)
ff36c15 test (aawdeh)
27a4196 test (aawdeh)
987b5d2 Add peakcall to task (aawdeh)
cce5841 test (aawdeh)
39ef1a5 Merge branch 'develop' into aa-peakcall (aawdeh)
bf0e9ce add parameters (aawdeh)
d9240d6 Merge branch 'aa-peakcall' of https://github.com/broadinstitute/warp … (aawdeh)
1ec5ff1 Merge branch 'develop' into aa-peakcall (aawdeh)
90172ee Merge branch 'develop' into aa-peakcall (aawdeh)
18f77d8 test (aawdeh)
f7bdfa2 Merge branch 'aa-peakcall' of https://github.com/broadinstitute/warp … (aawdeh)
db59231 peak calling (rsc3)
7b5a87e looking for conflicts in atac.wdl (rsc3)
0d4f010 remove bam_base_name from CreateFragments task (rsc3)
987e1e5 separate task for macs3 consistent with create fragments (rsc3)
0bc38f6 Update atac.wdl (rsc3)
b136ba3 snapatac2 updates (rsc3)
573a29f turn off createfragments peakcalling (rsc3)
7b8d1fc remove peakcalling from createfragments and parameterize probability_… (rsc3)
0689868 remove probability_threshold parameterization (rsc3)
32f4b7f remove repeated line (rsc3)
fffb256 Updated pipeline_versions.txt with all pipeline version information (actions-user)
cf2a77b add optional parameter (aawdeh)
41de09c Merge branch 'develop' into rc_peakcall_2797 (aawdeh)
cf63ad1 add output (aawdeh)
093faea Merge branch 'rc_peakcall_2797' of https://github.com/broadinstitute/… (aawdeh)
5e81613 add output (aawdeh)
f0216ab test (aawdeh)
ef09ec9 test (aawdeh)
d64ad4d test (aawdeh)
e3fe25f test (aawdeh)
e6dad58 test (aawdeh)
dd9c77b Merge branch 'develop' into rc_peakcall_2797 (aawdeh)
80e3fa1 Added matrix as output (aawdeh)
75e8f2f Merge branch 'rc_peakcall_2797' of https://github.com/broadinstitute/… (aawdeh)
b3ebae9 Updated pipeline_versions.txt with all pipeline version information (actions-user)
9e4916a Added matrix as output (aawdeh)
13a448a Merge branch 'rc_peakcall_2797' of https://github.com/broadinstitute/… (aawdeh)
cd7c100 Added matrix as output (aawdeh)
d7486d9 test (aawdeh)
89824b0 test (aawdeh)
aee56ec remove comments (aawdeh)
f90905b Merge branch 'develop' into rc_peakcall_2797 (aawdeh)
83fbe05 update changelogs/versions (aawdeh)
af76a98 Merge branch 'rc_peakcall_2797' of https://github.com/broadinstitute/… (aawdeh)
a0dd316 Updated pipeline_versions.txt with all pipeline version information (actions-user)
5680637 documenation (aawdeh)
f53e592 Merge branch 'rc_peakcall_2797' of https://github.com/broadinstitute/… (aawdeh)
745d9c4 rename (aawdeh)
54694bc rename (aawdeh)
c664618 rename (aawdeh)
328ec9a rename (aawdeh)
@@ -28,6 +28,8 @@ workflow ATAC {
     # Option for running files with preindex
     Boolean preindex = false
+    # Option for running peak calling
+    Boolean peak_calling = true

     # BWA ref
     File tar_bwa_reference
@@ -49,7 +51,7 @@ workflow ATAC {
     String adapter_seq_read3 = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
   }

-  String pipeline_version = "2.6.0"
+  String pipeline_version = "2.6.1"

   # Determine docker prefix based on cloud provider
   String gcr_docker_prefix = "us.gcr.io/broad-gotc-prod/"
@@ -61,7 +63,7 @@ workflow ATAC {
     String cutadapt_docker = "cutadapt:1.0.0-4.4-1686752919"
     String samtools_docker = "samtools-dist-bwa:3.0.0"
     String upstools_docker = "upstools:1.0.0-2023.03.03-1704300311"
-    String snap_atac_docker = "snapatac2:1.1.0"
+    String snap_atac_docker = "snapatac2:2.0.0"

     # Make sure either 'gcp' or 'azure' is supplied as cloud_provider input. If not, raise an error
     if ((cloud_provider != "gcp") && (cloud_provider != "azure")) {
@@ -71,7 +73,6 @@ workflow ATAC {
     }
   }

   parameter_meta {
     read1_fastq_gzipped: "read 1 FASTQ file as input for the pipeline, contains read 1 of paired reads"
     read2_fastq_gzipped: "read 2 FASTQ file as input for the pipeline, contains the cellular barcodes corresponding to the reads in the read1 FASTQ and read 3 FASTQ"
@@ -160,20 +161,30 @@ workflow ATAC {
         input_id = input_id
     }
+    if (peak_calling) {
+      call PeakCalling {
+        input:
+          annotations_gtf = annotations_gtf,
+          metrics_h5ad = CreateFragmentFile.Snap_metrics,
+          chrom_sizes = chrom_sizes,
+          docker_path = docker_prefix + snap_atac_docker
+      }
+    }
   }

   File bam_aligned_output_atac = select_first([BBTag.bb_bam, BWAPairedEndAlignment.bam_aligned_output])
   File fragment_file_atac = select_first([BB_fragment.fragment_file, CreateFragmentFile.fragment_file])
   File fragment_file_index_atac = select_first([BB_fragment.fragment_file_index, CreateFragmentFile.fragment_file_index])
   File snap_metrics_atac = select_first([BB_fragment.Snap_metrics, CreateFragmentFile.Snap_metrics])
   File library_metrics = select_first([BB_fragment.atac_library_metrics, CreateFragmentFile.atac_library_metrics])

   output {
     File bam_aligned_output = bam_aligned_output_atac
     File fragment_file = fragment_file_atac
     File fragment_file_index = fragment_file_index_atac
     File snap_metrics = snap_metrics_atac
     File library_metrics_file = library_metrics
+    File? peakcall_h5ad_file = PeakCalling.peaks_h5ad
   }
 }
@@ -269,7 +280,6 @@ task GetNumSplits {
     }
   }

   # trim read 1 and read 2 adapter sequence with cutadapt
   task TrimAdapters {
     input {
@@ -513,7 +523,6 @@ task CreateFragmentFile {
     File bam
     File annotations_gtf
     File chrom_sizes
-    File annotations_gtf
     Boolean preindex
     Int disk_size = 500
     Int mem_size = 64
@@ -536,7 +545,8 @@ task CreateFragmentFile {
   }

   command <<<
-    set -e pipefail
+    set -euo pipefail
+    set -x

     python3 <<CODE
@@ -559,6 +569,9 @@ task CreateFragmentFile {
     # use snap atac2
     import snapatac2.preprocessing as pp
     import snapatac2 as snap
     import scanpy as sc
     import numpy as np
     import polars as pl
     import anndata as ad
     from collections import OrderedDict
     import csv
@@ -579,8 +592,7 @@ task CreateFragmentFile {
     atac_percent_target = number_of_cells / expected_cells*100
     print("Setting percent target in nested dictionary")
     data['Cells']['atac_percent_target'] = atac_percent_target

     # Flatten the dictionary
     flattened_data = []
     for category, metrics in data.items():
@@ -637,8 +649,164 @@ task CreateFragmentFile {
   output {
     File fragment_file = "~{input_id}.fragments.sorted.tsv.gz"
     File fragment_file_index = "~{input_id}.fragments.sorted.tsv.gz.csi"
     File Snap_metrics = "~{input_id}.metrics.h5ad"
     File atac_library_metrics = "~{input_id}_~{atac_nhash_id}_library_metrics.csv"
   }
 }

 # peak calling using SnapATAC2
 task PeakCalling {
   input {
     File annotations_gtf
     File metrics_h5ad
     File chrom_sizes
     String output_base_name

     # SnapATAC2 parameters
     Int min_counts = 5000
     Int min_tsse = 10
     Int max_counts = 100000
     # Runtime attributes/docker
     String docker_path
     Int disk_size = 500
     Int mem_size = 64
     Int nthreads = 4
   }

   parameter_meta {
     bam: "Aligned bam with CB in CB tag. This is the output of the BWAPairedEndAlignment task."
     annotations_gtf: "GTF for SnapATAC2 to calculate TSS sites of fragment file."
     disk_size: "Disk size used in create fragment file step."
     mem_size: "The size of memory used in create fragment file."
     docker_path: "The docker image path containing the runtime environment for this task"
   }

   command <<<
     set -euo pipefail
     set -x

     python3 <<CODE

     # use snap atac2
     import snapatac2 as snap
     import scanpy as sc
     import numpy as np
     import polars as pl
     import pandas as pd

     base_name = "~{output_base_name}"
     atac_gtf = "~{annotations_gtf}"
     metrics_h5ad = "~{metrics_h5ad}"
     chrom_sizes = "~{chrom_sizes}"
     min_counts = "~{min_counts}"
     min_tsse = "~{min_tsse}"
     max_counts = "~{max_counts}"

     print("Peak calling starting...")
     atac_data = snap.read(metrics_h5ad)

     # Calculate and plot the size distribution of fragments
     print("Calculating fragment size distribution")
     snap.pl.frag_size_distr(atac_data)
     print(atac_data)

     # Filter cells
     print("Filtering cells")
     snap.pp.filter_cells(atac_data, min_counts=min_counts, min_tsse=min_tsse, max_counts=max_counts)
     print(atac_data)

     # Create a cell by bin matrix containing insertion counts across genome-wide 500-bp bins.
     print("Creating cell by bin matrix")
     atac_data_mod = snap.pp.add_tile_matrix(atac_data, inplace=False)
     print("set obsm")
     atac_data_mod.obsm["fragment_paired"] = atac_data.obsm["fragment_paired"]
     print("set all uns")
     for key in atac_data.uns.keys():
         print("set ", key)
         atac_data_mod.uns[key] = atac_data.uns[key]
     print(atac_data_mod)

     # Feature selection
     print("Feature selection")
     snap.pp.select_features(atac_data_mod)
     print(atac_data_mod)

     # Run customized scrublet algorithm to identify potential doublets
     print("Run scrublet to identify potential doublets")
     snap.pp.scrublet(atac_data_mod)
     print(atac_data_mod)

     # Employ spectral embedding for dimensionality reduction
     print("Employ spectral embedding for dimensionality reduction")
     snap.tl.spectral(atac_data_mod)
     print(atac_data_mod)

     # Filter doublets based on scrublet scores
     print("Filter doublets based on scrublet scores")
     snap.pp.filter_doublets(atac_data_mod, probability_threshold=0.5)
     print(atac_data_mod)

     # Perform graph-based clustering to identify cell clusters.
     # Build a k-nearest neighbour graph using snap.pp.knn
     print("Perform knn graph-based clustering to identify cell clusters")
     snap.pp.knn(atac_data_mod)
     print(atac_data_mod)

     # Use the Leiden community detection algorithm to identify densely-connected subgraphs/clusters in the graph
     print("Use the Leiden community detection algorithm to identify densely-connected subgraphs/clusters in the graph")
     snap.tl.leiden(atac_data_mod)
     print(atac_data_mod)

     # Create the cell by gene activity matrix
     print("Create the cell by gene activity matrix")
     gene_mat = snap.pp.make_gene_matrix(atac_data_mod, gene_anno=atac_gtf)
     print(atac_data_mod)

     # Normalize the gene matrix
     print("Normalize the gene matrix")
     gene_mat.obs['leiden'] = atac_data_mod.obs['leiden']
     sc.pp.normalize_total(gene_mat)
     sc.pp.log1p(gene_mat)
     sc.tl.rank_genes_groups(gene_mat, groupby="leiden", method="wilcoxon")

     for i in np.unique(gene_mat.obs['leiden']):
         markers = sc.get.rank_genes_groups_df(gene_mat, group=i).head(7)['names']
         print(f"Cluster {i}: {', '.join(markers)}")

     print("Peak calling using MACS3")
     snap.tl.macs3(atac_data_mod, groupby='leiden', n_jobs=1)

     print("Merge peaks and create peak matrix")
     # read chrom sizes
     chromsize_dict = pd.read_csv(chrom_sizes, sep='\t', header=None)
     chromsize_dict = pd.Series(chromsize_dict[1].values, index=chromsize_dict[0]).to_dict()
     # merge peaks and create peak matrix
     peaks = snap.tl.merge_peaks(atac_data_mod.uns['macs3'], chromsize_dict)
     peak_matrix = snap.pp.make_peak_matrix(atac_data_mod, use_rep=peaks['Peaks'])

     print("Convert pl.DataFrame to pandas DataFrame")
     # Convert pl.DataFrame to pandas DataFrame
     for key in atac_data_mod.uns.keys():
         if isinstance(atac_data_mod.uns[key], pl.DataFrame):
             print(key)
             atac_data_mod.uns[key] = atac_data_mod.uns[key].to_pandas()

     print("Write into h5ad file")
     atac_data_mod.write_h5ad("~{base_name}.cellbybin.h5ad")
     peak_matrix.write_h5ad("~{base_name}.cellbypeak.h5ad")

     CODE
   >>>

   runtime {
     docker: docker_path
     disks: "local-disk ${disk_size} SSD"
     memory: "${mem_size} GiB"
     cpu: nthreads
   }

   output {
     File peaks_h5ad = "~{output_base_name}.cellbybin.h5ad"
     [Review comment on this line: "Should we adjust output variable name as well?"]
     File matrix_h5ad = "~{output_base_name}.cellbypeak.h5ad"
   }
 }
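One detail worth double-checking in the new task: WDL heredoc interpolation is textual, so the quoted assignments (`min_counts = "~{min_counts}"` and friends) leave the thresholds as Python strings, which are then passed to `snap.pp.filter_cells`. If that function expects numbers, an explicit cast is needed. A sketch using the task's declared defaults:

```python
# After WDL interpolation, min_counts = "~{min_counts}" renders as quoted
# text in the Python heredoc, i.e. a str rather than an int.
min_counts = "5000"    # from the task default: Int min_counts = 5000
min_tsse = "10"        # from the task default: Int min_tsse = 10
max_counts = "100000"  # from the task default: Int max_counts = 100000
assert isinstance(min_counts, str)

# Cast before handing the values to numeric APIs such as
# snap.pp.filter_cells(min_counts=..., min_tsse=..., max_counts=...).
min_counts, min_tsse, max_counts = int(min_counts), int(min_tsse), int(max_counts)
assert (min_counts, min_tsse, max_counts) == (5000, 10, 100000)
```

Dropping the quotes in the heredoc (`min_counts = ~{min_counts}`) would achieve the same thing, since the WDL engine substitutes the literal integer text.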
Review comment: "Should we add both peak calling output files? The cellbybin matrix and the cellbypeak matrix?"