#!/usr/bin/env nextflow
nextflow.enable.dsl=2
/* ========================================================================================
OUTPUT DIRECTORY
======================================================================================== */
params.outdir = false
if (params.outdir){
    outdir = params.outdir
} else {
    outdir = '.'
}
/* ========================================================================================
PARAMETERS
======================================================================================== */
params.genome = ''
params.verbose = false
params.help = false
params.list_genomes = false
params.macs_callpeak_args = ''
params.seacr_args = ''
params.peak_caller = 'macs' // default is macs3
/* ========================================================================================
MESSAGES
======================================================================================== */
// Show help message and exit
if (params.help){
    helpMessage()
    exit 0
}
if (params.list_genomes){
    println ("[WORKFLOW] List genomes selected")
}
// Validate peak callers
assert params.peak_caller == 'macs' || params.peak_caller == 'seacr' : "Invalid peak caller option: >>${params.peak_caller}<<. Valid options are: 'macs' or 'seacr'\n\n"
println ("Using peak caller: " + params.peak_caller)
// Tool-specific arguments are reported (with --verbose) where the peak caller module is included below.
/* ========================================================================================
GENOMES
======================================================================================== */
include { getGenome; listGenomes } from './nf_modules/genomes.mod.nf'
if (params.list_genomes){
    listGenomes() // this lists all available genomes, and exits
}
genome = getGenome(params.genome)
/* ========================================================================================
FILES CHANNEL
======================================================================================== */
include { makeFilesChannel; getFileBaseNames } from './nf_modules/files.mod.nf'
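// Require at least one input file (or a design CSV) on the command line
if (!args){
    helpMessage()
    exit 1, "No input files given. Please provide a list of files or a design CSV file."
}
// Load the design CSV file if one was given; otherwise treat the arguments as input files
if (args[0].endsWith('.csv')){
    Channel.fromPath(args)
        .splitCsv(header: true, sep: ',')
        .map { row -> [ file(row.treatment, checkIfExists: true), file(row.control, checkIfExists: true) ] }
        .set { files_ch }
} else {
    Channel.fromPath(args)
        .set { files_ch }
}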
/* ========================================================================================
WORKFLOW
======================================================================================== */
if (params.peak_caller == 'macs'){
    include { MACS_CALLPEAK } from './nf_modules/macs.mod.nf' params(genome: genome)
    if (params.verbose){ println ("[WORKFLOW] MACS CALLPEAK ARGS: " + params.macs_callpeak_args) }
}
else if (params.peak_caller == 'seacr'){
    include { SEACR } from './nf_modules/seacr.mod.nf'
    if (params.verbose){ println ("[WORKFLOW] SEACR ARGS: " + params.seacr_args) }
}
workflow {
    main:
        if (params.peak_caller == 'macs'){
            MACS_CALLPEAK (files_ch, outdir, params.macs_callpeak_args, params.verbose)
        }
        else if (params.peak_caller == 'seacr'){
            SEACR (files_ch, outdir, params.seacr_args, params.verbose)
        }
}
workflow.onComplete {
    def msg = """\
        Pipeline execution summary
        ---------------------------
        Jobname     : ${workflow.runName}
        Completed at: ${workflow.complete}
        Duration    : ${workflow.duration}
        Success     : ${workflow.success}
        workDir     : ${workflow.workDir}
        exit status : ${workflow.exitStatus}
        """
        .stripIndent()
    sendMail(to: "${workflow.userName}@ethz.ch", subject: 'Minimal pipeline execution report', body: msg)
}
/* ========================================================================================
HELP MESSAGE
======================================================================================== */
def helpMessage() {
log.info"""
>>
SYNOPSIS:
This workflow takes in a list of files (in BedGraph format for SEACR, or in any of the file formats supported by MACS)
or a CSV (comma-separated values) file. The CSV file needs to have the following structure:
==============================================================================================================
treatment,control
SEQ01_H3K27me3_rep1.bam,SEQ01_input_rep1.bam
SEQ01_H3K27me3_rep2.bam,SEQ01_input_rep2.bam
SEQ01_H3K27me3_rep3.bam,SEQ01_input_rep3.bam
==============================================================================================================
The first line must ALWAYS be "treatment,control". The following lines must give the full path to each file
if the files are not in the working directory. The file names must match files that exist in the working
directory or at the given path.
If the input file is a CSV document, the pipeline will take the first column as the treatment samples and
the second column as the control samples. If the second column is empty, the peak caller will run without
a control sample.
If the input is a list of aligned files (e.g., *.bam or *.bedgraph), the peak caller will run without a control sample.
==============================================================================================================
USAGE:
nf_peakcalling [options] --genome <genomeID> <input_files>
Mandatory arguments:
====================
  <input_files>         [Peak calling without control sample]
                        List of input files in BAM/BED/BedGraph format or other format
                        required by the peak caller, e.g. '*.bam' or '*.bedgraph'.
                        or
                        [Peak calling with control sample]
                        A CSV file with the names of the treatment and control files in
                        BAM/BED/BedGraph format or other format required by the peak caller, e.g. 'design.csv'.
  (*) --genome [str]    Genome build ID to be used for peak calling, e.g. GRCh38 (human) or GRCm38 (mouse).
                        To list all available genomes, see '--list_genomes' below.
  (*) required only if using MACS as the peak caller.
Tool-specific options:
======================
  --macs_callpeak_args="[str]"    This option can take any number of options that are compatible with 'macs callpeak' to modify its default
                                  behaviour. For more detailed information on available options please refer to the MACS documentation,
                                  or run 'macs callpeak' on the command line. Please note that the format ="your options" needs to be
                                  strictly adhered to in order to work correctly. [Default: None]
  --seacr_args="[str]"            This option can take any number of options that are compatible with 'seacr' to modify its default
                                  behaviour. For more detailed information on available options please refer to the SEACR documentation,
                                  or run 'seacr' on the command line. Please note that the format ="your options" needs to be
                                  strictly adhered to in order to work correctly.
                                  [Default without IgG control: "0.01 norm stringent", Default with IgG control: "norm stringent"]
Other options:
==============
  --outdir [str]                  Path to the output directory. [Default: current working directory]
  --verbose                       More verbose status messages. [Default: OFF]
  --help                          Displays this help message and exits.
  --peak_caller "[str]"           Choose the peak caller. [Default: "macs"] [Available options: "macs" or "seacr"]
Workflow options:
=================
Please note the single '-' hyphen for the following options!
  -resume                         If a pipeline workflow has been interrupted or stopped (e.g. by accidentally closing a laptop),
                                  this option will attempt to resume the workflow at the point it got interrupted by using
                                  Nextflow's caching mechanism. This may save a lot of time.
  -bg                             Sends the entire workflow into the background, thus disconnecting it from the terminal session.
                                  This option launches a daemon process (which will keep running on the headnode) that watches over
                                  your workflow, and submits new jobs to the SLURM queue as required. Use this option for big pipeline
                                  jobs, or whenever you do not want to watch the status progress yourself. Upon completion, the
                                  pipeline will send you an email with the job details. This option is HIGHLY RECOMMENDED!
  -process.executor=local         Temporarily changes where the workflow is executed to the 'local' machine. See also the nextflow.config
                                  file for more details. [Default: slurm]
<<
""".stripIndent()
}