-
Notifications
You must be signed in to change notification settings - Fork 5
/
Snakefile
62 lines (53 loc) · 2.6 KB
/
Snakefile
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
"""
This is the main ingest Snakefile that orchestrates the full ingest workflow
and defines its default outputs.
"""
# The workflow filepaths are written relative to this Snakefile's base directory
workdir: workflow.current_basedir
# Use default configuration values. Override with Snakemake's --configfile/--config options.
configfile: "defaults/config.yaml"
segments = ['l', 's', 'gpc']
wildcard_constraints:
segment = "|".join(segments)
# This is the default rule that Snakemake will run when there are no specified targets.
# The default output of the ingest workflow is usually the curated metadata and sequences.
# Nextstrain-maintained ingest workflows will produce metadata files with the
# standard Nextstrain fields and additional fields that are pathogen specific.
# We recommend using these standard fields in custom ingests as well to minimize
# the customizations you will need for the downstream phylogenetic workflow.
# TODO: Add link to centralized docs on standard Nextstrain metadata fields
rule all:
input:
sequences=expand("results/{segment}/sequences.fasta", segment=segments),
metadata=expand("results/{segment}/metadata.tsv", segment=segments),
metadata_all="results/all/metadata.tsv",
# Note that only PATHOGEN-level customizations should be added to these
# core steps, meaning they are custom rules necessary for all builds of the pathogen.
# If there are build-specific customizations, they should be added with the
# custom_rules imported below to ensure that the core workflow is not complicated
# by build-specific rules.
include: "rules/fetch_from_ncbi.smk"
include: "rules/curate.smk"
include: "rules/nextclade.smk"
rule create_final_metadata:
input:
metadata="data/subset_metadata.tsv"
output:
metadata="results/all/metadata.tsv"
shell:
"""
cp {input.metadata} {output.metadata}
"""
# Allow users to import custom rules provided via the config.
# This allows users to run custom rules that can extend or override the workflow.
# A concrete example of using custom rules is the extension of the workflow with
# rules to support the Nextstrain automation that uploads files and sends internal
# Slack notifications.
# For extensions, the user will have to specify the custom rule targets when
# running the workflow.
# For overrides, the custom Snakefile will have to use the `ruleorder` directive
# to allow Snakemake to handle ambiguous rules
# https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#handling-ambiguous-rules
if "custom_rules" in config:
for rule_file in config["custom_rules"]:
include: rule_file