Upgrade to nextclade v3 & update default dataset tags #375

Merged: 24 commits, Apr 4, 2024
Changes from 20 commits
Commits
6e34d2d
added new WDL task for nextclade v3. tested w miniwdl. not added to w…
kapsakcj Mar 5, 2024
1507c6e
added common miniwdl output directories to .gitignore
kapsakcj Mar 5, 2024
734700a
update sars-cov-2 nextclade defaults; removed unnecessary nextclade_d…
kapsakcj Mar 6, 2024
c2703bf
updates to nextclade v3 task
kapsakcj Mar 6, 2024
cb84f4c
update theiacov_fasta to use nextclade v3 task. tested successfully w…
kapsakcj Mar 6, 2024
0dac722
update nextclade defaults for non-sc2 organisms. Have not tested at a…
kapsakcj Mar 6, 2024
f93e691
update to nextclade 3.3.1 and implement --verbosity flag for nextclad…
kapsakcj Mar 6, 2024
4c3c21f
updated WDL task for adding samples to nextclade ref tree. tested fin…
kapsakcj Mar 6, 2024
dc08e61
update Sample_to_ref_tree_PHB workflow: removed old inputs and made a…
kapsakcj Mar 6, 2024
cd2afec
updated theiacov_fasta_batch, ilmn pe, ilmn se, and ont to use nextcl…
kapsakcj Mar 7, 2024
477a216
update theiacov_clearlabs to use nextclade_v3. did not test with mini…
kapsakcj Mar 21, 2024
7e8e9ab
Merge remote-tracking branch 'origin/main' into cjk-nextclade-v3
kapsakcj Mar 22, 2024
6982979
fix import path for organism_paramteters subwf in theiacov_clearlabs …
kapsakcj Mar 22, 2024
fbf0b49
shellcheck lied to me. reverting last commit
kapsakcj Mar 22, 2024
303a9b4
update theiacov_fasta CI
kapsakcj Mar 22, 2024
f89b0bd
update theiacov_clearlabs CI
kapsakcj Mar 22, 2024
ef1a6ac
update theiacov_ont CI
kapsakcj Mar 22, 2024
e654a22
re-enable theiacov_illumina_pe and se CI workflows; update them for n…
kapsakcj Mar 22, 2024
728504a
Merge remote-tracking branch 'origin/main' into cjk-nextclade-v3
kapsakcj Mar 28, 2024
6324136
update CI
kapsakcj Mar 28, 2024
2b7470a
nextclade_v3 task: removed unused pcr_primers_csv input; added back i…
kapsakcj Apr 4, 2024
0800fa0
nextclade_addToRefTree task and wf change: remove input-pcr-primers o…
kapsakcj Apr 4, 2024
84d506a
Merge remote-tracking branch 'origin/main' into cjk-nextclade-v3
kapsakcj Apr 4, 2024
7381cb6
corrected input file type for input-ref
kapsakcj Apr 4, 2024
10 changes: 5 additions & 5 deletions .github/workflows/pytest-workflows.yml
@@ -38,11 +38,11 @@ jobs:
# For every workflow, test it with MiniWDL and Cromwell
tag: ["${{ fromJson(needs.changes.outputs.workflows) }}"]
engine: ["miniwdl", "cromwell"]
exclude:
- tag: "wf_theiacov_illumina_pe"
engine: "miniwdl"
- tag: "wf_theiacov_illumina_se"
engine: "miniwdl"
#exclude:
kevinlibuit marked this conversation as resolved.
# - tag: "wf_theiacov_illumina_pe"
# engine: "miniwdl"
# - tag: "wf_theiacov_illumina_se"
# engine: "miniwdl"
defaults:
run:
# Play nicely with miniconda
4 changes: 3 additions & 1 deletion .gitignore
@@ -1 +1,3 @@
cromwell*
cromwell*
_LAST
2024*
101 changes: 82 additions & 19 deletions tasks/taxon_id/task_nextclade.wdl
kevinlibuit marked this conversation as resolved.
@@ -60,6 +60,72 @@ task nextclade {
}
}

task nextclade_v3 {
meta {
description: "Nextclade classification of one sample. Leaving optional inputs unspecified will use SARS-CoV-2 defaults."
}
input {
File genome_fasta
File? auspice_reference_tree_json
File? gene_annotations_gff
File? pcr_primers_csv
File? nextclade_pathogen_json
String docker = "us-docker.pkg.dev/general-theiagen/nextstrain/nextclade:3.3.1"
String dataset_name
String verbosity = "warn" # other options are: "off" "error" "info" "debug" and "trace"
String dataset_tag
Int disk_size = 50
Int memory = 4
Int cpu = 2
}
String basename = basename(genome_fasta, ".fasta")
command <<<
# track version & print to log
nextclade --version | tee NEXTCLADE_VERSION

# --reference no longer used in v3. consolidated into --name and --tag
nextclade dataset get \
--name="~{dataset_name}" \
--tag="~{dataset_tag}" \
-o nextclade_dataset_dir \
--verbosity ~{verbosity}

# exit script/task upon error
set -e

# not necessary to include `--jobs <jobs>` in v3. Nextclade will use all available CPU threads by default. It's fast so I don't think we will need to change unless we see errors
nextclade run \
--input-dataset nextclade_dataset_dir/ \
~{"--input-tree " + auspice_reference_tree_json} \
~{"--input-pathogen-json " + nextclade_pathogen_json} \
~{"--input-annotation " + gene_annotations_gff} \
~{"--input-pcr-primers " + pcr_primers_csv} \
Contributor
Looks as if --input-pcr-primers was also removed as an input flag

Contributor
the --input-root-seq flag (and associated root_sequence task input) were removed from the task, but it looks as if these were just renamed in nextclade v3 ^^ info in same docs linked above.

Contributor Author
good catch, thank you. I will remove --input-pcr-primers and replace --input-root-seq with --input-ref as specified in their docs

Contributor Author
resolved in 2b7470a
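
The flag changes raised in this thread, together with the renames visible elsewhere in this diff, can be summarized in a small lookup. This is only a sketch under stated assumptions: `map_nextclade_flag` is a hypothetical helper that is not part of this PR, and it covers just the `nextclade run` flags mentioned here. The pairing of `--input-qc-config` and `--input-virus-properties` with `--input-pathogen-json` is an inference from this diff, where the first two are dropped and the third is added. The Nextclade v3 documentation remains the authority on the full set of changes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): print the Nextclade v3 counterpart
# of a v2 `nextclade run` flag, limited to the flags discussed in this PR.
map_nextclade_flag() {
  case "$1" in
    --input-root-seq)  echo "--input-ref" ;;         # rename noted in the review thread
    --input-gene-map)  echo "--input-annotation" ;;  # rename visible in the nextclade_add_ref diff
    --input-qc-config|--input-virus-properties)
                       echo "--input-pathogen-json" ;;  # assumption: folded into pathogen.json
    --input-pcr-primers)
                       echo "" ;;                    # removed; no replacement flag
    *)                 echo "$1" ;;                  # anything else is passed through unchanged
  esac
}

# Usage: show the v3 equivalent for a few v2 flags
for flag in --input-root-seq --input-gene-map --input-pcr-primers; do
  printf '%s -> %s\n' "$flag" "$(map_nextclade_flag "$flag")"
done
```

An empty result means the flag should simply be dropped from the command line rather than substituted.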

--output-json "~{basename}".nextclade.json \
--output-tsv "~{basename}".nextclade.tsv \
--output-tree "~{basename}".nextclade.auspice.json \
--output-all . \
--verbosity ~{verbosity} \
"~{genome_fasta}"
>>>
runtime {
docker: "~{docker}"
memory: "~{memory} GB"
cpu: cpu
disks: "local-disk " + disk_size + " SSD"
disk: disk_size + " GB" # TES
dx_instance_type: "mem1_ssd1_v2_x2"
maxRetries: 3
}
output {
String nextclade_version = read_string("NEXTCLADE_VERSION")
File nextclade_json = "~{basename}.nextclade.json"
File auspice_json = "~{basename}.nextclade.auspice.json"
File nextclade_tsv = "~{basename}.nextclade.tsv"
String nextclade_docker = docker
String nextclade_dataset_tag = "~{dataset_tag}"
}
}

task nextclade_output_parser {
meta {
description: "Python and bash codeblocks for parsing the output files from Nextclade."
@@ -163,52 +229,49 @@ task nextclade_add_ref {
}
input {
File genome_fasta
File? root_sequence
File? reference_tree_json
File? qc_config_json
File? nextclade_pathogen_json
File? gene_annotations_gff
File? pcr_primers_csv
File? virus_properties
String docker = "us-docker.pkg.dev/general-theiagen/nextstrain/nextclade:2.14.0"
String docker = "us-docker.pkg.dev/general-theiagen/nextstrain/nextclade:3.3.1"
String dataset_name
String? dataset_reference
String? dataset_tag
Int disk_size = 50
Int memory = 8
String verbosity = "warn" # other options are: "off" "error" "info" "debug" and "trace"
Int disk_size = 100
Int memory = 4
Int cpu = 2
}
String basename = basename(genome_fasta, ".fasta")
command <<<
NEXTCLADE_VERSION="$(nextclade --version)"
echo $NEXTCLADE_VERSION > NEXTCLADE_VERSION
# track version & print to log
nextclade --version | tee NEXTCLADE_VERSION

echo "DEBUG: downloading nextclade dataset..."
nextclade dataset get \
--name="~{dataset_name}" \
~{"--reference " + dataset_reference} \
~{"--tag " + dataset_tag} \
-o nextclade_dataset_dir \
--verbose
--verbosity ~{verbosity}

# If no referece sequence is provided, use the reference tree from the dataset
# If no reference sequence is provided, use the reference tree from the dataset
if [ -z "~{reference_tree_json}" ]; then
echo "Default dataset reference tree JSON will be used"
cp nextclade_dataset_dir/tree.json reference_tree.json
cp -v nextclade_dataset_dir/tree.json reference_tree.json
else
echo "User reference tree JSON will be used"
cp ~{reference_tree_json} reference_tree.json
cp -v ~{reference_tree_json} reference_tree.json
fi

tree_json="reference_tree.json"

set -e
echo "DEBUG: running nextclade..."
nextclade run \
--input-dataset=nextclade_dataset_dir/ \
~{"--input-root-seq " + root_sequence} \
--input-dataset nextclade_dataset_dir/ \
--input-tree ${tree_json} \
~{"--input-qc-config " + qc_config_json} \
~{"--input-gene-map " + gene_annotations_gff} \
~{"--input-pathogen-json " + nextclade_pathogen_json} \
~{"--input-annotation " + gene_annotations_gff} \
~{"--input-pcr-primers " + pcr_primers_csv} \
~{"--input-virus-properties " + virus_properties} \
--output-json "~{basename}".nextclade.json \
--output-tsv "~{basename}".nextclade.tsv \
--output-tree "~{basename}".nextclade.auspice.json \
81 changes: 36 additions & 45 deletions tests/workflows/theiacov/test_wf_theiacov_clearlabs.yml
@@ -212,67 +212,58 @@
- path: miniwdl_run/call-ncbi_scrub_se/work/clearlabs_R1_dehosted.fastq.gz
- path: miniwdl_run/call-ncbi_scrub_se/work/r1.fastq
- path: miniwdl_run/call-ncbi_scrub_se/work/r1.fastq.clean
- path: miniwdl_run/call-nextclade/command
- path: miniwdl_run/call-nextclade/inputs.json
- path: miniwdl_run/call-nextclade_v3/command
- path: miniwdl_run/call-nextclade_v3/inputs.json
contains: ["dataset_name", "dataset_tag", "genome_fasta"]
- path: miniwdl_run/call-nextclade/outputs.json
- path: miniwdl_run/call-nextclade_v3/outputs.json
contains: ["nextclade", "nextclade_json", "nextclade_version"]
- path: miniwdl_run/call-nextclade/stderr.txt
- path: miniwdl_run/call-nextclade/stderr.txt.offset
- path: miniwdl_run/call-nextclade/stdout.txt
- path: miniwdl_run/call-nextclade/task.log
- path: miniwdl_run/call-nextclade_v3/stderr.txt
- path: miniwdl_run/call-nextclade_v3/stderr.txt.offset
- path: miniwdl_run/call-nextclade_v3/stdout.txt
- path: miniwdl_run/call-nextclade_v3/task.log
contains: ["wdl", "theiacov_clearlabs", "nextclade", "done"]
- path: miniwdl_run/call-nextclade/work/NEXTCLADE_VERSION
md5sum: 91a455762183b41af0d8de5596e28e7f
- path: miniwdl_run/call-nextclade/work/_miniwdl_inputs/0/clearlabs.medaka.consensus.fasta
- path: miniwdl_run/call-nextclade_v3/work/NEXTCLADE_VERSION
md5sum: 70aa6879bf9f0e8ba2b9953b0d4a2216
- path: miniwdl_run/call-nextclade_v3/work/_miniwdl_inputs/0/clearlabs.medaka.consensus.fasta
md5sum: d41d8cd98f00b204e9800998ecf8427e
- path: miniwdl_run/call-nextclade/work/clearlabs.medaka.consensus.nextclade.auspice.json
- path: miniwdl_run/call-nextclade/work/clearlabs.medaka.consensus.nextclade.json
- path: miniwdl_run/call-nextclade/work/clearlabs.medaka.consensus.nextclade.tsv
- path: miniwdl_run/call-nextclade/work/nextclade.aligned.fasta
- path: miniwdl_run/call-nextclade_v3/work/clearlabs.medaka.consensus.nextclade.auspice.json
- path: miniwdl_run/call-nextclade_v3/work/clearlabs.medaka.consensus.nextclade.json
- path: miniwdl_run/call-nextclade_v3/work/clearlabs.medaka.consensus.nextclade.tsv
- path: miniwdl_run/call-nextclade_v3/work/nextclade.aligned.fasta
md5sum: eb18c508f26125851279f2c03d4a336c
- path: miniwdl_run/call-nextclade/work/nextclade.csv
- path: miniwdl_run/call-nextclade/work/nextclade.errors.csv
md5sum: 2d1dad70d68e56d0a1191900c17061bc
- path: miniwdl_run/call-nextclade/work/nextclade.insertions.csv
md5sum: 3fb6db0807dc663e2821e0bbbccdc5aa
- path: miniwdl_run/call-nextclade/work/nextclade.ndjson
- path: miniwdl_run/call-nextclade/work/nextclade_dataset_dir/genemap.gff
md5sum: b4bd70a3779718e556a17360a41dce90
- path: miniwdl_run/call-nextclade/work/nextclade_dataset_dir/primers.csv
md5sum: 5990c3483bf66ce607aeb90a44e7ef2e
- path: miniwdl_run/call-nextclade/work/nextclade_dataset_dir/qc.json
md5sum: b01f4491a54941fea12ec5b04a10fb8c
- path: miniwdl_run/call-nextclade/work/nextclade_dataset_dir/reference.fasta
- path: miniwdl_run/call-nextclade_v3/work/nextclade.csv
- path: miniwdl_run/call-nextclade_v3/work/nextclade.ndjson
- path: miniwdl_run/call-nextclade_v3/work/nextclade_dataset_dir/genome_annotation.gff3
md5sum: 4dff84d2d6ada820e0e3a8bc6798d402
- path: miniwdl_run/call-nextclade_v3/work/nextclade_dataset_dir/pathogen.json
md5sum: 9f99ba19333ff907af307611fbb73e21
- path: miniwdl_run/call-nextclade_v3/work/nextclade_dataset_dir/reference.fasta
md5sum: c7ce05f28e4ec0322c96f24e064ef55c
- path: miniwdl_run/call-nextclade/work/nextclade_dataset_dir/sequences.fasta
- path: miniwdl_run/call-nextclade/work/nextclade_dataset_dir/tag.json
- path: miniwdl_run/call-nextclade/work/nextclade_dataset_dir/tree.json
- path: miniwdl_run/call-nextclade/work/nextclade_dataset_dir/virus_properties.json
md5sum: 03bd2f9d33326299b5b49b4910d84183
- path: miniwdl_run/call-nextclade/work/nextclade_gene_E.translation.fasta
- path: miniwdl_run/call-nextclade_v3/work/nextclade_dataset_dir/sequences.fasta
- path: miniwdl_run/call-nextclade_v3/work/nextclade_dataset_dir/tree.json
- path: miniwdl_run/call-nextclade_v3/work/nextclade.cds_translation.E.fasta
md5sum: 14808ad8b34c8bac7de500707400250e
- path: miniwdl_run/call-nextclade/work/nextclade_gene_M.translation.fasta
- path: miniwdl_run/call-nextclade_v3/work/nextclade.cds_translation.M.fasta
md5sum: 4799e5af880d2005da56342d6a9d64ab
- path: miniwdl_run/call-nextclade/work/nextclade_gene_N.translation.fasta
- path: miniwdl_run/call-nextclade_v3/work/nextclade.cds_translation.N.fasta
md5sum: bbc46cedb153b3213a9cf8f425dd906c
- path: miniwdl_run/call-nextclade/work/nextclade_gene_ORF1a.translation.fasta
- path: miniwdl_run/call-nextclade_v3/work/nextclade.cds_translation.ORF1a.fasta
md5sum: 0c1b1bbcbcfe86d10c466bf63fca5c11
- path: miniwdl_run/call-nextclade/work/nextclade_gene_ORF1b.translation.fasta
md5sum: 23a0497efe0ccffaf51b792f40ca5036
- path: miniwdl_run/call-nextclade/work/nextclade_gene_ORF3a.translation.fasta
- path: miniwdl_run/call-nextclade_v3/work/nextclade.cds_translation.ORF1b.fasta
md5sum: bea75a83074a11fa74c316e4df6a3d9f
- path: miniwdl_run/call-nextclade_v3/work/nextclade.cds_translation.ORF3a.fasta
md5sum: 692b2c314c4ff6584a40273dc239cb78
- path: miniwdl_run/call-nextclade/work/nextclade_gene_ORF6.translation.fasta
- path: miniwdl_run/call-nextclade_v3/work/nextclade.cds_translation.ORF6.fasta
md5sum: c1d610f9e45acd3915e40f0d643f0188
- path: miniwdl_run/call-nextclade/work/nextclade_gene_ORF7a.translation.fasta
- path: miniwdl_run/call-nextclade_v3/work/nextclade.cds_translation.ORF7a.fasta
md5sum: a655a6c325b0bc9ad69842fcf2e927a7
- path: miniwdl_run/call-nextclade/work/nextclade_gene_ORF7b.translation.fasta
- path: miniwdl_run/call-nextclade_v3/work/nextclade.cds_translation.ORF7b.fasta
md5sum: 27fd219bb6d18731898a9ddfdee27f67
- path: miniwdl_run/call-nextclade/work/nextclade_gene_ORF8.translation.fasta
- path: miniwdl_run/call-nextclade_v3/work/nextclade.cds_translation.ORF8.fasta
md5sum: 398798980c482562e7c5b21b205e0445
- path: miniwdl_run/call-nextclade/work/nextclade_gene_ORF9b.translation.fasta
- path: miniwdl_run/call-nextclade_v3/work/nextclade.cds_translation.ORF9b.fasta
md5sum: 3d6a949bdcecaf70e9d123651a7a7c5e
- path: miniwdl_run/call-nextclade/work/nextclade_gene_S.translation.fasta
- path: miniwdl_run/call-nextclade_v3/work/nextclade.cds_translation.S.fasta
md5sum: 0ce44a0a8e2784ca4b3e8d8f03211813
- path: miniwdl_run/call-nextclade_output_parser/command
md5sum: f377fb9fc901d440fa35b1b05317a0e1
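
The `md5sum` entries in this test file are checksum expectations for the listed paths. As a rough illustration of what such a check amounts to, the sketch below compares a file's MD5 digest with an expected value. `check_md5` is a hypothetical helper, not something defined in this repository, and it assumes GNU `md5sum` is available. One detail worth knowing when reading the list: `d41d8cd98f00b204e9800998ecf8427e` is the MD5 digest of empty input, so an entry with that value matches a zero-byte file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): succeed only if the MD5 digest
# of a file equals the expected hex string. Assumes GNU coreutils md5sum.
check_md5() {
  file="$1"
  expected="$2"
  # md5sum prints "<digest>  <filename>"; keep only the digest field
  actual="$(md5sum "$file" | cut -d' ' -f1)"
  [ "$actual" = "$expected" ]
}

# Usage: a freshly created temporary file is empty, so it matches the
# well-known empty-input digest.
empty_file="$(mktemp)"
if check_md5 "$empty_file" d41d8cd98f00b204e9800998ecf8427e; then
  echo "digest matches the empty-file MD5"
fi
rm -f "$empty_file"
```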