Upgrade to nextclade v3 & update default dataset tags #375
…ataset_reference inputs and outputs from organism_param subworkflow
…ith miniwdl and sc2 sample
…ll, need to test in Terra
…e with miniwdl and a Flu HA FASTA
…ssembly_fasta input singular
…ade v3. need to test on Terra still (have not tested with miniwdl)
…wdl so will need to test in Terra
I created a clone of the PHB_Validation_TEMPLATE workspace specifically for testing these changes: https://app.terra.bio/#workspaces/theiagen-validations/PHB_Validation_nextcladeV3testing/job_history
…extclade_v3 and a slew of other updates from previous PRs that were missed since the CI was disabled
Marking as ready for review to get the review process started. I'm working on pulling a couple of RSV-A and RSV-B genomes for testing (the 2 I tested with didn't do so well; they failed to align well with the nextclade references), so I will test these and post a workflow link when it's completed.
2 RSV-A and 2 RSV-B samples tested via TheiaCoV_FASTA here: https://app.terra.bio/#workspaces/theiagen-validations/PHB_Validation_nextcladeV3testing/job_history/1da2324d-436e-4b60-a79e-04bceb62a6f7
Other things to note about this PR
tasks/taxon_id/task_nextclade.wdl
Outdated
~{"--input-tree " + auspice_reference_tree_json} \
~{"--input-pathogen-json " + nextclade_pathogen_json} \
~{"--input-annotation " + gene_annotations_gff} \
~{"--input-pcr-primers " + pcr_primers_csv} \
Looks as if --input-pcr-primers was also removed as an input flag
The --input-root-seq flag (and the associated root_sequence task input) were removed from the task, but it looks as if these were just renamed in nextclade v3. ^^ Info is in the same docs linked above.
Good catch, thank you. I will remove --input-pcr-primers and add --input-root-seq back under its new name, --input-ref, as specified in their docs.
resolved in 2b7470a
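A minimal sketch of the flag handling discussed in this thread, written as plain shell rather than WDL. It assumes each optional file maps to one flag that is only emitted when the file is provided, which is what the `~{"--flag " + optional_input}` idiom in the task does. The function name and the file names are hypothetical; only flags named in this PR are used, and --input-pcr-primers is deliberately absent. The command is printed instead of executed, so the sketch runs without nextclade installed.

```shell
# Hypothetical helper: assemble a `nextclade run` command line for v3.
# An empty argument means "input not provided", so its flag is skipped.
build_nextclade_v3_args() {
  input_ref=$1
  tree_json=$2
  pathogen_json=$3
  annotation_gff=$4

  # Reuse the positional parameters as the growing argument list.
  set -- nextclade run
  if [ -n "$input_ref" ]; then set -- "$@" --input-ref "$input_ref"; fi
  if [ -n "$tree_json" ]; then set -- "$@" --input-tree "$tree_json"; fi
  if [ -n "$pathogen_json" ]; then set -- "$@" --input-pathogen-json "$pathogen_json"; fi
  if [ -n "$annotation_gff" ]; then set -- "$@" --input-annotation "$annotation_gff"; fi

  # Print the assembled command, space separated.
  echo "$*"
}

# Example: only a reference and a pathogen JSON are supplied.
build_nextclade_v3_args "reference.fasta" "" "pathogen.json" ""
```

Joining with spaces is fine for illustration, but paths containing spaces would need proper quoting before the command is actually run.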
reference_tree_json = reference_tree_json,
qc_config_json = qc_config_json,
nextclade_pathogen_json = nextclade_pathogen_json,
gene_annotations_gff = gene_annotations_gff,
pcr_primers_csv = pcr_primers_csv,
Same comment above RE input removed
resolved in 0800fa0
Details for days! That was a beast to review. Thanks for all of the documentation and specific workspace for validation runs. I've combed through all of your runs and feel confident in these code changes; also launched a few runs myself within the same workspace for sanity's sake. Just two points to address:
Once these are resolved we can move forward with a merge!
…nput_ref optional input. tested successfully with miniwdl
…ption and add input-ref option
Thanks for the thorough review, Kevin.
I'll update CI first, then I'll launch a functional test of the nextclade_addToRefTree workflow as well as TheiaCoV_FASTA_PHB on the various organisms in our validation dataset to ensure everything runs as expected. EDIT:
OK, everything should be resolved. I think we are ready for approval and merge.
So FYI the Should we add it back to the WDL task or leave as is? I don't know of any of our users that utilize this option, so I'm tempted to leave it out.
That's fair. Let's get this closed and avoid scope creep; changes made in this PR address #349 for PHB users. Can you open a new issue for us to add that option back? We can discuss internally when to prioritize it.
Still have lots of work to do, but wanted to at least start a draft PR so folks can see progress.
This PR closes #349
🗑️ This dev branch should be deleted after merging to main. (subject to change)
🧠 Aim, Context and Functionality
This PR updates all TheiaCoV workflows to use nextclade v3.
A new task was added for nextclade v3 specifically.
CI has undergone major updates to account for these changes AND to re-enable the theiacov_illumina_pe and theiacov_illumina_se CI workflows (they were disabled back in July 2023)
🛠️ Impacted Workflows/Tasks & Changes Being Made
All TheiaCoV workflows are impacted across all organisms (sars-cov-2, Mpox, Flu, RSV-A, RSV-B) that we support and run through nextclade
This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version : Yes*
Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing : Yes
📋 Workflow/Task Step Changes
🔄 Data Processing
Docker/software or software versions changed: upgraded to us-docker.pkg.dev/general-theiagen/nextstrain/nextclade:3.3.1, which was originally copied from Nextstrain's image on Docker Hub
Databases or database versions changed: All nextclade_dataset_tags have been updated for all organisms
Data processing/commands changed: Details on changes from v2 ➡️ v3 can be found here: https://docs.nextstrain.org/projects/nextclade/en/stable/user/migration-v3.html#dataset-file-format-and-dataset-names-have-changed
File processing changed: The nextclade_output_parsing task has not changed. The output TSV of nextclade v3 was readily parsed by the existing WDL task
Compute resources changed: No
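For readers unfamiliar with how the dataset changes in this PR surface on the command line, here is a hedged sketch of fetching a dataset by its path-like name, optionally pinned to a tag. It assumes the `nextclade dataset get` subcommand with its `--name`, `--tag`, and `--output-dir` options; the function name, the output directory, and the tag placeholder are hypothetical, and the real tags are the nextclade_dataset_tag values set in the workflows. The command is printed rather than run.

```shell
# Hypothetical helper: assemble a `nextclade dataset get` command line.
# An empty tag means "no pin", so nextclade would pick the latest dataset version.
build_dataset_get_cmd() {
  ds_name=$1
  ds_tag=$2
  out_dir=$3

  set -- nextclade dataset get --name "$ds_name"
  if [ -n "$ds_tag" ]; then set -- "$@" --tag "$ds_tag"; fi
  set -- "$@" --output-dir "$out_dir"

  # Print the assembled command, space separated.
  echo "$*"
}

# Example: the path-like SARS-CoV-2 dataset name used in this PR, without a pinned tag.
build_dataset_get_cmd "nextstrain/sars-cov-2/wuhan-hu-1/orfs" "" "nextclade_dataset_dir"
```

Pinning a tag keeps results reproducible between runs, which is why the workflows carry an explicit dataset tag per organism.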
➡️ Inputs
String? nextclade_dataset_reference has been removed from all TheiaCoV workflows (and the organism_param subwf) as it is no longer required
String sc2_nextclade_ds_name = "nextstrain/sars-cov-2/wuhan-hu-1/orfs" is now a path-like name instead of names like "sars-cov-2". Those names are still kept as shortcuts, but for future-proofing's sake I have changed these to the new path-like names for all organisms
⬅️ Outputs
Workflow outputs now come from the nextclade_v3 task instead of the old nextclade task.
🧪 Testing
Test Dataset
I am testing in Terra using our validation datasets that we used to validate new versions of PHB.
Commandline Testing with MiniWDL or Cromwell (optional)
Not going to show cmdline testing, everything will be tested in Terra
Terra Testing
nextclade_dataset_tag ✅
nextclade_dataset_tag ✅
TODO pull a couple RSV-A and RSV-B samples to test with: Done
Suggested Scenarios for Reviewer to Test
Need to test all organisms to ensure functionality exists for each of them.
It seems a bit extreme, but I'd like to test all impacted workflows since I'd rather catch bugs now, instead of at the 11th hour before the v2.0.0 release. So I've created a Terra workspace specifically for this testing: https://app.terra.bio/#workspaces/theiagen-validations/PHB_Validation_nextcladeV3testing/job_history
Theiagen Version Release Testing (optional)
🔬 Final Developer Checklist
🎯 Reviewer Checklist
🗂️ Associated Documentation (to be completed by Theiagen developer)