Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update default pangolin docker image to image with pangolin-data v1.26 #394

Merged
merged 8 commits into from
Apr 4, 2024

Conversation

kapsakcj
Copy link
Contributor

@kapsakcj kapsakcj commented Mar 26, 2024

This PR closes #390

🗑️ This dev branch should be deleted after merging to main.

🧠 Aim, Context and Functionality

Update default docker image used for running pangolin across all TheiaCov workflows & Pangolin_update standalone workflow

🛠️ Impacted Workflows/Tasks & Changes Being Made

Updated the default docker image in organism_parameters subwf. tested successfully with miniwdl

This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version : Yes

Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing : No

📋 Workflow/Task Step Changes

🔄 Data Processing

Nothing has changed other than the docker image

Docker/software or software versions changed: us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.25.1 ➡️ us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.26. This is a copy of StaPH-B's docker image for pangolin.

Databases or database versions changed: pangolin-data (database) upgraded to v1.26

Data processing/commands changed: No

File processing changed: No

Compute resources changed: No

➡️ Inputs

N/A

⬅️ Outputs

N/A

🧪 Testing

Test Dataset

Tested on random sars-cov-2 pulled from gisaid

Commandline Testing with MiniWDL or Cromwell (optional)

theiacov_fasta local test:

$ miniwdl run ~/github/public_health_bioinformatics/workflows/theiacov/wf_theiacov_fasta.wdl samplename=EPI_ISL_18606234 assembly_fasta= EPI_ISL_18606234.fasta seq_method="unknown" input_assembly_method="unknown"

[most output redacted for brevity]

2024-03-26 15:45:12.570 wdl.w:theiacov_fasta done
{
  "dir": "/home/curtis_kapsak/tests-galore/testdata/sarscov2/20240326_154444_theiacov_fasta",
  "outputs": {
    "theiacov_fasta.abricate_flu_database": null,
    "theiacov_fasta.abricate_flu_results": null,
    "theiacov_fasta.abricate_flu_subtype": null,
    "theiacov_fasta.abricate_flu_type": null,
    "theiacov_fasta.abricate_flu_version": null,
    "theiacov_fasta.assembly_length_unambiguous": 29737,
    "theiacov_fasta.assembly_method": "unknown",
    "theiacov_fasta.auspice_json": "/home/curtis_kapsak/tests-galore/testdata/sarscov2/20240326_154444_theiacov_fasta/out/auspice_json/EPI_ISL_18606234.nextclade.auspice.json",
    "theiacov_fasta.nextclade_aa_dels": "S:N211-",
    "theiacov_fasta.nextclade_aa_subs": "E:T9I,M:D3H,M:Q19E,M:A63T,M:A104V,N:P13L,N:R203K,N:G204R,N:Q229K,N:S413R,ORF1a:S135R,ORF1a:A211D,ORF1a:T842I,ORF1a:V1056L,ORF1a:G1307S,ORF1a:T1542I,ORF1a:N2526S,ORF1a:A2710T,ORF1a:L3027F,ORF1a:T3090I,ORF1a:T3255I,ORF1a:P3395H,ORF1a:V3593F,ORF1a:T4175I,ORF1b:P314L,ORF1b:R1315C,ORF1b:I1566V,ORF1b:T2163I,ORF3a:T223I,ORF6:D61L,ORF9b:P10S,S:T19I,S:R21T,S:S50L,S:V127F,S:G142D,S:F157S,S:R158G,S:I197V,S:L212I,S:V213G,S:L216F,S:H245N,S:A264D,S:I332V,S:G339H,S:K356T,S:S371F,S:S373P,S:S375F,S:T376A,S:D405N,S:R408S,S:K417N,S:N440K,S:V445H,S:G446S,S:N450D,S:L452W,S:N460K,S:S477N,S:T478K,S:N481K,S:E484K,S:F486P,S:Q498R,S:N501Y,S:Y505H,S:E554K,S:A570V,S:D614G,S:P621S,S:H655Y,S:N679K,S:P681R,S:N764K,S:D796Y,S:S939F,S:Q954H,S:N969K,S:P1143L",
    "theiacov_fasta.nextclade_clade": "23I (Omicron)",
    "theiacov_fasta.nextclade_docker": "us-docker.pkg.dev/general-theiagen/nextstrain/nextclade:2.14.0",
    "theiacov_fasta.nextclade_ds_tag": "2023-12-03T12:00:00Z",
    "theiacov_fasta.nextclade_json": "/home/curtis_kapsak/tests-galore/testdata/sarscov2/20240326_154444_theiacov_fasta/out/nextclade_json/EPI_ISL_18606234.nextclade.json",
    "theiacov_fasta.nextclade_lineage": "BA.2.86.5",
    "theiacov_fasta.nextclade_qc": "good",
    "theiacov_fasta.nextclade_tsv": "/home/curtis_kapsak/tests-galore/testdata/sarscov2/20240326_154444_theiacov_fasta/out/nextclade_tsv/EPI_ISL_18606234.nextclade.tsv",
    "theiacov_fasta.nextclade_version": "nextclade 2.14.0",
    "theiacov_fasta.number_Degenerate": 3,
    "theiacov_fasta.number_N": 66,
    "theiacov_fasta.number_Total": 29806,
    "theiacov_fasta.pango_lineage": "BA.2.86.5",
    "theiacov_fasta.pango_lineage_expanded": "B.1.1.529.2.86.5",
    "theiacov_fasta.pango_lineage_report": "/home/curtis_kapsak/tests-galore/testdata/sarscov2/20240326_154444_theiacov_fasta/out/pango_lineage_report/EPI_ISL_18606234.pangolin_report.csv",
    "theiacov_fasta.pangolin_assignment_version": "pangolin 4.3.1; PUSHER-v1.26",
    "theiacov_fasta.pangolin_conflicts": "0.0",
    "theiacov_fasta.pangolin_docker": "us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.26",
    "theiacov_fasta.pangolin_notes": "Usher placements: BA.2.86.5(1/1)",
    "theiacov_fasta.pangolin_versions": "pangolin: 4.3.1;pangolin-data: 1.26;constellations: v0.1.12;scorpio: 0.3.19;pangolin-assignment: 1.26;usher 0.6.3",
    "theiacov_fasta.percent_reference_coverage": 99.44,
    "theiacov_fasta.qc_check": null,
    "theiacov_fasta.qc_standard": null,
    "theiacov_fasta.seq_platform": "unknown",
    "theiacov_fasta.theiacov_fasta_analysis_date": "2024-02-23",
    "theiacov_fasta.theiacov_fasta_version": "PHB v1.3.0-main",
    "theiacov_fasta.vadr_alerts_list": "/home/curtis_kapsak/tests-galore/testdata/sarscov2/20240326_154444_theiacov_fasta/out/vadr_alerts_list/EPI_ISL_18606234.vadr.alt.list",
    "theiacov_fasta.vadr_docker": "us-docker.pkg.dev/general-theiagen/staphb/vadr:1.5.1",
    "theiacov_fasta.vadr_fastas_zip_archive": "/home/curtis_kapsak/tests-galore/testdata/sarscov2/20240326_154444_theiacov_fasta/out/vadr_fastas_zip_archive/EPI_ISL_18606234_vadr-fasta-files.zip",
    "theiacov_fasta.vadr_num_alerts": "0"
  }
}

Terra Testing

Tested with 33 sars-cov-2 genomes from GISAID: ✅ https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/job_history/027b31a6-2fb6-4e13-8b40-3dcfc53a0a87

  • outputs look good and as expected; new docker image was used

Suggested Scenarios for Reviewer to Test

Test with any sars-cov-2 sample (assembly, reads, anything will be fine)

Theiagen Version Release Testing (optional)

  • Will changes require functional or validation testing (checking outputs etc) during the release? No
  • Do new samples need to be added to validation datasets? If so, upload these to the appropriate validation workspace Google bucket (). Please describe the new samples here and why these have been chosen. No
  • Are there any output files that should be checked after running the version release testing? No

🔬 Final Developer Checklist

  • The workflow/task has been tested locally and results, including file contents, are as anticipated
  • The workflow/task has been tested on Terra and results, including file contents, are as anticipated
  • The CI/CD has been adjusted and tests are passing (to be completed by Theiagen developer)
  • Code changes follow the style guide

🎯 Reviewer Checklist

  • All impacted workflows/tasks have been tested on Terra with a different dataset than used for development
  • All reviewer-suggested scenarios have been tested and any additional
  • All changed results have been confirmed to be accurate
  • All workflows/tasks impacted by change/s have been tested using a standard validation dataset to ensure no unintended change of functionality
  • All code adheres to the style guide
  • MD5 sums have been updated
  • The PR author has addressed all comments

🗂️ Associated Documentation (to be completed by Theiagen developer)

  • Relevant documentation on the Public Health Resources "PHB Main" has been updated
  • Workflow diagrams have been updated to reflect changes

@kapsakcj
Copy link
Contributor Author

FYI This PR has a lot of the same CI-related changes as PR #375

I tried my best to make them identically - but there may be some merge conflicts on either this PR or PR 375 depending on which one gets merged first. I'm happy to help resolve them if they occur

@kapsakcj kapsakcj marked this pull request as ready for review March 26, 2024 21:11
@@ -11,6 +11,7 @@ wf_theiacov_clearlabs:
- tasks/species_typing/betacoronavirus/task_pangolin.wdl
- tasks/quality_control/basic_statistics/task_gene_coverage.wdl
- tasks/task_versioning.wdl
- workflows/utilities/wf_organism_parameters.wdl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof. Good catch here. Guessing we could have added this here a while ago

@kevinlibuit
Copy link
Contributor

Nice! Straight forward and clean. Will merge pending successful functional check.

@kevinlibuit kevinlibuit merged commit c0925a8 into main Apr 4, 2024
13 checks passed
@kapsakcj kapsakcj deleted the cjk-pangolin-data1.26 branch April 17, 2024 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

update default pangolin docker prior to v2.0.0 release
2 participants