Use new augur clades functionality #660
Closed
All callers were removed in "Update syntax for multiple inputs and allow downloading" (40ae575).
…hmark files This rule covers multiple potential "origins" (including GISAID).
This both documents the accepted values and lets Snakemake validate the config automatically.
…nfig keys The patterns parallel the wildcard constraint declarations.
Superseded by changes in "Update clade definitions for emerging clades" (184e25c). I believe this line was missed for removal during a merge.
Workflow inputs (metadata + sequences) will soon be provisioned by ncov-ingest under data.nextstrain.org/files/ncov/open/ and downloaded from there by this profile config. Input data is currently sourced just from GenBank/INSDC, but in time will grow to include other open data sources, such as COG-UK. Based on the nextstrain-genbank profile, renamed to nextstrain-open to reflect the broader scope.
We will start with major regional builds and leave state-level builds to other groups.
Removes params that are already defaults in the workflow and do not need to be in this config.
As this is specific to this profile, intended for internal use, documenting within builds.yaml felt appropriate.
Any defined build sizes will create separate builds with modified names. Parameters for `augur traits` are defined per build name, and thus we wish to duplicate these so that they match the builds created for each build size.
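The duplication described above can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual code: the function name `expand_traits_for_build_sizes`, the `{build_name}_{size}` naming scheme, the example size labels, and the choice to keep the original unsized entry are all assumptions made for the example.

```python
import copy


def expand_traits_for_build_sizes(traits, build_sizes):
    """Copy each build's `augur traits` parameters to its size-specific builds.

    The "<build>_<size>" naming scheme is hypothetical; the real workflow
    may derive the modified build names differently.
    """
    expanded = copy.deepcopy(traits)  # keep the original entries (an assumption)
    for build_name, params in traits.items():
        for size in build_sizes:
            # Deep copy so a later tweak to one build does not leak into another.
            expanded[f"{build_name}_{size}"] = copy.deepcopy(params)
    return expanded


# Hypothetical per-build parameters and size labels.
traits = {"global": {"sampling_bias_correction": 2.5, "columns": ["region"]}}
expanded = expand_traits_for_build_sizes(traits, ["6m", "all-time"])
print(sorted(expanded))
```

Deep-copying the parameters keeps each size-specific build independently adjustable later.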
For open builds, the `{trait}_exposure` metadata is identical to the `{trait}` value. Thus we can skip the travel history adjustment rule. This necessitates updates to which values we use for DTA.
Namespaces the Auspice JSONs from just ncov_* into ncov_gisaid_* and ncov_open_*, which will result in URL changes from, e.g. /ncov/global to /ncov/gisaid/global and /ncov/open/global. The results for trial builds are also slightly renamed to include this namespacing *before* the "trial_${trial_name}" prefix.
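To illustrate the renaming, here is a small sketch of how a namespaced dataset name and its URL path might be derived. The helper names `dataset_name` and `url_path` are hypothetical, the exact position of the trial prefix is inferred from the description above rather than taken from the workflow, and the underscore-to-slash mapping assumes build names contain no underscores themselves.

```python
def dataset_name(build_name, origin, trial_name=None):
    """Build a namespaced dataset name such as "ncov_open_global".

    `origin` is the namespace ("gisaid" or "open"). Placing it before the
    "trial_<name>" part follows the commit description; the real naming
    rules in the workflow may differ.
    """
    parts = ["ncov", origin]
    if trial_name:
        parts.append(f"trial_{trial_name}")
    parts.append(build_name)
    return "_".join(parts)


def url_path(name):
    """Map a dataset name to a URL path by turning underscores into slashes."""
    return "/" + name.replace("_", "/")


print(url_path(dataset_name("global", "gisaid")))
print(url_path(dataset_name("global", "open")))
```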
Internal nextstrain workflows typically generate many datasets. Currently we tend to use a single auspice config JSON for each dataset, despite these configs being essentially identical. Furthermore, the "nextstrain-open" profile was using the config files from the main (GISAID) profile, which were not well suited to the metadata available for the open builds. (Note that the config file deleted in this commit was never being used.)

Here we move to generating the auspice configs via a rule, which has a number of advantages. It is now easy to get an overview of the config fields which are the same, and which ones are different across builds in a profile; comments (which are allowed since it's javascript) also help with understanding. A rule allows us to easily have different settings for different builds, and generation may become dynamic in the future. Finally it helps prevent different build config files diverging unintentionally.

Currently this is implemented for the nextstrain-open profile, but future work will extend this to the (GISAID) nextstrain profile. There will be a common "base" config which can be imported by both in this case.

For users running few builds, it's preferable to avoid this complexity and stick with the config-files approach we currently describe in the tutorials. Workflows with many targets may wish to add their own rules similar to that done here.
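A workflow with many targets could generate its configs along these lines. This is only a sketch under stated assumptions: `BASE_CONFIG`, `make_auspice_config`, and all field values are hypothetical and are not the rule added in this commit. The top-level keys shown (`title`, `colorings`, `display_defaults`) are standard auspice config fields.

```python
import copy
import json

# Hypothetical settings shared by every build in a profile.
BASE_CONFIG = {
    "title": "Genomic epidemiology of SARS-CoV-2",
    "colorings": [
        {"key": "region", "title": "Region", "type": "categorical"},
    ],
    "display_defaults": {"color_by": "region", "branch_label": "clade"},
}


def make_auspice_config(build_name, overrides=None):
    """Return the config for one build: shared base plus per-build overrides."""
    config = copy.deepcopy(BASE_CONFIG)
    config["title"] = f"{BASE_CONFIG['title']} ({build_name})"
    for key, value in (overrides or {}).items():
        if isinstance(value, dict) and isinstance(config.get(key), dict):
            # Merge nested settings so one override does not drop the others.
            config[key].update(value)
        else:
            config[key] = value
    return config


# A build that differs from the base in a single display default.
africa = make_auspice_config("africa", {"display_defaults": {"color_by": "country"}})
print(json.dumps(africa, indent=2))
```

Keeping the shared fields in one place makes the per-build differences explicit, which is the overview benefit described above.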
The GitHub Actions UI rolls up each step's output, so this will make it easier to quickly see the build info by avoiding the need to scroll past all the build launching output first.
Separate workflow jobs so that they can be independently managed in the GitHub Actions UI. Copied and pasted the "nextstrain build" invocations (instead of using, e.g., a shared script or YAML anchor) so they can be independently tweaked as needed in the future. For example, I'm starting with the same resources for each, but that's probably unnecessary right now and we may want to tune it sooner rather than later.
This commit is a WIP commit to test the new functionality being introduced in augur PR 728 [1]. This allows us to simplify the nCoV workflow as we can explicitly define the attribute names used for clade membership and branch labelling. These changes have only been tested for the "open" build, which itself is a WIP. [1] nextstrain/augur#728
jameshadfield force-pushed the open branch 4 times, most recently from b4353ff to 9ae9f6b (June 22, 2021 05:55)
Superseded by #1000
This commit is a WIP commit to test the new functionality
being introduced in augur PR 728 [1]. This allows us to
simplify the nCoV workflow as we can explicitly define the
attribute names used for clade membership and branch
labelling.
The release of 728 will come with a new major version of
augur, and this workflow's requirements should be updated
accordingly.
These changes have only been tested for the "open" build,
which itself is a WIP.
[1] nextstrain/augur#728