Export ancestral AA sequences for tree's root node #1317

Merged: 2 commits into master from export-root-aa-sequences-from-ancestral on Sep 20, 2023


huddlej (Contributor) commented Sep 19, 2023

Description of proposed changes

Fixes a bug in the new augur ancestral interface for amino acid sequences: the output JSON was missing the "aa_sequences" entry for the root node, which downstream tools like augur clades rely on. The augur translate output creates this entry [1], so workflows that run the nucleotide ancestral command followed by the translate command are not affected. Workflows that use the new ancestral command are affected when augur clades looks for amino acid mutations that exist only in the root node and not on any subsequent branches (e.g., for seasonal influenza B/Victoria's NA tree).

[1] augur/augur/translate.py, lines 291 to 295 in b299059:

if n==tree.root:
    aa_muts[n.name]={"aa_muts":{}, "aa_sequences":{}}
    for fname, aln in translations.items():
        if n.name in aln:
            aa_muts[n.name]["aa_sequences"][fname] = "".join(aln[n.name])
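To make the missing entry concrete, here is a minimal, standalone sketch of the same idea as the translate snippet above: attach the full amino acid sequence per gene to the root node's entry only. The function name `add_root_aa_sequences` and the plain-dict inputs are assumptions for illustration; this is not the code added to augur/ancestral.py in this PR.

```python
def add_root_aa_sequences(node_data, root_name, translations):
    """Attach full amino acid sequences to the root node's entry.

    node_data: dict mapping node name -> dict (e.g., with an "aa_muts" key).
    translations: dict mapping gene name -> dict of node name -> sequence
        (a string or a list of single-character residues).
    Both structures are hypothetical stand-ins for augur's internal data.
    """
    # Create the root entry if it is missing, then ensure it has "aa_sequences".
    root_entry = node_data.setdefault(root_name, {"aa_muts": {}})
    root_entry.setdefault("aa_sequences", {})

    for gene, aln in translations.items():
        if root_name in aln:
            # Join handles both strings and lists of residues.
            root_entry["aa_sequences"][gene] = "".join(aln[root_name])

    return node_data


# Hypothetical toy inputs: one gene, a root node, and one tip.
example_node_data = {
    "NODE_0": {"aa_muts": {"NA": []}},
    "A": {"aa_muts": {"NA": ["K2R"]}},
}
example_translations = {"NA": {"NODE_0": ["M", "K", "T"], "A": "MRT"}}
example_result = add_root_aa_sequences(example_node_data, "NODE_0", example_translations)
```

Only the root receives "aa_sequences" here; other nodes keep just their per-branch mutations, which matches the shape of the translate output described above.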

Examples

Below is an example of a broken subclade annotation for a B/Vic NA tree when the MRCA doesn't date back far enough for subclade amino acid substitutions to occur on branches descending from the root.

image

In contrast, a recent public Vic NA tree has older sequences, so the MRCA dates back farther and subclades appear as expected.

image

Re-running augur ancestral and augur clades on the first Vic NA tree above from the code in this PR produces this tree with the expected subclades.

image
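The examples above hinge on one point: when a clade-defining residue is already present at the root, no branch carries a mutation to that residue, so the root's full sequence is the only place to find it. The sketch below illustrates that check under stated assumptions. The function `root_matches_clade` and the `(gene, position, residue)` tuple format are hypothetical, and this is not how augur clades is implemented.

```python
def root_matches_clade(root_aa_sequences, clade_definition):
    """Return True if the root carries every clade-defining residue.

    root_aa_sequences: dict mapping gene name -> amino acid sequence string,
        i.e., the root node's "aa_sequences" entry.
    clade_definition: list of (gene, position, residue) tuples with
        1-based positions (a hypothetical format for this sketch).
    """
    for gene, position, residue in clade_definition:
        sequence = root_aa_sequences.get(gene)
        # Without a root sequence for this gene, the residue cannot be confirmed.
        if sequence is None or position < 1 or position > len(sequence):
            return False
        if sequence[position - 1] != residue:
            return False
    return True


# Hypothetical root entries: one with sequences, one without (the bug case).
root_with_sequences = {"NA": "MKT"}
root_without_sequences = {}
toy_clade = [("NA", 2, "K")]
```

With `root_without_sequences`, the check cannot succeed even though the root truly carries the residue, which mirrors the missing subclade annotations in the first tree.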

Related issue(s)

Related to #1258

Checklist

  • Adds functional test for expected output
  • Checks pass
  • If making user-facing changes, add a message in CHANGES.md summarizing the changes in this PR

codecov bot commented Sep 19, 2023

Codecov Report

Patch coverage is 100.00% of modified lines.

❗ Current head 34f2f47 differs from pull request most recent head 1212730. Consider uploading reports for the commit 1212730 to get more accurate results

Files Changed Coverage
augur/ancestral.py 100.00%


Fixes a bug in the new augur ancestral interface for amino acid
sequences where the output JSON was missing an entry for "aa_sequences"
for the root node that downstream tools like `augur clades` relied on.
The `augur translate` output creates this entry [1], so workflows that
use the nucleotide `ancestral` followed by the `translate` command would
not be affected. Workflows that use the new `ancestral` command would be
affected in cases where `augur clades` looks for amino acid mutations
that only exist in the root node and not on any subsequent
branches (e.g., for seasonal influenza B/Victoria's NA tree).

[1] https://github.com/nextstrain/augur/blob/b299059b38b1f579a70129c295405a19eb3f7c06/augur/translate.py#L291-L295
@huddlej huddlej force-pushed the export-root-aa-sequences-from-ancestral branch from 34f2f47 to 1212730 Compare September 20, 2023 16:15
@huddlej huddlej merged commit 82ed696 into master Sep 20, 2023
24 checks passed
@huddlej huddlej deleted the export-root-aa-sequences-from-ancestral branch September 20, 2023 16:44