Implement subsampling via a script #711

Open
wants to merge 3 commits into
base: master

Commits on Aug 23, 2021

  1. ed9d3c2
  2. Refactor distances, priorities into functions

    This is in preparation for a subsample script to call these
    Python functions directly, as opposed to scripts called from the
    command line. In time, this script will become `augur subsample`.
    
    The work done by the functions is unchanged, so they still don't
    return any information; instead, they write it to disk. Future
    work will change this so that information can flow back to the calling
    program without needing to be written to disk.
    jameshadfield committed Aug 23, 2021
    946d3d2
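
To illustrate the shape of the refactor described in the commit message above, here is a minimal sketch of a function that can be called directly from Python yet still reports its result only by writing to disk. The name `write_priorities`, its arguments, and the tab-separated output are hypothetical and are not taken from the ncov scripts.

```python
import tempfile
from pathlib import Path


def write_priorities(scores, output_path):
    """Hypothetical stand-in for a refactored script body.

    Like the functions described in the commit message, it returns
    nothing and communicates only by writing a file to disk.
    """
    with open(output_path, "w") as handle:
        for strain, score in sorted(scores.items()):
            handle.write(f"{strain}\t{score}\n")


# A caller imports and calls the function directly (no subprocess, no
# command-line parsing), then reads the file back to get at the results.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "priorities.tsv"
    result = write_priorities({"strainB": -0.5, "strainA": 1.5}, out)
    lines = out.read_text().splitlines()
```

Here `result` is `None`, which is the limitation the commit message says future work will address: the caller has to re-read the file to see what was computed.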

Commits on Aug 25, 2021

  1. Implement subsampling via a script

    This implements a new script that encapsulates the subsampling logic formerly encoded in snakemake rules, in preparation for moving the script to the augur repo, where it will become `augur subsample`. (We have chosen to develop this in the ncov repo for simplicity.)
    
    The script currently uses the same approach as the former snakemake rules; however, Python functions are called rather than scripts / augur commands. Briefly, the steps are:
    
    1. A subsampling scheme is provided, parsed, validated, and turned into a simple graph indicating which samples rely on other samples having been computed (i.e. which are needed for priorities).
    2. Each sample is computed by calling the run function of augur filter.
    3. If priorities need to be calculated for a sample to be computed, this is achieved by calling functions from the two existing scripts.
    4. The sets of sequences to include in each sample are combined, and outputs are written.
    jameshadfield committed Aug 25, 2021
    f7b1169
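
The four steps listed in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script from this PR: the scheme layout, the `priorities_from` key, and the `select` callback are all hypothetical, and `select` merely stands in for the real filtering and priority calculations. Dependency ordering uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_order(scheme):
    """Validate the scheme and return sample names in dependency order."""
    graph = {}
    for name, spec in scheme.items():
        dep = spec.get("priorities_from")  # hypothetical key
        if dep is not None and dep not in scheme:
            raise ValueError(f"sample {name!r} depends on unknown sample {dep!r}")
        graph[name] = {dep} if dep is not None else set()
    # static_order() yields dependencies before dependents and raises
    # graphlib.CycleError if the scheme contains a cycle.
    return list(TopologicalSorter(graph).static_order())


def run_subsampling(scheme, strains, select):
    """Compute every sample in order, then combine the selected strains.

    select(spec, strains, focal) is a stand-in for the real filtering step;
    focal is the already-computed sample this one depends on, or None.
    """
    computed = {}
    for name in build_order(scheme):
        spec = scheme[name]
        focal = computed.get(spec.get("priorities_from"))
        computed[name] = set(select(spec, strains, focal))
    return set().union(*computed.values()), computed


# Toy scheme: "context" needs "focal" to have been computed first.
scheme = {
    "context": {"prefix": "B", "priorities_from": "focal"},
    "focal": {"prefix": "A"},
}
strains = ["A1", "A2", "B1", "B2", "C1"]


def toy_select(spec, strains, focal):
    picked = [s for s in strains if s.startswith(spec["prefix"])]
    # Toy proxy for prioritisation: keep one strain when a focal set exists.
    return picked[:1] if focal else picked


order = build_order(scheme)
included, per_sample = run_subsampling(scheme, strains, toy_select)
```

With this toy input, `focal` is computed before `context` even though it is listed second, and the combined set contains the two focal strains plus one contextual strain.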