Run sort command without pre-calculated minimizer index #1554

Open
ivan-aksamentov opened this issue Nov 20, 2024 · 0 comments
Labels
help wanted (Extra attention is needed), package: nextclade_cli, package: nextclade, t:feat (Type: request of a new feature, functionality, enhancement)

ivan-aksamentov commented Nov 20, 2024

Context: https://discussion.nextstrain.org/t/nextclade-sort-using-a-serverless-local-dataset-for-sorting/1777

The Nextclade CLI sort command currently requires a minimizer_index.json, either fetched from the dataset server or provided with the -m parameter.

The minimizer_index.json maps each dataset/reference name to its list of minimizers (hashes of fragments of the reference sequence).

In principle, Nextclade CLI could calculate this index itself if given a set of reference sequences. The code already exists: it has to calculate minimizers for query sequences anyway, so it might as well calculate them for reference sequences too.
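To make the idea concrete, here is a minimal Python sketch of computing such a mapping from reference sequences. It is an illustration only, not Nextclade's actual algorithm or the real minimizer_index.json schema: the hash function, the k-mer length K, the CUTOFF value, and the function names are all assumptions made for this example.

```python
from __future__ import annotations

import hashlib

K = 17            # assumed k-mer length, chosen arbitrarily for this sketch
CUTOFF = 1 << 28  # assumed threshold: keep only hashes below this value


def kmer_hash(kmer: str) -> int:
    """Stable 32-bit hash of a k-mer (hypothetical choice of hash function)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=4).digest()
    return int.from_bytes(digest, "big")


def minimizers(seq: str, k: int = K, cutoff: int = CUTOFF) -> set[int]:
    """Hash every unambiguous k-mer of `seq` and keep hashes below `cutoff`."""
    seq = seq.upper()
    kept = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers with ambiguous characters
            h = kmer_hash(kmer)
            if h < cutoff:
                kept.add(h)
    return kept


def build_index(refs: dict[str, str], k: int = K, cutoff: int = CUTOFF) -> dict[str, list[int]]:
    """Map each reference name to a sorted list of its minimizers."""
    return {name: sorted(minimizers(seq, k, cutoff)) for name, seq in refs.items()}
```

The same `minimizers` function would serve both reference and query sequences, which is the point made above: the index is just that function applied to each reference ahead of time.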

We could add a parameter to the sort command (e.g. --input-ref, -r) that accepts one or more FASTA files with reference sequences. In that case, Nextclade CLI would require neither -m nor fetching the index from a server: it would calculate the minimizer index from the provided sequences and then proceed immediately to sorting. This could also be implemented as a separate command that only generates the minimizer index file. The resulting file could then be reused (with sort -m), avoiding a repeated index calculation on every run.
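The two variants can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the Nextclade implementation: the toy hashing scheme, the scoring rule (most shared minimizers wins), the JSON layout, and every function name here are hypothetical.

```python
from __future__ import annotations

import hashlib
import io
import json


def read_fasta(handle) -> dict[str, str]:
    """Parse FASTA text into {record name: sequence}."""
    records, name = {}, None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None and line:
            records[name] += line.upper()
    return records


def minimizers(seq: str, k: int = 17, cutoff: int = 1 << 28) -> set[int]:
    """Toy scheme (assumed): hash each k-mer to 32 bits, keep hashes below `cutoff`."""
    kept = set()
    for i in range(len(seq) - k + 1):
        digest = hashlib.blake2b(seq[i:i + k].encode("ascii"), digest_size=4).digest()
        h = int.from_bytes(digest, "big")
        if h < cutoff:
            kept.add(h)
    return kept


def build_index(refs: dict[str, str], **kw) -> dict[str, list[int]]:
    """Variant 1: calculate the index directly from reference sequences."""
    return {name: sorted(minimizers(seq, **kw)) for name, seq in refs.items()}


def sort_queries(queries: dict[str, str], index: dict[str, list[int]], **kw) -> dict[str, str | None]:
    """Assign each query to the reference sharing the most minimizers (None if no hits)."""
    ref_sets = {name: set(values) for name, values in index.items()}
    result = {}
    for qname, qseq in queries.items():
        qmin = minimizers(qseq, **kw)
        scores = {ref: len(qmin & s) for ref, s in ref_sets.items()}
        best = max(scores, key=scores.get, default=None)
        result[qname] = best if best is not None and scores[best] > 0 else None
    return result


# Small k and a permissive cutoff so the tiny demo sequences yield minimizers.
params = {"k": 5, "cutoff": 1 << 32}
refs = read_fasta(io.StringIO(">refA\nAAAAACCCCCAAAAACCCCC\n>refB\nGGGGGTTTTTGGGGGTTTTT\n"))
index = build_index(refs, **params)

# Variant 2: serialize the index once, then reload it on later runs.
reloaded = json.loads(json.dumps(index))
assignments = sort_queries({"q1": "GGGGGTTTTTGGGGG"}, reloaded, **params)
```

Calculating the index on every run keeps the command line simple, while writing it to a file trades one extra step for faster repeated runs over the same references.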

This should improve the user experience for people who want to use the sort command with their own reference sequences. Currently, they have to build a custom minimizer_index.json themselves, which is a non-trivial task.

ivan-aksamentov added the labels t:feat, good first issue, help wanted, needs triage, package: nextclade, package: nextclade_cli and removed the labels good first issue, help wanted, needs triage on Nov 20, 2024