Update RefSeq? #161
Is refseq.genomes.k21.s1000.msh the latest version?
No, it is quite old. I would advise creating a new sketch. NCBI RefSeq now has 330,648 genome reference assemblies, while the sketch has only 91,282. I also sometimes hit deprecated accession numbers that have been removed from the current assembly_file_manifest.txt metadata.
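A minimal sketch of the deprecated-accession check mentioned above, assuming you already have two lists of versioned RefSeq accessions (GCF_… style strings): one for the genomes in the old sketch and one parsed from the current metadata. How those lists are extracted, including the column layout of assembly_file_manifest.txt, is not assumed here, and the function name classify_accessions is hypothetical. An accession whose unversioned base still exists under a different version is reported as superseded rather than removed.

```python
from typing import Dict, Iterable, List


def classify_accessions(old: Iterable[str], current: Iterable[str]) -> Dict[str, List[str]]:
    """Split old accessions into kept / superseded / removed relative to a newer release.

    Hypothetical helper: expects versioned accessions such as "GCF_000000001.1".
    """
    current_set = set(current)
    # Unversioned bases present in the current release, e.g. "GCF_000000001".
    current_bases = {acc.rsplit(".", 1)[0] for acc in current_set}

    result: Dict[str, List[str]] = {"kept": [], "superseded": [], "removed": []}
    for acc in sorted(set(old)):
        if acc in current_set:
            result["kept"].append(acc)  # identical accession and version
        elif acc.rsplit(".", 1)[0] in current_bases:
            result["superseded"].append(acc)  # same assembly, different version
        else:
            result["removed"].append(acc)  # no trace in the current release
    return result


if __name__ == "__main__":
    # Illustrative, made-up accessions.
    old = ["GCF_000000001.1", "GCF_000000002.1", "GCF_000000003.1"]
    current = ["GCF_000000001.1", "GCF_000000002.2"]
    print(classify_accessions(old, current))
```

Keeping superseded and removed accessions apart makes it easier to decide which genomes can simply be replaced with a newer version when rebuilding the sketch.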
Hi, I suppose that development of Mash has more or less ceased (which is fine, since it still works). But would you consider updating your hosted version of the RefSeq sketch? It is of course possible to build a new sketch locally from whatever the current release is, but RefSeq is essentially "fluid", so it is rather difficult to do this reproducibly in a way that lets people compare their results, for example when using published pipelines. A centrally hosted sketch of a more recent RefSeq release would certainly be useful.
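Until a newer hosted sketch exists, one way to make a locally built sketch comparable across groups is to publish the exact list of accessions it was built from together with a short fingerprint of that list. This is only a suggestion, not something Mash itself does; the function name release_fingerprint is hypothetical. The list is deduplicated and sorted before hashing, so two people who sketched the same set of assemblies get the same SHA-256 digest regardless of download order.

```python
import hashlib
from typing import Iterable


def release_fingerprint(accessions: Iterable[str]) -> str:
    """Return a SHA-256 hex digest identifying a set of accessions (hypothetical helper)."""
    # Deduplicate and sort so the digest does not depend on input order.
    canonical = "\n".join(sorted(set(accessions))) + "\n"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Illustrative, made-up accessions.
    print(release_fingerprint(["GCF_000000002.1", "GCF_000000001.1"]))
```

Publishing the sorted accession list itself alongside the digest lets others rebuild the same input set even after RefSeq has moved on, as long as those assemblies remain downloadable.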
Hi,
I'm using Mash to detect contamination in de novo genome assemblies, together with other tools that use the latest release of the RefSeq database. Is it possible to build a sketch file for the genomes in the latest release on a PC with 16 GB of RAM?
If so, could you share the workflow needed to do it?
If not, would someone be willing to do the work and share the file?
Any help would be greatly appreciated.
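A possible workflow, sketched under several assumptions: the genome FASTA files have already been downloaded from NCBI (the download step is not shown), the parameters match the hosted file's name (k = 21, sketch size 1000), and the input is split into batches that are sketched separately with `mash sketch` and then combined with `mash paste`. Batching mainly keeps each run short and restartable; whether it is actually needed to stay within 16 GB of RAM is not established here. The helper plan_batches, the file names, and the thread count are hypothetical. The function only plans the work: it returns the contents of each list file and the command lines, without running anything.

```python
from typing import Dict, List, Sequence, Tuple


def plan_batches(
    genome_paths: Sequence[str],
    batch_size: int,
    prefix: str = "refseq.genomes.k21.s1000",
    kmer: int = 21,
    sketch_size: int = 1000,
    threads: int = 4,
) -> Tuple[Dict[str, List[str]], List[List[str]]]:
    """Plan batched `mash sketch` runs followed by one `mash paste` (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")

    list_files: Dict[str, List[str]] = {}
    commands: List[List[str]] = []
    batch_sketches: List[str] = []

    for start in range(0, len(genome_paths), batch_size):
        idx = start // batch_size
        list_name = f"{prefix}.batch{idx}.txt"  # one genome path per line (for -l)
        out_prefix = f"{prefix}.batch{idx}"  # mash appends .msh to this prefix
        list_files[list_name] = list(genome_paths[start:start + batch_size])
        commands.append([
            "mash", "sketch",
            "-k", str(kmer), "-s", str(sketch_size), "-p", str(threads),
            "-l", "-o", out_prefix, list_name,
        ])
        batch_sketches.append(out_prefix + ".msh")

    # Combine all batch sketches into <prefix>.msh.
    commands.append(["mash", "paste", prefix, *batch_sketches])
    return list_files, commands


if __name__ == "__main__":
    files, cmds = plan_batches(["a.fna", "b.fna", "c.fna"], batch_size=2, prefix="demo")
    for cmd in cmds:
        print(" ".join(cmd))
```

To execute the plan, write each entry of `list_files` to disk (one path per line) and run every command in order, e.g. with `subprocess.run(cmd, check=True)`. All batches use the same `-k` and `-s` values, since sketches built with different parameters cannot be meaningfully combined.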