

Update RefSeq? #161

Open
asaldivar93 opened this issue Jul 14, 2021 · 3 comments

Comments

@asaldivar93

asaldivar93 commented Jul 14, 2021

Hi,
I'm using Mash to detect contamination in de novo genome assemblies, together with other tools that work with the latest release of the RefSeq database. Is it possible to build a sketch file for the genomes in the latest release on a PC with 16 GB of RAM?

If it is, could you share the workflow necessary to do it?
If it is not, is someone willing to do the work and share the file?

Any help would be greatly appreciated.

@Caiyulu-818

Yes. Is refseq.genomes.k21.s1000.msh the latest version?

@kbessonov1984

kbessonov1984 commented Sep 19, 2023

No, it is quite old, so I would advise creating a new sketch. NCBI RefSeq now has 330,648 reference genome assemblies, while the sketch contains 91,282. I also sometimes hit deprecated accession numbers that have been removed from the new assembly_file_manifest.txt metadata.
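Below is a minimal sketch of one way to follow this advice. It takes NCBI's assembly_summary_refseq.txt as input and assumes that version_status is column 11 and ftp_path is column 20. Check the header of the copy you download, because these positions are an assumption here. The sketch does three things:

- It keeps only rows marked "latest", which drops replaced or suppressed accessions.
- It derives each genome's `*_genomic.fna.gz` URL.
- It splits the downloaded files into batches that are sketched separately and then merged with `mash paste`.

The function names, the `batch_NNNN` prefixes and the output name `refseq_latest` are hypothetical. The values k=21 and s=1000 only mirror the parameters in the name of the existing refseq.genomes.k21.s1000.msh. Batching here is a convenience for resuming and parallelizing work. It is not a measured requirement for a 16 GB machine.

```python
import posixpath
from typing import Iterable, List, Tuple

# ASSUMED 0-based column positions in NCBI's assembly_summary_refseq.txt
# (column 11 = version_status, column 20 = ftp_path); verify against your copy's header.
VERSION_STATUS_COL = 10
FTP_PATH_COL = 19


def latest_genome_urls(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (accession, genomic FASTA URL) for rows whose version_status is 'latest'."""
    urls = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # comment/header lines start with '#'
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) <= FTP_PATH_COL:
            continue  # too few columns: skip malformed rows
        status = fields[VERSION_STATUS_COL]
        ftp_path = fields[FTP_PATH_COL].rstrip("/")
        if status != "latest" or ftp_path == "na":
            continue  # drop replaced/suppressed assemblies and rows without files
        base = posixpath.basename(ftp_path)
        urls.append((fields[0], f"{ftp_path}/{base}_genomic.fna.gz"))
    return urls


def plan_batches(files: List[str], batch_size: int) -> List[Tuple[str, List[str]]]:
    """Split local FASTA paths into named batches (hypothetical naming scheme)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [
        (f"batch_{i // batch_size:04d}", files[i:i + batch_size])
        for i in range(0, len(files), batch_size)
    ]


def mash_commands(batches: List[Tuple[str, List[str]]], out: str = "refseq_latest",
                  k: int = 21, s: int = 1000) -> List[List[str]]:
    """One `mash sketch -l` per batch list file, then `mash paste` to merge the batches."""
    cmds = [
        # -l: the positional input is a text file listing sequence files, one per line
        ["mash", "sketch", "-l", "-k", str(k), "-s", str(s), "-o", prefix, f"{prefix}.txt"]
        for prefix, _ in batches
    ]
    cmds.append(["mash", "paste", out] + [f"{prefix}.msh" for prefix, _ in batches])
    return cmds
```

After downloading the URLs, write each batch's local file paths to `<prefix>.txt`, one per line, and then run each command, for example with `subprocess.run(cmd, check=True)`. Mash sketches each genome independently, so the batches can also be spread across machines before the final `mash paste`.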

@marchoeppner

Hi,

I suppose that development of Mash has more or less ceased (which is fine, since it still works). But would you consider updating your hosted version of the RefSeq db/sketch?

It is of course possible to build a new sketch locally from whatever the current RefSeq release is. Unfortunately, RefSeq is essentially "fluid", which makes it difficult to do this reproducibly so that people can compare their results, for example when using published pipelines.
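Until a newer hosted sketch exists, one partial workaround is to publish the exact set of genomes a locally built sketch came from alongside the sketch itself. The hypothetical helper below writes a tab-separated manifest with the accession, file name and SHA-256 checksum of each input FASTA. Rows are sorted by accession so the manifest is deterministic. With the manifest, others can at least check whether their inputs match. This is an illustration only and not a feature of Mash.

```python
import csv
import hashlib
from pathlib import Path
from typing import Iterable, Tuple


def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large genomes are not read into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def write_manifest(entries: Iterable[Tuple[str, Path]], out: Path) -> int:
    """Write accession, file name and checksum for every sketched genome; return row count."""
    n = 0
    with out.open("w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["accession", "file", "sha256"])
        for accession, path in sorted(entries, key=lambda e: e[0]):
            writer.writerow([accession, path.name, sha256_of(path)])
            n += 1
    return n
```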

Having a centrally hosted version of a more recent release of Refseq would certainly be useful.
