Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to get kmer count in query fastq files with kmer database #50

Open
leon945945 opened this issue Aug 2, 2024 · 2 comments
Open

How to get kmer count in query fastq files with kmer database #50

leon945945 opened this issue Aug 2, 2024 · 2 comments

Comments

@leon945945
Copy link

Hi, I generated a kmer database with meryl count command, and I want to count the kmers of the database in different query fastq files, how to do it with meryl.
I saw that meryl-lookup did the similar job, but the output file of meryl-lookup is not what I want, I want to compare the number of each kmer in different queries, my expected output is like this:

input1 input2
kmer1 count count
kmer2 count count
.......

Hope for your reply.

@arangrhie
Copy link
Contributor

Hi,

I would do the kmer counting first, and do an intersection with the kmer database of interest (let's say original.meryl).
For example for input1:

meryl count k=$k input1.fastq.gz output input1.meryl
meryl intersect input1.meryl original.meryl
meryl print input1.meryl

The output will give you the counts in input1.meryl, for the kmers that exists in both input1.meryl and original.meryl.
Then repeat this for input2, and merge the results.

meryl-lookup adds counts of the input meryl kmer database. It would not do the counting of the input sequence (your fastq reads in your case).
If you'd like to use meryl-lookup, and by-pass the intersect / print / manual merging, you could do it the other way around by using input1.meryl and input2.meryl on your original sequence. You could potentially make fake sequences, making each kmer its own sequence from your original.meryl with meryl print result.

@leon945945
Copy link
Author

Thanks for your quick reply.
I get the idea of comparing kmer count in multiple fastq files. The kmers whose count number equals to 0 in input1.meryl could be extracted with meryl difference original.meryl input1.meryl command.
Due to the large amounts of kmer sequences, maybe the manual merging is a toughh work to do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants