extend splitHaplotype with fastq output option #2147

Open: wants to merge 1 commit into master

ASLeonard

These are fairly straightforward changes that let splitHaplotype take a -fastq flag and print out the triobinned reads as fastq. Canu may not use the quality values in assembly, but since splitHaplotype is the primary triobinning program, this lets users bin fastq reads directly instead of going through a slower "triobin fasta -> get read IDs -> extract fastq" process.
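For readers unfamiliar with the two formats, here is a minimal sketch of what switching the output between fasta and fastq amounts to. It is not the code from this PR: the `Read` struct, `formatRead`, and the `fastqOutput` parameter are hypothetical names chosen for illustration, and `std::string` stands in for whatever Canu uses internally.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical read record for illustration; not Canu's actual data structure.
struct Read {
  std::string name;
  std::string bases;
  std::string quals;   // one quality character per base
};

// Format one read as a 2-line fasta record or a 4-line fastq record.
std::string formatRead(const Read &r, bool fastqOutput) {
  std::ostringstream out;

  if (fastqOutput) {
    // fastq needs exactly one quality character per base.
    assert(r.quals.size() == r.bases.size());
    out << '@' << r.name << '\n'
        << r.bases << '\n'
        << "+\n"
        << r.quals << '\n';
  } else {
    out << '>' << r.name << '\n'
        << r.bases << '\n';
  }

  return out.str();
}
```

The only extra state the fastq path needs is the quality string, which is why the memory question below comes up.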

I tested this on my data for both binning fasta (normal) and fastq (with -fastq) and both appear to be working correctly.

I'm not sure about the memory implications of storing the quality values. One option is to uncomment the first line here

//if (g->_fastqOutput)
s->_quals[rr].set((const char*)seq.quals(), seq.length());

so that the quals are only loaded when the output is fastq. But if the memory is already initialised at _quals = new simpleString [_maxReads]; then this may not save much.
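To make the trade-off concrete, here is a rough sketch of the guarded load. It assumes `std::string` as a stand-in for simpleString (whose allocation behaviour may differ), and `ReadBatch`, `storeQuals`, and `payloadBytes` are hypothetical names, not identifiers from the Canu source.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for a batch of reads; std::string replaces the simpleString used in the PR.
struct ReadBatch {
  std::vector<std::string> quals;

  // All slots are created up front, mirroring `new simpleString [_maxReads]`.
  explicit ReadBatch(std::size_t maxReads) : quals(maxReads) {}

  // Copy the quality values only when fastq output was requested.
  void storeQuals(std::size_t rr, const unsigned char *q, std::size_t len, bool fastqOutput) {
    if (fastqOutput)
      quals[rr].assign(reinterpret_cast<const char *>(q), len);
  }

  // Bytes of quality data actually held (excludes the per-slot object overhead).
  std::size_t payloadBytes() const {
    std::size_t total = 0;
    for (const std::string &s : quals)
      total += s.size();
    return total;
  }
};
```

In this sketch the guard avoids copying the per-read quality bytes, while the fixed cost of the slot array is paid either way. Whether that matters in practice depends on how simpleString allocates its storage, which this sketch does not model.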

Also, I reused the simpleString structure, which required casting between unsigned and signed char, but this shouldn't be problematic.

@ASLeonard
Author

I also extended this to allow for seq.flags() (which is so beautifully accessible already), as this nicely allows for extracting fastq from uBAMs with special sam tags, triobinning, and then re-aligning with the special sam tags carried over. However, this is a less common use-case, so I won't include that here without discussion.
