extend splitHaplotype with fastq output option #2147

ASLeonard · 2022-07-20T11:45:18Z

These are fairly straightforward changes that let splitHaplotype take a -fastq flag and write the triobinned reads as fastq. Canu may not use the quality values during assembly, but since splitHaplotype is the primary triobinning program, this lets users bin fastq reads directly instead of going through the slower "triobin fasta -> get read IDs -> extract fastq" process.
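To illustrate the difference between the two output modes, here is a minimal sketch of writing one read as either fasta or fastq. The `Read` struct and `formatRead` function are hypothetical names for illustration only and are not the code in this PR; quality values are assumed to already be printable fastq characters.

```cpp
#include <cassert>
#include <string>

// Hypothetical read record for illustration; not Canu's actual data structure.
struct Read {
  std::string name;
  std::string bases;
  std::string quals;  // assumed: one printable quality character per base
};

// Format a read as fastq when fastqOutput is set, otherwise as fasta.
std::string formatRead(const Read &r, bool fastqOutput) {
  if (!fastqOutput)
    return ">" + r.name + "\n" + r.bases + "\n";

  // A fastq record needs exactly one quality character per base.
  assert(r.quals.size() == r.bases.size());
  return "@" + r.name + "\n" + r.bases + "\n+\n" + r.quals + "\n";
}
```

With this shape the binning logic stays the same and only the final write step depends on the flag.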

I tested this on my data for both binning fasta (normal) and fastq (with -fastq) and both appear to be working correctly.

I am not sure about the memory implications of storing the quality values. One option is to uncomment the guard on this line:

//if (g->_fastqOutput)
  s->_quals[rr].set((const char*)seq.quals(), seq.length());

so that the quals are only loaded when the output is fastq. However, if the memory is already initialised at _quals = new simpleString [_maxReads]; then this may not save much.
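The trade-off can be sketched as follows, using `std::string` as a stand-in for simpleString. `ReadBatch`, `store` and `qualityBytes` are hypothetical names, and how simpleString actually allocates memory is not assumed here. The per-read slots exist either way, so the guard only avoids copying the quality characters themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a batch of reads; not the actual Canu class.
struct ReadBatch {
  std::vector<std::string> bases;
  std::vector<std::string> quals;  // one slot per read, allocated up front

  explicit ReadBatch(std::size_t maxReads) : bases(maxReads), quals(maxReads) {}

  // Store one read; copy the quality values only when fastq output is requested.
  void store(std::size_t rr, const std::string &seq, const std::string &qv,
             bool fastqOutput) {
    assert(rr < bases.size());
    bases[rr] = seq;
    if (fastqOutput)
      quals[rr] = qv;
  }

  // Quality characters actually held (excludes the per-slot object overhead).
  std::size_t qualityBytes() const {
    std::size_t total = 0;
    for (const std::string &q : quals)
      total += q.size();
    return total;
  }
};
```

In this sketch the saving from the guard scales with the total read length, while the fixed cost of the slot array scales only with the number of reads.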

I also reused the simpleString structure, which required casting between unsigned and signed char, but this shouldn't be problematic.
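As a quick check of why the cast should be harmless, here is a small sketch of the round trip between unsigned bytes and a char-based string. `toCharString` and `toBytes` are hypothetical helpers, with `std::string` again standing in for simpleString. The cast only reinterprets each byte, so values are preserved in both directions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Copy unsigned quality bytes into a char-based string.
std::string toCharString(const std::uint8_t *quals, std::size_t length) {
  assert(quals != nullptr || length == 0);
  return std::string(reinterpret_cast<const char *>(quals), length);
}

// Recover the unsigned bytes from the char-based string.
std::vector<std::uint8_t> toBytes(const std::string &s) {
  std::vector<std::uint8_t> out(s.size());
  if (!s.empty())
    std::memcpy(out.data(), s.data(), s.size());
  return out;
}
```

Printable fastq quality characters sit in the ASCII range 33 to 126, so they fit in a signed char regardless of the platform's char signedness.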

ASLeonard · 2022-07-25T08:18:26Z

I also extended this to allow for seq.flags() (which is already so beautifully accessible). This nicely allows extracting fastq from uBAMs with special SAM tags, triobinning, and then re-aligning with the special SAM tags carried over. However, this is a less common use case, so I won't include it here without discussion.

extend splitHaplotype with fastq output option

c6bfbb4
