Spare not needed groupBy when calling toFragments() on AlignmentDataset #2281
Comments
@heuermh Would love your thoughts on that before I implement it.
If all you want to do is a straight conversion 1:1 of An example of this can be found in the unit tests I think a new method Then for calling bowtie, in Cannoli we have There isn't currently a
These are good suggestions, but I want to use the knowledge ADAM already has about the data instead of relying on the user to know it. Or is there some problem with this that I'm not aware of? Something like this (taken from loadAlignments): BAM -> unpaired
Those assumptions can fall apart, though. From experience, BAM/CRAM/SAM files can contain paired reads, unpaired reads, aligned reads, and unaligned reads. It is common to use unaligned BAM (uBAM) in workflows instead of FASTQ because it compresses better. We would of course encourage the use of Parquet because it compresses better, doesn't have problems with split guessing, can take advantage of predicate pushdown and column projection, and can be read and written concurrently in a distributed fashion across a cluster. 😉 That said, please feel free to suggest changes!
Hi!
I'm running a job that pre-processes a bunch of reads before aligning them with Bowtie. Most of them are unpaired, so when I call toFragments(), the reads go through a groupBy() that accomplishes nothing. Is there a way to skip this groupBy?
Looking at the code, I think we can add a variable that indicates when we know for sure that the reads are unpaired. When we are unsure, we'll do the groupBy anyway (maybe let the user tell us by adding a parameter to loadAlignments).
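Roughly what I have in mind, as a plain-Python sketch and not actual ADAM code. `Read`, `Fragment`, `to_fragments`, and the `known_unpaired` flag are all made-up names for illustration; the real change would live in the Scala code on AlignmentDataset and would skip a Spark shuffle, which this toy version only imitates with a dictionary.

```python
from typing import List, NamedTuple, Tuple


class Read(NamedTuple):
    """Hypothetical stand-in for an alignment record."""
    name: str
    sequence: str


class Fragment(NamedTuple):
    """Hypothetical stand-in for a fragment: all reads sharing one name."""
    name: str
    reads: Tuple[Read, ...]


def to_fragments(reads: List[Read], known_unpaired: bool = False) -> List[Fragment]:
    """Convert reads to fragments, skipping the grouping step when allowed.

    known_unpaired is an assumed flag: the caller (or the loader) sets it
    only when every read is guaranteed to have a unique name.
    """
    if known_unpaired:
        # Fast path: one fragment per read, no grouping at all.
        return [Fragment(r.name, (r,)) for r in reads]

    # Safe path: group reads by name, as a groupBy would.
    grouped = {}
    for r in reads:
        grouped.setdefault(r.name, []).append(r)
    return [Fragment(name, tuple(rs)) for name, rs in grouped.items()]


# Unpaired input: both paths produce the same fragments.
unpaired = [Read("r1", "ACGT"), Read("r2", "TTGA")]
same_result = to_fragments(unpaired, known_unpaired=True) == to_fragments(unpaired)

# Mixed input: the fast path would wrongly split the pair "p1".
mixed = [Read("p1", "AAAA"), Read("p1", "CCCC"), Read("r3", "GGGG")]
safe_count = len(to_fragments(mixed))
unsafe_count = len(to_fragments(mixed, known_unpaired=True))
```

The mixed case is why the flag should default to the safe path: if it is set for input that does contain pairs, mates end up in separate fragments.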
I'd love to implement it.
WDYT?
Ben