Question: Is there an example of buffered read/seek in chunks? #73
Comments
I don't quite understand how your use of
Also, keep in mind that XFLATE operates differently than BGZF. BGZF is effectively a linked list of independently compressed segments, so you need to read through the whole file to determine the boundaries of each segment. In contrast, XFLATE contains an index that reports the location of each segment in O(1). Thus, you can seek to the middle of an XFLATE file without ever needing to read the content before that point.
ok. here's the use case. i have a very large gzipped file with one json record per line. in first pass i use in second pass i use that works fine. very little RAM and fast enough moving through the compressed file finding the particular json via its

meanwhile i played with rac and took a somewhat similar approach. there i didn't have to do two passes against a compressed archive as

so both of these experiments gave me a way to query a very large compressed archive of many lines of json records for the chunk where a particular json will be found. any particular query uses very little RAM and is fairly fast.

i am sure XFLATE could be used for this use case. i just couldn't figure out how to use the reference to the compressed archive (e.g.

i hope this explains it better. just to mention: i managed to use
I was playing with bgzf archives and it was fairly easy to use `bgzf.Reader` in `bufio.Reader` so that the archive could be read in chunks. In one pass I would build a useful index of offsets so that later on I could use the very large archive as if it were a memory-mapped file on disk.

I tried to find an example of using xflate in a similar way. All of the examples I could find read the whole compressed archive into memory.
So, my question is, is there an example of buffered read/seek in chunks of the compressed "xflated" archive?
I found that a custom implementation of what I tried to describe here via `io.ReadSeeker` is not trivial. So if there's already an example I would appreciate it immensely :)