Add compressed size prefix to page headers for lz4 v4.1.23 compatibility #72
The lz4 library v4.1.23 added frame concatenation support, which peeks ahead after reading a frame to check for another concatenated frame. This broke LTX because each page is an independent LZ4 frame with a PageHeader in between.
This change adds a new PageHeaderFlagCompressedSize flag and writes a 4-byte compressed size prefix after each page header. The decoder uses this size to create an exact LimitedReader, preventing lz4 from peeking into the next page.
For backward compatibility, the decoder handles both formats:
- New format (flag set): reads compressed size, uses exact LimitedReader
- Old format (flag=0): uses LimitedReader workaround with lz4 frame footer size
Fixes #70
Co-Authored-By: Claude Opus 4.5 <[email protected]>
This is fine, I guess, but I wonder at this point what is the frame format buying you except overhead?
Response to @ncruces's question
This is a valid point worth discussing. Here's the analysis:
What Frame Format Provides (at ~15-27 bytes overhead per page):
What We Actually Need Now:
Since we're storing the compressed size prefix, we don't need:
We're really just keeping frame format for the per-page checksum.
The Alternative: LZ4 Block Format
Block format would:
The Tradeoff
For a 4KB page, frame overhead is ~0.5%. For 1KB pages, ~2%.
Question for @benbjohnson
Is the per-page LZ4 checksum valuable enough to keep? LTX already has:
If file-level checksums are sufficient, we could switch to block format and eliminate the frame overhead entirely. PR #72's compressed size prefix would make that migration straightforward.
Options:
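The overhead percentages quoted under The Tradeoff can be checked with a few lines of arithmetic. This sketch takes the 15 and 27 byte bounds from the comment above and measures them against the uncompressed page size; the function name `overheadPercent` is made up for the example. Measured against the compressed size, the share would be larger.

```go
package main

import "fmt"

// overheadPercent returns the frame overhead as a percentage of the page size.
func overheadPercent(overheadBytes, pageSize int) float64 {
	return 100 * float64(overheadBytes) / float64(pageSize)
}

func main() {
	for _, pageSize := range []int{1024, 4096} {
		lo := overheadPercent(15, pageSize) // lower bound quoted in the comment
		hi := overheadPercent(27, pageSize) // upper bound quoted in the comment
		fmt.Printf("%d-byte page: %.2f%% to %.2f%%\n", pageSize, lo, hi)
	}
}
```

The ranges come out to roughly 0.37% to 0.66% for 4KB pages and 1.46% to 2.64% for 1KB pages, which brackets the ~0.5% and ~2% figures.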
A page checksum might be useful because the VFS reads single pages. OTOH, uncompressed pages won't have a checksum either (unless you put them inside an uncompressed lz4 frame), so the concern is kinda orthogonal.
I don't think a checksum buys us much. Are we able to drop the LZ4 frame? I assumed it was needed by the LZ4 library when it reads.
Most of these compression algorithms are layered. There's a block compression layer, then a frame layer.
The block compression works for smallish data of known size. To de/compress one block, you must know how many bytes go in, and more or less provide enough buffer for how many bytes come out. You can think of it as working on arrays/slices/buffers.
The frame layer works on top, can support lots more data, of unknown - a priori - length, and adds headers and trailers with checksums mostly to support streaming. This works by buffering and chunking data, and passing it to the block compression layer.
lz4 uses buffers starting at 64K, so compressing single pages independently can be easily achieved with just the block layer.
But this is a file format change, so you should make an informed decision, and not go with “random guy on the internet.”
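To make the "arrays/slices/buffers" point concrete, here is a toy decoder for the LZ4 block format written for this thread. It is a sketch, not the lz4 library's API, and the names `decodeBlock` and `readLen` are made up. Note what the caller has to supply: the exact compressed bytes (`src`) and a destination buffer large enough for the output (`dst`), with no headers, trailers, or checksums involved.

```go
package main

import (
	"errors"
	"fmt"
)

var errCorrupt = errors.New("lz4 block: corrupt input or destination too small")

// readLen extends a 4-bit length field. A nibble of 15 means extra length
// bytes follow; each is added until one is below 255.
func readLen(nibble int, src []byte, si int) (length, next int, err error) {
	length = nibble
	if nibble < 15 {
		return length, si, nil
	}
	for {
		if si >= len(src) {
			return 0, 0, errCorrupt
		}
		b := src[si]
		si++
		length += int(b)
		if b != 255 {
			return length, si, nil
		}
	}
}

// decodeBlock decompresses one LZ4 block from src into dst and returns the
// number of bytes written. It is lenient about the format's end-of-block rules.
func decodeBlock(dst, src []byte) (int, error) {
	si, di := 0, 0
	for si < len(src) {
		token := src[si]
		si++

		// Literals: copied verbatim from the input.
		litLen, next, err := readLen(int(token>>4), src, si)
		if err != nil {
			return 0, err
		}
		si = next
		if litLen > len(src)-si || litLen > len(dst)-di {
			return 0, errCorrupt
		}
		copy(dst[di:], src[si:si+litLen])
		si += litLen
		di += litLen
		if si == len(src) {
			return di, nil // the final sequence carries literals only
		}

		// Match: a 2-byte little-endian offset back into the output so far.
		if len(src)-si < 2 {
			return 0, errCorrupt
		}
		offset := int(src[si]) | int(src[si+1])<<8
		si += 2
		if offset == 0 || offset > di {
			return 0, errCorrupt
		}
		matchLen, next, err := readLen(int(token&0x0f), src, si)
		if err != nil {
			return 0, err
		}
		si = next
		matchLen += 4 // the stored value is the match length minus 4
		if matchLen > len(dst)-di {
			return 0, errCorrupt
		}
		// Byte-by-byte copy because source and destination may overlap.
		for i := 0; i < matchLen; i++ {
			dst[di] = dst[di-offset]
			di++
		}
	}
	return di, nil
}

func main() {
	// Hand-built block: 3 literals "abc", a 9-byte match at offset 3,
	// then 5 trailing literals "defgh".
	src := []byte{0x35, 'a', 'b', 'c', 0x03, 0x00, 0x50, 'd', 'e', 'f', 'g', 'h'}
	page := make([]byte, 4096) // destination sized for one page
	n, err := decodeBlock(page, src)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", page[:n])
}
```

Because the page size bounds the output and a stored compressed size bounds the input, both quantities the block layer needs are already available per page.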
@ncruces The explanation makes sense, thanks. I don't dig into the low level parts of compression libraries but using a single block makes sense given that SQLite pages can't be more than 64KB.
Superseded by PR #73 which uses LZ4 block format instead of frame format |
Summary
PageHeaderFlagCompressedSize flag to indicate compressed size prefix follows page header
Problem
The lz4 library v4.1.23 added frame concatenation support per the LZ4 spec. When reading an LZ4 frame, the library now peeks ahead after EOF to check for another concatenated frame. This broke LTX because each page is an independent LZ4 frame with a PageHeader in between - when lz4 peeks, it reads the next PageHeader bytes, sees an invalid LZ4 signature, and errors.
Solution
Add a compressed size prefix to each page, allowing the decoder to create an exact LimitedReader that prevents lz4 from peeking beyond the frame boundary.
New format:
Old format (still supported for reading):
The flag is in PageHeader.Flags (not Header.Flags) because pages are read individually in the VFS without easy access to the file header.
Test plan
Fixes #70
🤖 Generated with Claude Code