-
Notifications
You must be signed in to change notification settings - Fork 12
refactor(lz4): switch from frame format to block format #73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The lz4 library v4.1.23 added frame concatenation support, which peeks ahead after reading a frame to check for another concatenated frame. This broke LTX because each page is an independent LZ4 frame with a PageHeader in between. This change adds a new PageHeaderFlagCompressedSize flag and writes a 4-byte compressed size prefix after each page header. The decoder uses this size to create an exact LimitedReader, preventing lz4 from peeking into the next page. For backward compatibility, the decoder handles both formats: - New format (flag set): reads compressed size, uses exact LimitedReader - Old format (flag=0): uses LimitedReader workaround with lz4 frame footer size Fixes #70 Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace LZ4 frame compression with block compression to eliminate ~15-23 bytes of overhead per page (magic bytes, descriptor, EndMark, content checksum). Changes: - Add PageHeaderFlagUncompressed for incompressible data - Use lz4.Compressor.CompressBlock() in encoder - Use lz4.UncompressBlock() in decoder for new format - Maintain backward compatibility with old frame format - Add validation for uncompressed data size and buffer bounds This addresses feedback from @ncruces and @benbjohnson that the LZ4 frame format adds unnecessary overhead when we already have file-level checksums. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
|
@corylanou Can you add a test to verify that this continues to work for SQLite databases with 64KB sized pages? |
Add tests to verify LZ4 block compression works correctly with 64KB pages (SQLite's maximum page size): - Compressible test: repetitive data that compresses well - Incompressible test: random data stored uncompressed Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
ltx.go
Outdated
| // PageHeaderFlagCompressedSize indicates that a 4-byte compressed size | ||
| // field follows the page header. When set, data uses LZ4 block format | ||
| // (not frame format). | ||
| PageHeaderFlagCompressedSize = uint16(1 << 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be named PageHeaderFlagSize since it exists for both compressed and uncompressed data in the new format.
encoder.go
Outdated
| // Determine what data to write based on compression result. | ||
| var writeData []byte | ||
| if n == 0 || n >= len(data) { | ||
| // Incompressible or compression didn't help - store uncompressed. | ||
| hdr.Flags |= PageHeaderFlagUncompressed | ||
| writeData = data | ||
| } else { | ||
| writeData = enc.compressBuf[:n] | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it's worth having a mix of compressed and uncompressed blocks. I can't imagine a time when a SQLite page won't compress except for encrypted pages (but we will handle that without compression in the future when we implement it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the maximum overhead for a block is 273 bytes (0.4%) for 64K pages, 32 bytes (0.8%) for 4K pages, and at most 3.5% on (rather uncommon) even smaller pages:
https://github.com/pierrec/lz4/blob/v4.1.23/internal/lz4block/block.go#L40-L42
Note that, since lz4.CompressBlockBound was used, it is an error if n == 0 is true.
Address PR review feedback: - Rename PageHeaderFlagCompressedSize to PageHeaderFlagSize since it applies to both compressed and uncompressed data - Remove PageHeaderFlagUncompressed and always compress data, as SQLite pages reliably compress and max block overhead is only 0.8% for 4K pages Co-Authored-By: Claude Opus 4.5 <[email protected]>
|
Addressed review feedback in df2a1e2:
All tests pass. |
Summary
PageHeaderFlagCompressedSizetoPageHeaderFlagSize(applies to all block format data)This addresses feedback from @ncruces and @benbjohnson on PR #72:
Changes
ltx.goPageHeaderFlagSizeconstant (renamed fromPageHeaderFlagCompressedSize)encoder.golz4.Compressor.CompressBlock()instead of frame writerdecoder.golz4.UncompressBlock()for new format, keep frame fallback*_test.goFormat Comparison
Backward Compatibility
Review Feedback Addressed
PageHeaderFlagCompressedSize→PageHeaderFlagSize(per @benbjohnson)Test plan
🤖 Generated with Claude Code