
Conversation


@corylanou (Collaborator) commented Jan 12, 2026

Summary

  • Switch from LZ4 frame format to block format, eliminating ~15-23 bytes overhead per page
  • Rename PageHeaderFlagCompressedSize to PageHeaderFlagSize (applies to all block format data)
  • Always compress data (removed uncompressed path per review feedback)
  • Maintain backward compatibility with old frame format files

This addresses feedback from @ncruces and @benbjohnson on PR #72:

"I don't think a checksum buys us much. Are we able to drop the LZ4 frame?"

Changes

File        Description
ltx.go      Add PageHeaderFlagSize constant (renamed from PageHeaderFlagCompressedSize)
encoder.go  Use lz4.Compressor.CompressBlock() instead of frame writer
decoder.go  Use lz4.UncompressBlock() for new format, keep frame fallback
*_test.go   Update expected page sizes and flag references

Format Comparison

Old (frame):  [PageHeader:6][LZ4 Frame with ~15-27 byte overhead]
PR #72:       [PageHeader:6][Size:4][LZ4 Frame]
New (block):  [PageHeader:6][Size:4][LZ4 Block Data]

Backward Compatibility

Format               Flags    Handling
Old frame (no flag)  0x0000   LimitedReader + frame reader
New block format     0x0001   Read size, UncompressBlock

Review Feedback Addressed

  • ✅ Renamed PageHeaderFlagCompressedSize to PageHeaderFlagSize (per @benbjohnson)
  • ✅ Removed uncompressed storage path - always compress (per @benbjohnson and @ncruces)
    • Max LZ4 block overhead is only 0.8% for 4K pages

Test plan

  • All existing tests pass
  • Decoder handles both old frame format and new block format
  • 64KB page size tests for both compressible and incompressible data

🤖 Generated with Claude Code

corylanou and others added 2 commits January 11, 2026 13:47
The lz4 library v4.1.23 added frame concatenation support, which peeks ahead
after reading a frame to check for another concatenated frame. This broke LTX
because each page is an independent LZ4 frame with a PageHeader in between.

This change adds a new PageHeaderFlagCompressedSize flag and writes a 4-byte
compressed size prefix after each page header. The decoder uses this size to
create an exact LimitedReader, preventing lz4 from peeking into the next page.

For backward compatibility, the decoder handles both formats:
- New format (flag set): reads compressed size, uses exact LimitedReader
- Old format (flag=0): uses LimitedReader workaround with lz4 frame footer size

Fixes #70

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace LZ4 frame compression with block compression to eliminate
~15-23 bytes of overhead per page (magic bytes, descriptor, EndMark,
content checksum).

Changes:
- Add PageHeaderFlagUncompressed for incompressible data
- Use lz4.Compressor.CompressBlock() in encoder
- Use lz4.UncompressBlock() in decoder for new format
- Maintain backward compatibility with old frame format
- Add validation for uncompressed data size and buffer bounds

This addresses feedback from @ncruces and @benbjohnson that the LZ4
frame format adds unnecessary overhead when we already have file-level
checksums.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
@benbjohnson (Collaborator)

@corylanou Can you add a test to verify that this continues to work for SQLite databases with 64KB sized pages?

Add tests to verify LZ4 block compression works correctly with 64KB
pages (SQLite's maximum page size):
- Compressible test: repetitive data that compresses well
- Incompressible test: random data stored uncompressed

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
ltx.go Outdated
Comment on lines 410 to 413
// PageHeaderFlagCompressedSize indicates that a 4-byte compressed size
// field follows the page header. When set, data uses LZ4 block format
// (not frame format).
PageHeaderFlagCompressedSize = uint16(1 << 0)

This should probably be named PageHeaderFlagSize since it exists for both compressed and uncompressed data in the new format.

encoder.go Outdated
Comment on lines 243 to 251
// Determine what data to write based on compression result.
var writeData []byte
if n == 0 || n >= len(data) {
// Incompressible or compression didn't help - store uncompressed.
hdr.Flags |= PageHeaderFlagUncompressed
writeData = data
} else {
writeData = enc.compressBuf[:n]
}

I'm not sure it's worth having a mix of compressed and uncompressed blocks. I can't imagine a time when a SQLite page won't compress except for encrypted pages (but we will handle that without compression in the future when we implement it).

@ncruces commented Jan 15, 2026

Also, the maximum overhead for a block is 273 bytes (0.4%) for 64K pages, 32 bytes (0.8%) for 4K pages, and at most 3.5% on (rather uncommon) even smaller pages:
https://github.com/pierrec/lz4/blob/v4.1.23/internal/lz4block/block.go#L40-L42

Note that, since lz4.CompressBlockBound was used, it is an error if n == 0 is true.

Address PR review feedback:
- Rename PageHeaderFlagCompressedSize to PageHeaderFlagSize since it
  applies to both compressed and uncompressed data
- Remove PageHeaderFlagUncompressed and always compress data, as SQLite
  pages reliably compress and max block overhead is only 0.8% for 4K pages

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@corylanou (Collaborator, Author)

Addressed review feedback in df2a1e2:

  • Renamed PageHeaderFlagCompressedSize to PageHeaderFlagSize
  • Removed PageHeaderFlagUncompressed and the uncompressed code path - now always compresses

All tests pass.

@corylanou corylanou merged commit d017048 into main Jan 15, 2026
2 checks passed
@corylanou corylanou deleted the lz4-block-format branch January 15, 2026 16:05