
Conversation

@corylanou
Collaborator

Summary

  • Adds PageHeaderFlagCompressedSize flag to indicate compressed size prefix follows page header
  • Encoder now writes 4-byte compressed size after each page header
  • Decoder handles both old (flag=0) and new formats for backward compatibility
  • Updates to lz4 v4.1.23

Problem

The lz4 library v4.1.23 added frame concatenation support per the LZ4 spec. After reading an LZ4 frame, the library now peeks ahead to check for another concatenated frame. This broke LTX because each page is an independent LZ4 frame with a PageHeader in between: when lz4 peeks, it reads the next PageHeader's bytes, sees an invalid LZ4 signature, and errors out.

Solution

Add a compressed size prefix to each page, allowing the decoder to create an exact LimitedReader that prevents lz4 from peeking beyond the frame boundary.

New format:

[PageHeader:6][CompressedSize:4][LZ4 Frame]

Old format (still supported for reading):

[PageHeader:6][LZ4 Frame]

The flag is in PageHeader.Flags (not Header.Flags) because pages are read individually in the VFS without easy access to the file header.

Test plan

  • All existing tests pass
  • Tests pass with lz4 v4.1.23
  • Backward compatibility: decoder handles both old and new formats

Fixes #70

🤖 Generated with Claude Code

The lz4 library v4.1.23 added frame concatenation support, which peeks ahead
after reading a frame to check for another concatenated frame. This broke LTX
because each page is an independent LZ4 frame with a PageHeader in between.

This change adds a new PageHeaderFlagCompressedSize flag and writes a 4-byte
compressed size prefix after each page header. The decoder uses this size to
create an exact LimitedReader, preventing lz4 from peeking into the next page.

For backward compatibility, the decoder handles both formats:
- New format (flag set): reads compressed size, uses exact LimitedReader
- Old format (flag=0): uses LimitedReader workaround with lz4 frame footer size

Fixes #70

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@ncruces

ncruces commented Jan 12, 2026

This is fine, I guess, but I wonder at this point what is the frame format buying you except overhead?

@corylanou
Collaborator Author

Response to @ncruces's question

"This is fine, I guess, but I wonder at this point what is the frame format buying you except overhead?"

This is a valid point worth discussing. Here's the analysis:

What Frame Format Provides (at ~15-23 bytes overhead per page):

  1. Per-page content checksum (~4 bytes) - validates each page's integrity
  2. Magic/descriptor (~7-15 bytes) - self-describing format
  3. EndMark (4 bytes) - signals end of frame

What We Actually Need Now:

Since we're storing the compressed size prefix, we don't need:

  • EndMark (we know the exact size)
  • Magic bytes (we know it's LZ4 from context)

We're really just keeping frame format for the per-page checksum.

The Alternative: LZ4 Block Format

Current:  [PageHeader:6][Size:4][LZ4 Frame with ~15-27 byte overhead]
Block:    [PageHeader:6][Size:4][Raw compressed data]

Block format would:

  • Save ~15-23 bytes per page
  • Use simpler API (CompressBlock/UncompressBlock)
  • Have no frame concatenation issues at all
  • Lose per-page checksums (rely on file-level checksum instead)

The Tradeoff

Aspect              PR #72 (Frame + Size)   Block Format
Overhead per page   ~19-31 bytes            4 bytes
Per-page checksum   Yes (LZ4)               No
Code complexity     Medium                  Low

For a 4KB page, frame overhead is ~0.5%. For 1KB pages, ~2%.

Question for @benbjohnson

Is the per-page LZ4 checksum valuable enough to keep? LTX already has:

  • File checksum (CRC64 of entire file)
  • Post-apply checksum (rolling checksum of database state)

If file-level checksums are sufficient, we could switch to block format and eliminate the frame overhead entirely. PR #72's compressed size prefix would make that migration straightforward.

Options:

  1. Keep PR #72 ("Add compressed size prefix to page headers for lz4 v4.1.23 compatibility") as-is - Conservative, preserves per-page checksums, can migrate later
  2. Switch to block format now - Cleaner long-term, more code changes, loses per-page checksums

@ncruces

ncruces commented Jan 12, 2026

A page checksum might be useful because the VFS reads single pages.

OTOH, uncompressed pages won't have a checksum either (unless you put them inside an uncompressed lz4 frame), so the concern is kinda orthogonal.

@benbjohnson
Collaborator

I don't think a checksum buys us much. Are we able to drop the LZ4 frame? I assumed it was needed by the LZ4 library when it reads.

@ncruces

ncruces commented Jan 12, 2026

Most of these compression algorithms are layered. There's a block compression layer, then a frame layer.

The block compression works for smallish data of known size. To de/compress one block, you must know how many bytes go in, and more or less provide enough buffer for how many bytes come out. You can think of it as working on arrays/slices/buffers.

The frame layer works on top: it can support much more data, of a priori unknown length, and adds headers and trailers with checksums, mostly to support streaming. It works by buffering and chunking data and passing it to the block compression layer.

lz4 uses buffers starting at 64K, so compressing single pages independently can be easily achieved with just the block layer.

But this is a file format change, so you should make an informed decision, and not go with “random guy on the internet.”

@benbjohnson
Collaborator

@ncruces The explanation makes sense, thanks. I don't dig into the low level parts of compression libraries but using a single block makes sense given that SQLite pages can't be more than 64KB.

@corylanou
Collaborator Author

Superseded by PR #73 which uses LZ4 block format instead of frame format

@corylanou corylanou closed this Jan 15, 2026
Linked issue: pierrec/lz4 v4.1.23 breaks ltx