@veloman-yunkan
Collaborator

@veloman-yunkan veloman-yunkan commented Jan 9, 2026

This is a partial fix for #1025.

Depends on openzim/zim-testing-suite#15

The fix detects only those unreasonable values of a corrupted first-blob offset in a cluster that can be checked without scanning the entire ZIM file.

Assuming that corruption affects only the offset of the first blob in a cluster:

  • If its value decreases but stays properly aligned (i.e. a multiple of 4 for normal clusters and a multiple of 8 for extended clusters) and above the minimal meaningful value, the corruption won't be detected. Some garbage data will be prepended to the first blob, and one or more blobs toward the end of the cluster will be returned as empty.

  • If its value increases but stays properly aligned and below the upper limit imposed by the count of articles in the ZIM file and the size of the first blob, the corruption may be detected by a different check, which requires that blob offsets increase monotonically. However, if the misinterpreted bytes that end up being read as offsets slip through that check, part of the first blob will be lost.
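
The two cases above can be illustrated with a minimal sketch. It is not the libzim implementation: `offsetTableLooksValid` is a hypothetical helper, the cluster payload is modelled as a vector of already decoded 32-bit values (a normal cluster), and the minimal meaningful value is assumed to be the size of one offset.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the libzim API.
// `words` is the cluster payload viewed as 32-bit values. words[0] is the
// offset of the first blob, which also tells the reader how many offsets
// the table is supposed to contain.
bool offsetTableLooksValid(const std::vector<std::uint32_t>& words) {
    constexpr std::uint32_t kOffsetSize = 4;  // normal cluster; 8 if extended
    if (words.empty()) return false;

    const std::uint32_t first = words[0];
    // Alignment and minimal-value checks (minimum assumed to be one offset).
    if (first < kOffsetSize || first % kOffsetSize != 0) return false;

    const std::size_t count = first / kOffsetSize;
    if (count > words.size()) return false;  // table would run past the payload

    // Offsets must never decrease. If `first` was increased by corruption,
    // some of these values are really blob data read as offsets.
    for (std::size_t i = 1; i < count; ++i) {
        if (words[i] < words[i - 1]) return false;
    }
    return true;
}
```

With a genuine table `{12, 17, 20}` followed by blob data, changing the first value to 8 still passes this sketch (the decreased case), and changing it to 16 passes or fails depending on whether the data word that is now read as the fourth offset happens to be at least 20.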

The offset of the first blob in a cluster is considered corrupted if it
implies that the cluster contains more blobs than there are articles in
the ZIM file.
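
As a rough sketch of this criterion (not the code in this PR): the offset table is assumed to hold one offset per blob plus a final end-of-data offset, so the first offset divided by the offset size gives the number of offsets. The function name `firstBlobOffsetIsPlausible`, its parameters, and the use of `<=` as the exact boundary are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the libzim API.
// firstOffset:  offset of the first blob as read from the cluster
// offsetSize:   4 for normal clusters, 8 for extended clusters
// articleCount: number of articles recorded in the ZIM file
bool firstBlobOffsetIsPlausible(std::uint64_t firstOffset,
                                std::uint64_t offsetSize,
                                std::uint64_t articleCount) {
    assert(offsetSize == 4 || offsetSize == 8);

    // The value must be aligned and cover at least one offset.
    if (firstOffset < offsetSize || firstOffset % offsetSize != 0) {
        return false;
    }

    // Assumed layout: one offset per blob plus one end-of-data offset.
    const std::uint64_t blobCount = firstOffset / offsetSize - 1;

    // A cluster cannot plausibly hold more blobs than the file has articles.
    return blobCount <= articleCount;
}
```

For a file with 10 articles and a normal cluster, a first offset of 44 (10 blobs) is accepted by this sketch, while 48 (11 blobs) is rejected. Any corrupted value that stays at or below that bound is not caught by this check alone.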

Note that a stronger check could be performed if a tighter upper limit
on the count of blobs in the given cluster were known. In particular, a
precise check would be possible if the count of blobs in the cluster were
available. However, the only quick source of that information is the very
value that has to be checked.
@kelson42 kelson42 force-pushed the detection_of_corruption_of_1st_blob_offset_in_cluster branch from 28f79ad to d2e8f88 Compare January 10, 2026 11:31
@kelson42 kelson42 merged commit 2ad513c into main Jan 10, 2026
16 of 25 checks passed
@kelson42 kelson42 deleted the detection_of_corruption_of_1st_blob_offset_in_cluster branch January 10, 2026 13:16
@veloman-yunkan
Collaborator Author

This was merged prematurely. I had tested it only on one of the unit tests most relevant to this change and forgot to run the full test suite before submitting the PR. Now the CI is broken.
