Quick detection of corruption of 1st blob offset in cluster #1029
+64
−10
This is a partial fix for #1025.
Depends on openzim/zim-testing-suite#15
The fix detects only those unreasonable values of a corrupted first-blob offset in a cluster that can be checked without scanning the entire ZIM file.
Assuming that corruption affects only the offset of the first blob in a cluster:
- If its value decreases but stays properly aligned (i.e. a multiple of 4 for normal clusters and a multiple of 8 for extended clusters) and above the minimal meaningful value, the corruption won't be detected: some garbage data will be prepended to the first blob, and one or more blobs toward the end of the cluster will be returned as empty.
- If its value increases but stays properly aligned and below the upper limit imposed by the count of articles in the ZIM file and the size of the first blob, the corruption may be detected by a different check, which requires that blob offsets increase monotonically. However, if the misinterpreted bits of data slip through that check, part of the first blob will be lost.
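
For illustration only, the kind of cheap check described above could look roughly like the following Python sketch. It is not the code from this PR. The function name `first_blob_offset_is_plausible` and its parameters are hypothetical, and the sketch assumes that the offset table holds one entry per blob plus a final end offset (so the first offset equals `(blob_count + 1) * offset_size`) and that a cluster cannot hold more blobs than the ZIM file has entries. It ignores the part of the upper limit that depends on the size of the first blob.

```python
def first_blob_offset_is_plausible(first_offset, offset_size, entry_count):
    """Cheap sanity check of the first blob offset in a cluster header.

    Hypothetical helper, not taken from the actual patch. Assumptions:
    - offset_size is 4 for normal clusters and 8 for extended clusters;
    - the offset table has (blob_count + 1) entries, so the first offset
      equals (blob_count + 1) * offset_size;
    - a cluster cannot contain more blobs than the file has entries.
    """
    if offset_size not in (4, 8):
        raise ValueError("offset_size must be 4 or 8")
    # The first offset must be aligned to the size of one offset entry.
    if first_offset % offset_size != 0:
        return False
    # Assumed minimal meaningful value: room for at least one offset entry.
    if first_offset < offset_size:
        return False
    # Upper limit derived from the number of entries in the ZIM file.
    blob_count = first_offset // offset_size - 1
    return blob_count <= entry_count
```

Under these assumptions, a first offset that was corrupted from 16 to 12 in a normal cluster still passes the check, which matches the undetected "decreased but aligned" case, while a misaligned value such as 14 is rejected.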