Block splitter #4136

Merged: 44 commits merged into dev from the preSplit branch, Oct 24, 2024
Conversation

@Cyan4973 (Contributor) commented Sep 3, 2024

Instead of ingesting only full blocks (128 KB),
perform an a-priori analysis of the data,
and infer a position to split the block at a more appropriate boundary.

This can notably happen in an archive scenario, at the boundary between 2 files of different nature within the archive.

This leads to some non-trivial compression gains, at a correspondingly acceptable speed cost.
The benefit is larger when there isn't already a post-splitter (the post-splitter is only active at btopt strategies and above, i.e. levels 16+),
but even when a post-splitter is active, there is still a small compression ratio benefit, making this strategy desirable even for higher compression modes.

However, this input analysis is not free.
The initial variant is currently reserved for higher compression strategies (>= btopt), where it combines well with the post-splitter operating on known sequences.
A second, faster variant is applied to the middle strategies btlazy2 and lazy2, where it improves compressed size by ~300 KB. It achieves its higher speed by sampling the input instead of scanning it sequentially.

For faster modes, the analysis is skipped and replaced by a static split size (no longer limited to 128 KB only). Tests show that a static 92 KB block size brings a higher compression ratio, at a small-to-negligible compression speed loss (mostly due to the increased number of blocks, hence of block headers).
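For illustration, here is a minimal sketch of the general idea behind the a-priori analysis: fingerprint consecutive chunks with byte histograms, and propose a split where the local statistics diverge. This is only an illustration of the principle, not the merged zstd_preSplit.c implementation; all names and the threshold are made up for the example.

```c
#include <stddef.h>
#include <string.h>
#include <math.h>

#define CHUNK (8 * 1024)   /* analyze the input in 8 KB chunks */

typedef struct { unsigned count[256]; size_t total; } Histogram;

static void record(Histogram* h, const unsigned char* p, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) h->count[p[i]]++;
    h->total += n;
}

/* L1 distance between normalized byte distributions */
static double distance(const Histogram* a, const Histogram* b)
{
    double d = 0.0;
    int s;
    for (s = 0; s < 256; s++)
        d += fabs((double)a->count[s] / (double)a->total
                - (double)b->count[s] / (double)b->total);
    return d;
}

/* Returns a proposed split position, or srcSize if no split looks useful */
static size_t proposeSplit(const void* src, size_t srcSize, double threshold)
{
    const unsigned char* const p = (const unsigned char*)src;
    Histogram past, fresh;
    size_t pos;
    if (srcSize < 2 * CHUNK) return srcSize;
    memset(&past, 0, sizeof(past));
    record(&past, p, CHUNK);
    for (pos = CHUNK; pos + CHUNK <= srcSize; pos += CHUNK) {
        memset(&fresh, 0, sizeof(fresh));
        record(&fresh, p + pos, CHUNK);
        if (distance(&past, &fresh) > threshold)
            return pos;                 /* statistics changed: split here */
        record(&past, p + pos, CHUNK);  /* otherwise merge and continue */
    }
    return srcSize;
}
```

The Fingerprint / recordFingerprint() / mergeEvents() snippets quoted later in the review suggest the merged code follows a broadly similar structure, using integer event counts and a scaled integer distance.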

Here are some benchmarks, focusing on compression savings:

silesia.tar:

| level | dev      | this PR  | savings | note |
|------:|---------:|---------:|--------:|------|
| 1     | 73422432 | 73324476 |  -97956 | static block size |
| 7     | 60546773 | 60488311 |  -58462 | static block size (last) |
| 8     | 60015264 | 59700036 | -315228 | dynamic sampling (first) |
| 15    | 57176243 | 56867697 | -308546 | dynamic sampling (last) |
| 16    | 55338671 | 55256977 |  -81694 | full splitting analysis (pre+post) |
| 19    | 52887208 | 52842629 |  -44579 | idem |
| 22    | 52322760 | 52284324 |  -38436 | idem |

calgary.tar:

| level | dev     | this PR | savings | note |
|------:|--------:|--------:|--------:|------|
| 1     | 1143607 | 1129229 |  -14378 | static block size |
| 7     |  948081 |  946941 |   -1140 | static block size (last) |
| 8     |  940280 |  932212 |   -8068 | dynamic sampling (first) |
| 15    |  921606 |  911627 |   -9979 | dynamic sampling (last) |
| 16    |  882353 |  881217 |   -1136 | full splitting analysis (pre+post) |
| 19    |  861336 |  859737 |   -1599 | idem |
| 22    |  861200 |  859578 |   -1622 | idem |

Follow-up:

  • Consider multiple variants, with incremental speed / accuracy trade-offs
  • Make the choice of splitting strategy selectable via compression parameter

@Cyan4973 (Contributor Author) commented Sep 4, 2024

mmmh,
this could be a problem:

test 45 : in-place decompression : Error : test (margin <= ZSTD_DECOMPRESSION_MARGIN(CNBuffSize, ZSTD_BLOCKSIZE_MAX)) failed

I suspect the problem is that the ZSTD_DECOMPRESSION_MARGIN() macro assumes that blocks are always full (128 KB), and derives a maximum expansion from that assumption, which is no longer valid when the default block size becomes 92 KB (lower compression levels). In contrast, ZSTD_decompressionMargin() scans the compressed data, so it ends up with a different result, and the two now disagree.

I'm not completely sure what the wanted behavior is here...

edit: confirmed that, when I change the default block size to anything other than 128 KB, it breaks this test.
On the other hand, the ZSTD_DECOMPRESSION_MARGIN() macro requires a blockSize parameter, and the test passes ZSTD_BLOCKSIZE_MAX, aka 128 KB, so it's pretty clear what it expects.
So the question is: how could this test pass "reasonably", i.e. without inner knowledge of how the internal block splitting decision is made?
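To make the disagreement concrete, here is an illustrative back-of-the-envelope estimate of how such a margin depends on the assumed block size. The formula and constants are assumptions for the example, not the actual zstd.h macro.

```c
#include <stddef.h>

/* Rough shape of an in-place decompression margin (illustrative only):
 * frame overhead + a few header bytes per block + one block of slack. */
static size_t margin_estimate(size_t srcSize, size_t blockSize)
{
    size_t const nbBlocks = (srcSize + blockSize - 1) / blockSize;
    return 18            /* frame header, illustrative constant */
         + 4             /* checksum */
         + 3 * nbBlocks  /* worst-case block headers */
         + blockSize;    /* one block of in-place slack */
}

/* margin_estimate(sz, 128 KB) and margin_estimate(sz, 92 KB) differ,
 * so a test that hard-codes ZSTD_BLOCKSIZE_MAX in the macro while the
 * encoder actually emits 92 KB blocks can observe a mismatch with the
 * value obtained by scanning the frame. */
```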

@Cyan4973 (Contributor Author) commented Oct 17, 2024

Who knew adding a single source file (zstd_preSplit.c) would be such a big problem...

Currently blocked trying to get the single-file library builder to work:
it doesn't include the new file, resulting in a link-stage error.

And then each and every build system also requires updating its own list of files, each in its own format and location.

@Cyan4973 (Contributor Author) commented Oct 17, 2024

Weird stuff:

error: undefined reference to '__mulodi4'

It only happens during compilation of the clang-asan-ubsan-fuzz32 test, aka undefined sanitizer enabled for 32-bit compilation with clang (the same test with gcc compiles fine).

The failure seems to correspond to places where multiplications of S64 (aka long long) variables happen.

And of course, it happens all the time on GitHub CI, but not on any other system where I can test the same code and build rule.
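For context, this is the kind of expression that triggers the missing symbol; the function below is a made-up illustration, not code from the PR. With -fsanitize=undefined on a 32-bit target, clang lowers the overflow check for a signed 64-bit multiply into a call to the compiler-rt helper __mulodi4, so if the sanitizer runtime doesn't provide it, linking fails.

```c
typedef long long S64;

/* Illustrative only: any S64 * S64 product gets instrumented on 32-bit
 * ubsan builds and lowered to a __mulodi4 libcall by clang. */
static S64 scaledDelta(S64 a, S64 b, S64 weight)
{
    return (a - b) * weight;
}
```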

@Cyan4973 Cyan4973 force-pushed the preSplit branch 3 times, most recently from a4c653c to 904fa69 on October 20, 2024 23:52
@Cyan4973 Cyan4973 marked this pull request as ready for review October 20, 2024 23:52
@Cyan4973 (Contributor Author):
All tests passed, ready for review

@Cyan4973 Cyan4973 changed the title from "Experiment : pre-splitter" to "Block splitter" on Oct 21, 2024
@terrelln (Contributor) left a comment:
Reviewed everything except zstd_preSplit.[hc]

CC=clang CFLAGS="-O3 -m32" FUZZER_FLAGS="--long-tests" make asan-fuzztest

# The following test seems to have issues on github CI specifically,
# it does not provide the `__mulodi4` instruction emulation
Contributor:
Does it have issues before this PR, or only after? If it is the latter, this might also have issues in the kernel, and we would need to do something similar to what we do with ZSTD_div64()

#define ZSTD_div64(dividend, divisor) ((dividend) / (divisor))
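A hypothetical analogue of that pattern for multiplication, shown only to illustrate the suggestion (this macro does not exist in the codebase):

```c
/* Hypothetical: route 64-bit multiplies through one macro so special
 * environments (e.g. the kernel) could substitute their own version. */
#define ZSTD_mul64(a, b) ((a) * (b))
```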

Contributor:
It looks like this might be a compiler bug.

@Cyan4973 (Contributor Author), Oct 22, 2024:
Yes, the issue doesn't reproduce on my local linux workstation.
I suspect a bug in the version of ubsan for clang currently deployed within github ci.

It's likely that this issue will be fixed at some unspecified time in the future.
We just can't wait for that: the CI needs to keep running.
We'll re-enable this test when it can work properly on GitHub CI again.

Also, temporarily dropping the ubsan test for clang on 32-bit x86 targets seems like a minor inconvenience, given that:

  • we mostly care about 64-bit x64
  • ubsan 32-bit is still running in CI with gcc
  • we keep the more important asan test for 32-bit on clang

@@ -278,6 +278,8 @@
* Alignment check
*****************************************************************/

#define ZSTD_IS_POWER_2(a) (((a) & ((a)-1)) == 0)
Contributor:
nit: We don't need this for compile time constants like array bounds, could we make this a function ZSTD_isPow2()?

Contributor Author:
done
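For illustration, the function form suggested above might look like the following sketch (the merged code may differ in name and placement):

```c
/* Same semantics as the macro: note that 0 is also reported as a power of 2. */
static int ZSTD_isPow2(size_t v)
{
    return (v & (v - 1)) == 0;
}
```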

* heuristic, tested as being "generally better".
* do not split incompressible data though: respect the 3 bytes per block overhead limit.
*/
return savings ? 92 KB : 128 KB;
Contributor:
This looks like it is splitting if the savings is non-zero. So it will split if the savings is negative. Is that correct? If so why do we want to do that?

Contributor Author:
Nope, savings will only be >0 if there were some observed savings (aka, smaller cSize than srcSize) from previous blocks.

It also means that, for the first block, it will never split.

Contributor:
But this currently will also split if the savings is negative, which doesn't seem like the desired behavior. Is this what you intended?

Suggested change
return savings ? 92 KB : 128 KB;
return savings > 0 ? 92 KB : 128 KB;

Contributor:
Additionally, we need to compute how much overhead this can possibly add, and ensure that this block split can never grow us beyond the compress bound.

E.g.

Suggested change
return savings ? 92 KB : 128 KB;
return savings >= 3 ? 92 KB : 128 KB;

Contributor Author:
done

Contributor Author:
> But this currently will also split if the savings is negative, which doesn't seem like the desired behavior. Is this what you intended?

That's a very good point.
In my mind, savings was boxed to remain positive,
which it is as long as it remains within compress_frameChunk().

But as soon as we leave that function and return (streaming scenario),
savings is regenerated from the actual bytes consumed and written,
which can be negative if the data so far was incompressible.

This is fixed in the newer version, thanks to the >3 test.

Great catch @terrelln !
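Putting the thread together, the guarded decision being discussed amounts to something like the following sketch (illustrative names; the threshold mirrors the suggestion above): only switch to the smaller 92 KB block size once previous blocks have saved enough bytes to pay for an extra block header, so incompressible data never expands beyond the per-block overhead limit.

```c
typedef long long S64;

/* savings: bytes saved by compression of previous blocks (may be negative
 * in streaming scenarios, as discussed above). */
static size_t selectBlockSize(S64 savings)
{
    size_t const KB = 1 << 10;
    return (savings >= 3) ? 92 * KB : 128 * KB;
}
```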

lib/compress/zstd_compress.c (resolved)
/* note: conservatively only split full blocks (128 KB) currently,
* and even then only if there is more than 128 KB input remaining.
*/
if (srcSize <= 128 KB || blockSizeMax < 128 KB)
Contributor:
Does this mean that the block splitter won't work in streaming mode? What if the user passes in exactly 128KB at a time?

Contributor Author:
Indeed, if the user passes exactly 128 KB at a time, block splitting will not be triggered.

Contributor:
I don't think that is a good idea. In my model of the streaming API, how you chunk your data doesn't impact the compression, with the exception of the final block. This seems like it could introduce confusing behavior. Can we get rid of this restriction?

Contributor Author:
OK, I'll look into it.

Contributor Author:
All full blocks are now sent to the splitter function, irrespective of the presence of additional data beyond the full block.
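For reference, the streaming scenario discussed in this thread (a caller handing the encoder exactly 128 KB per call) might look like the following sketch, using the public streaming API; error handling is omitted and the file plumbing is just for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>

static void compressIn128KChunks(FILE* fin, FILE* fout, int level)
{
    size_t const inSize  = 128 * 1024;            /* exactly one full block per call */
    size_t const outSize = ZSTD_CStreamOutSize();
    char* const inBuf  = malloc(inSize);
    char* const outBuf = malloc(outSize);
    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, level);

    for (;;) {
        size_t const readSize = fread(inBuf, 1, inSize, fin);
        ZSTD_EndDirective const mode = (readSize < inSize) ? ZSTD_e_end : ZSTD_e_continue;
        ZSTD_inBuffer input = { inBuf, readSize, 0 };
        int finished;
        do {
            ZSTD_outBuffer output = { outBuf, outSize, 0 };
            size_t const remaining = ZSTD_compressStream2(cctx, &output, &input, mode);
            fwrite(outBuf, 1, output.pos, fout);
            finished = (mode == ZSTD_e_end) ? (remaining == 0)
                                            : (input.pos == input.size);
        } while (!finished);
        if (mode == ZSTD_e_end) break;
    }
    ZSTD_freeCCtx(cctx);
    free(inBuf);
    free(outBuf);
}
```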

@@ -4556,7 +4584,7 @@ static size_t ZSTD_compress_frameChunk(ZSTD_CCtx* cctx,
}
} /* if (ZSTD_useTargetCBlockSize(&cctx->appliedParams))*/


if (cSize < blockSize) savings += (blockSize - cSize);
Contributor:
Do we also need to subtract savings if we grow the block?

If we aren't precisely tracking the savings I can see cases where an adversarial pattern could cause compression to fail.

Contributor Author:
OK, we can add some malus when a block is not compressed.
This will be a conservative over-estimate, but that's not important: we just want to make sure there is no possible scenario that could expand data by more than 3 bytes per full block.

Contributor:
> OK, we can add some malus when a block is not compressed.

I don't like this solution because it is a patch that hides the problem. But I don't see how it guarantees that it can never occur. I'd prefer to precisely calculate the savings so we can know exactly if we are allowed to split or not.

Contributor Author:
I've added a paragraph which explains why a 1-byte malus to savings for each incompressible block is enough to guarantee that the 3-bytes per 128 KB expansion limit cannot be breached.

* with alignment control
* Note : should happen only once, at workspace first initialization
*/
MEM_STATIC void* ZSTD_cwksp_reserve_object_aligned(ZSTD_cwksp* ws, size_t byteSize, size_t alignment)
Contributor:
Why do we need alignment smaller than 64? It seems it might be simpler to just keep 64-byte alignment on all allocations.

Contributor Author:
I guess it could also be done this way.
As align-8 is sufficient, align-64 felt overkill.

Contributor:
Agreed. Even if this is a desirable change (why?), this seems like it could be done in a different PR.

Contributor Author:
The new TMP_WORKSPACE requires an 8-byte alignment.
The original ZSTD_cwksp_reserve_object() that was used here aligns on sizeof(void*).
That was breaking tests on 32-bit systems.
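For context on the alignment trade-off, a generic align-up helper looks like the following (illustrative, not the cwksp implementation):

```c
#include <stddef.h>

/* Rounds size up to a power-of-2 alignment. */
static size_t alignUp(size_t size, size_t alignment)
{
    return (size + alignment - 1) & ~(alignment - 1);
}

/* e.g. alignUp(8208, 64) == 8256: a 64-byte-aligned allocation of the
 * 8208-byte split-block workspace would waste 48 bytes (as noted later in
 * the review), whereas an 8-byte alignment wastes nothing here. */
```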

typedef struct {
int events[HASHTABLESIZE];
S64 nbEvents;
} FingerPrint;
Contributor:
nit: Just one word

Suggested change
} FingerPrint;
} Fingerprint;

Contributor Author:
done


#define CHUNKSIZE (8 << 10)
/* Note: technically, we use CHUNKSIZE, so that's 8 KB */
size_t ZSTD_splitBlock_4k(const void* src, size_t srcSize,
Contributor:
nit: Can we pick a better name, since as noted we're splitting by 8 KB currently? Maybe ZSTD_splitBlock_byFixedChunks()

@Cyan4973 (Contributor Author), Oct 22, 2024:
Yes, this name is fixed in the next patch.

Contributor Author:
I went ahead and merged the next patch (sample5) into this feature branch,
so that it also updates the naming convention in the process.

initStats(fpstats);
recordFingerprint(&fpstats->pastEvents, p, CHUNKSIZE);
for (pos = CHUNKSIZE; pos < blockSizeMax; pos += CHUNKSIZE) {
assert(pos <= blockSizeMax - CHUNKSIZE);
Contributor:
I know this holds because blockSizeMax == (128 << 10), but can we fix the looping logic to guarantee this instead?

It would be cleaner to have this function support any blockSizeMax. Given that the checks in the outer loop aren't super performance critical, I'd prefer clarity and explicit checks rather than assumptions.

Contributor Author:
done
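For illustration, a loop shape that makes the bound explicit for any blockSizeMax could look like this fragment (a sketch reusing identifiers from the quoted code; the merged code may differ):

```c
for (pos = CHUNKSIZE; pos + CHUNKSIZE <= blockSizeMax; pos += CHUNKSIZE) {
    /* never reads past blockSizeMax, whatever its value */
    recordFingerprint(&fpstats->newEvents, p + pos, CHUNKSIZE);
    /* ... compare fingerprints and decide whether to split at pos ... */
}
```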

} else {
mergeEvents(&fpstats->pastEvents, &fpstats->newEvents);
ZSTD_memset(&fpstats->newEvents, 0, sizeof(fpstats->newEvents));
penalty = penalty - 1 + (penalty == 0);
Contributor:
nit: Let's rewrite for clarity

Suggested change
penalty = penalty - 1 + (penalty == 0);
if (penalty > 0) {
    --penalty;
}

@@ -272,7 +276,7 @@ MEM_STATIC size_t ZSTD_cwksp_bytes_to_align_ptr(void* ptr, const size_t alignByt
* which we can allocate from the end of the workspace.
*/
MEM_STATIC void* ZSTD_cwksp_initialAllocStart(ZSTD_cwksp* ws) {
return (void*)((size_t)ws->workspaceEnd & ~(ZSTD_CWKSP_ALIGNMENT_BYTES-1));
return (void*)((size_t)ws->workspaceEnd & (size_t)~(ZSTD_CWKSP_ALIGNMENT_BYTES-1));
Contributor:
I don't think this cast is actually correct. You need to cast inside the not expression if your goal is to ensure that the not doesn't happen to a narrower type that then gets zero-extended during promotion to the size_t.

Also should we be casting to intptr_t, not size_t?

Contributor Author:
> Also should we be casting to intptr_t, not size_t?

Unfortunately, intptr_t is not guaranteed to exist, while size_t is.

Also, since this is just for alignment checks, we just care about the last bits (realistically, the last 6 bits maximum for cache-line alignment). So we could even use a smaller type. But size_t is good enough already.

Contributor Author:
rewrote ZSTD_cwksp_initialAllocStart() for clarity
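To make the point about cast placement concrete, here is an illustration (assuming a 64-bit size_t and a 32-bit unsigned constant such as 64u; with a plain signed constant, sign-extension happens to produce the right mask anyway):

```c
#include <stddef.h>

size_t mask_ok  = ~(size_t)(64u - 1);  /* 0xFFFFFFFFFFFFFFC0: high bits kept */
size_t mask_bad = (size_t)~(64u - 1);  /* ~ computed in 32-bit unsigned, then
                                          zero-extended: 0x00000000FFFFFFC0,
                                          which would clear high address bits */
```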

@@ -137,11 +137,12 @@ ZSTD_CCtx* ZSTD_initStaticCCtx(void* workspace, size_t workspaceSize)
ZSTD_cwksp_move(&cctx->workspace, &ws);
cctx->staticSize = workspaceSize;

/* statically sized space. entropyWorkspace never moves (but prev/next block swap places) */
/* statically sized space. tmpWorkspace never moves (but prev/next block swap places) */
if (!ZSTD_cwksp_check_available(&cctx->workspace, ENTROPY_WORKSPACE_SIZE + 2 * sizeof(ZSTD_compressedBlockState_t))) return NULL;
Contributor:
Don't we basically have to replace every usage of ENTROPY_WORKSPACE_SIZE with TMP_WORKSPACE_SIZE? Including here?

@Cyan4973 (Contributor Author), Oct 22, 2024:
Not every instance, since in some cases it's really "just" about entropy,
but quite possibly here.
This might be a leftover.

edit:
yes, it was a leftover, to be fixed.
Great catch @felixhandte !

@felixhandte (Contributor) left a comment:
Yeah, I think you should be able to avoid all of these cwksp changes if you just switch the entropy workspace allocation from being in the objects group (and allocation phase) to being in the aligned group.

This should just mean moving the allocation down into ZSTD_reset_matchState() where we do the other aligned allocations as well as changing:

size_t const entropySpace = ZSTD_cwksp_alloc_size(ENTROPY_WORKSPACE_SIZE);

to use ZSTD_cwksp_aligned_alloc_size().

The ZSTD_SLIPBLOCK_WORKSPACESIZE is 8208, which is not an even multiple of 64. So it does mean we'll grow the workspace to a multiple of 64 and throw away 48 bytes. We could look at improving this in a follow-up PR.

@Cyan4973 (Contributor Author):
Addressed the last comment by removing the need for 64-bit members in the workspace structure, so that it no longer needs 8-byte alignment. Now TMP_WORKSPACE uses the same reserve_object() method and alignment rule (sizeof(void*)) as was previously used by ENTROPY_WORKSPACE, so the change is effectively just a size change.

Note that signed 64-bit multiplications are still needed for the distance function, so this doesn't address the bug in the GitHub CI ubsan version for clang in 32-bit compilation mode; that test remains disabled for the time being.

@Cyan4973 (Contributor Author):
All comments addressed

instead of ingesting only full blocks, make an analysis of data, and infer where to split.
for better portability on Linux kernel
though I really wonder if this is a property worth maintaining.
Cyan4973 and others added 19 commits October 23, 2024 11:50
for non 64-bit systems
let's fill the initial stats directly into target fingerprint
that samples 1 in 5 positions.

This variant is fast enough for lazy2 and btlazy2,
but it's less good in combination with post-splitter at higher levels (>= btopt).
…loss

ensure data can never be expanded by more than 3 bytes per full block.
due to integration of `sample5` strategy, leading to better compression ratios on a range of levels
strict C90 compliance test
so that it can be stored using standard alignment requirement (sizeof(void*)).

Distance function still requires 64-bit signed multiplication though,
so it won't change the issue regarding the bug in ubsan for clang 32-bit on github ci.
@terrelln (Contributor) left a comment:
The logic in ZSTD_compress_frameChunk() doesn't look like it guarantees that it respects ZSTD_compressBound(). It looks like it is very likely to respect the bound. But I'm positive that the fuzzers will eventually find a case where it doesn't.


this helps make the streaming behavior more consistent,
since it no longer depends on having more data presented on the input.

suggested by @terrelln
@Cyan4973 (Contributor Author):
I updated the blind-split strategy with the changes recommended in the code review.
I'm no longer concerned about a risk of over-expanding incompressible data at this point.

That being said, the blind-split strategy was initially proposed as a way to leverage "something" from the capability to ingest partial blocks at lower levels, for which the initial split heuristic was not fast enough.

Now that the execution plan seems able to offer block splitting all the way down to level 1, the blind-split strategy becomes less important, essentially reserved for negative compression levels.

The issue is that, for negative compression levels, the blind-split strategy isn't effective. This is likely because negative compression levels drop the Huffman compression of literals, which is where it had the most impact.

So, if it's not useful anywhere, it's debatable whether this split strategy deserves to stay in the code.

An alternative could be to just drop it.

instead of just for blind split.

This is in anticipation of adversarial input,
that would intentionally target the sampling pattern of the split detector.

Note that, even without this protection, splitting can never expand beyond ZSTD_COMPRESSBOUND(),
because this upper limit uses a 1 KB block size worst-case scenario,
and splitting never creates blocks that small.

The protection is more to ensure that data is not expanded by more than 3 bytes per 128 KB full block,
which is a much stricter limit.
first block is no longer split since adding the savings over-split protection
@Cyan4973 (Contributor Author) commented Oct 24, 2024

I updated the policy around the savings over-split protection variable
to employ it for all splitting strategies,
not just the last "blind-split" strategy.

This is in anticipation of some adversarial input that would take advantage of the sampling pattern of the faster split detectors,
making them split blocks which are effectively incompressible, thereby creating additional block headers.

Note that, even without this protection, the ZSTD_COMPRESSBOUND() limit was never at risk:
this limit is based on a worst-case scenario of 1 KB per block,
so even if the splitter is made to split incompressible data via adversarial inputs,
it will never get close to that limit (the current minimum block size after splitting is 8 KB).

The problem is more about ZSTD_DECOMPRESSION_MARGIN(),
which uses its own worst-case estimation, separate from ZSTD_COMPRESSBOUND(),
and which is much more stringent (effectively, expansion there is limited to 3 bytes per full 128 KB block).
ZSTD_DECOMPRESSION_MARGIN() is a very niche use case though (overlapping input and output).

* otherwise only full blocks are used.
* But being conservative is fine,
* since splitting barely compressible blocks is not fruitful anyway */
savings += (S64)blockSize - (S64)cSize;
Contributor:
Thanks for fixing this! IMO this approach is also clearer, because it explicitly tracks the exact savings from compression.

@Cyan4973 Cyan4973 merged commit 2dddf09 into dev Oct 24, 2024
94 checks passed