Benchmark compression algorithms with L1 and L2 example data #233
Replies: 7 comments
-
Preliminary benchmarking from the 22k corpus
-
More bench results
-
My summary of the compression benchmarking is as follows: zlib is pure Go, which should make compiling to MIPS easier. The other contender is Brotli; I believe the option I selected as the default is tuned for a higher compression ratio, and it appears to be much slower (though it's running a Go implementation because I could not get the cgo one to work). Using a dictionary has a small benefit (there's a benefit even with a small dictionary), but it's not as large as I expected. The remaining questions are about compression/decompression speed and how to meter gas (I believe it will be hard to measure each transaction's effect on the total compressed size).
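For reference, a minimal sketch of the kind of ratio measurement discussed here, using only the stdlib zlib writer. The input path and the choice of `BestCompression` are illustrative assumptions, not the thread's actual benchmark tool:

```go
// Minimal compression-ratio measurement sketch (inputs are illustrative).
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"log"
	"os"
)

func main() {
	// Hypothetical file of example batch data.
	data, err := os.ReadFile("batch.bin")
	if err != nil {
		log.Fatal(err)
	}

	var buf bytes.Buffer
	w, err := zlib.NewWriterLevel(&buf, zlib.BestCompression)
	if err != nil {
		log.Fatal(err)
	}
	if _, err := w.Write(data); err != nil {
		log.Fatal(err)
	}
	if err := w.Close(); err != nil {
		log.Fatal(err)
	}

	fmt.Printf("raw=%d compressed=%d ratio=%.3f\n",
		len(data), buf.Len(), float64(buf.Len())/float64(len(data)))
}
```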
-
What is your worry here? That the implementations are not actually correct with respect to the Go semantics, but work because they're compiled to x64 and not to MIPS? Or a compiler bug giving slightly different results?
Very true. But what about trying to compress in the current state and reporting that as the cost? The pitfall here is that if their block lands in a later batch, the cost might be higher than the reported one (but that's a general risk even with basefee and such). Another remark/question: you've trained the algorithms on a subset (2.5k) and the full (22k) corpus. But training on the whole corpus and then running the compression on that same corpus will produce an optimal, overfitted outcome. What about training on half the corpus and then trying the compression on the other half? That gives you a ~10k corpus and no overfitting.
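For illustration, a sketch of that split evaluation in Go; `trainDictionary` and `compressWithDict` are hypothetical stand-ins for whatever trainer and dictionary-aware compressor are being benchmarked (e.g. zstd's):

```go
// Train a dictionary on one half of the corpus, measure compression
// ratio on the other half, so the dictionary cannot overfit the test set.
package compbench

import "math/rand"

func evaluate(corpus [][]byte,
	trainDictionary func([][]byte) []byte,
	compressWithDict func(data, dict []byte) []byte) float64 {

	// Shuffle so the split is not biased by corpus ordering.
	rand.Shuffle(len(corpus), func(i, j int) {
		corpus[i], corpus[j] = corpus[j], corpus[i]
	})

	half := len(corpus) / 2
	train, test := corpus[:half], corpus[half:]

	dict := trainDictionary(train)

	var raw, compressed int
	for _, tx := range test {
		raw += len(tx)
		compressed += len(compressWithDict(tx, dict))
	}
	return float64(compressed) / float64(raw)
}
```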
-
It's just the effort of cross compiling, plus more of a concern about platform-dependent code (zstd actually has a test suite to make sure the result is the same across a bunch of platforms, including MIPS).
It depends on the API of the compression algorithm, but generally flushing the in-flight data multiple times, rather than waiting to flush until the end, reduces the compression efficacy (the algorithm does best when it operates over the full data). I have some ideas on how to estimate the impact, but nothing that works for online processing — see the sketch below.
Good idea. One note is that the subset actually did better than the full corpus, but figuring out how to do this properly is worth doing; I was going for something quick and dirty to understand the ballpark benefit. Also note that zstd recommends 100s-1000s of files to train the dictionary on.
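To make the flushing point concrete, a small sketch with the stdlib zlib writer; the transaction payloads are made up. `Flush` performs a sync flush, so `buf.Len()` after each flush gives a running per-transaction size, at the cost of extra block overhead:

```go
// Compare per-transaction flushes against a single flush at the end.
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
)

func compressedSize(txs [][]byte, flushEach bool) int {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	for _, tx := range txs {
		w.Write(tx)
		if flushEach {
			// Sync flush: the pending block is emitted, so buf.Len()
			// now reflects this tx's marginal contribution.
			w.Flush()
		}
	}
	w.Close()
	return buf.Len()
}

func main() {
	txs := [][]byte{
		bytes.Repeat([]byte("transfer"), 100),
		bytes.Repeat([]byte("transfer"), 100),
	}
	fmt.Println("flush per tx:", compressedSize(txs, true))
	fmt.Println("flush at end:", compressedSize(txs, false))
}
```

Running both modes on the same input shows the overhead directly; the gap grows with the number of flush points.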
-
Copied from other thread
-
Wow, that's interesting!
-
There is prior discussion here: #10
Corpus Preparation
Compression Benchmark tool
Compression Algorithms
The Go standard library provides pure-Go DEFLATE-based implementations (compress/flate, compress/zlib, compress/gzip).
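A hypothetical shape for such a benchmark tool, using the testing package against stdlib flate; the corpus here is stand-in data, not the real L1/L2 examples:

```go
// Benchmark stdlib flate at different levels over a fixed corpus.
package bench

import (
	"bytes"
	"compress/flate"
	"testing"
)

// Stand-in data; a real run would load the example batch corpus.
var corpus = bytes.Repeat([]byte("example calldata "), 1<<12)

func benchLevel(b *testing.B, level int) {
	b.SetBytes(int64(len(corpus)))
	for i := 0; i < b.N; i++ {
		var buf bytes.Buffer
		w, _ := flate.NewWriter(&buf, level)
		w.Write(corpus)
		w.Close()
	}
}

func BenchmarkFlateDefault(b *testing.B) { benchLevel(b, flate.DefaultCompression) }
func BenchmarkFlateBest(b *testing.B)    { benchLevel(b, flate.BestCompression) }
```

Run with `go test -bench . -benchmem` to get throughput alongside allocation counts.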