Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zstd compression for sort files #540

Open
jodavies opened this issue Jun 16, 2024 · 6 comments · May be fixed by #541
Open

Zstd compression for sort files #540

jodavies opened this issue Jun 16, 2024 · 6 comments · May be fixed by #541
Labels
enhancement New feature or request

Comments

@jodavies
Copy link
Collaborator

I ran a few tests, using @magv 's suggestion of using https://github.com/facebook/zstd/tree/dev/zlibWrapper to use Zstd while making almost no changes to FORM's code. You need to include the wrapper's header instead of zlib.h, and call a function to enable the use of zstd.

For the test I link against a library as compiled by https://github.com/magv/hepware . We could do this, or alternatively compile zstd in with FORM (by setting it up as a git submodule or by directly including the code in this repository -- it is 3-clause BSD, I think that is allowed).

My first test was a simple program which generates lots of terms in a single module which all cancel in the final sort; no scratch files are created, only sort files. This runs a few percent faster with zstd, and uses a few percent less disk space.

I also ran a mincer test (an N=11 moment of a single diagram for "graviton-exchange DIS" -- this is an old benchmark I have lying around from Andreas). This uses ~12GB of disk (mostly the uncompressed scratch files) but runs 8% faster.

All these tests ran in tmpfs, so disk performance was never a bottleneck. On a slow disk the better compression ratio (say, 5%?) will help as well.

This seems like an easy improvement -- we just need to decide what the best way to include it in the build is. Any thoughts?

@jodavies jodavies added the enhancement New feature or request label Jun 16, 2024
@tueda
Copy link
Collaborator

tueda commented Jun 17, 2024

Perhaps (lib)zstd(-dev) can be installed easily by the distribution's package manager (or possibly via tools like vcpkg). So, we can handle it similarly to gmp/mpfr/zlib.

For zlibWrapper, we could use a submodule, which would then be copied into the tarball distribution.

Example: https://github.com/tueda/form/tree/feat-zstd-1 (but I don't have a suitable benchmark test for this).

@jodavies
Copy link
Collaborator Author

Yes, that looks good. At least for me, using the #define before the header include to enable zstd doesn't work properly. But we can call ZWRAP_useZSTDcompression(1); (or ZWRAP_useZSTDcompression(0);) in the code somewhere. Using this we can just add a sub-key like On compress, zstd;.

I would say that if compiled with zstd support, use of zstd should be the default. But then one can specify On compress, gzip; for easy benchmarking.

Do you want to implement the On/Off statement also, or I can do it and make a PR into your branch?

@tueda
Copy link
Collaborator

tueda commented Jun 17, 2024

Please go ahead. On compress, zstd sounds good. My implementation is experimental, so feel free to use any part of it in your branch if you find it useful.

@jodavies
Copy link
Collaborator Author

jodavies commented Jun 18, 2024

Looking through the code for where I need to add things, I noticed that there is https://www.nikhef.nl/~form/maindir/documentation/reference/online/online.html#SECTION008220000000000000000 . Either this needs to also have the zstd option, or we can remove this form FORM 5.

Presumably "4.3.2" will not have zstd support, right? That one really should be bug-fixes only on top of 4.3.1.

@tueda
Copy link
Collaborator

tueda commented Jun 19, 2024

Either this needs to also have the zstd option, or we can remove this form FORM 5.

The manual doesn't mention Compress gzip, though it works while Compress zstd gives

test.frm Line 1 --> Unknown option: zstd, on, off or gzip expected

Maybe we don't need to touch the Compress statement, which is only for backward compatibility.

If we really want to remove such obsolete commands, perhaps it is good to first introduce deprecated warnings for them.

Presumably "4.3.2" will not have zstd support, right? That one really should be bug-fixes only on top of 4.3.1.

Right. I think it's good to keep the 4.3 branch limited to bug fixes (or at most "behaviour changes" for defect resolution), so the additional support for zstd should be excluded.

@jodavies
Copy link
Collaborator Author

Either this needs to also have the zstd option, or we can remove this form FORM 5.

The manual doesn't mention Compress gzip, though it works while Compress zstd gives

test.frm Line 1 --> Unknown option: zstd, on, off or gzip expected

Maybe we don't need to touch the Compress statement, which is only for backward compatibility.

If we really want to remove such obsolete commands, perhaps it is good to first introduce deprecated warnings for them.

Presumably "4.3.2" will not have zstd support, right? That one really should be bug-fixes only on top of 4.3.1.

Right. I think it's good to keep the 4.3 branch limited to bug fixes (or at most "behaviour changes" for defect resolution), so the additional support for zstd should be excluded.

OK, we can add the zstd support to the Compress statement here also. The error message is a little confusing actually: on first glance I wondered how CoCompress already knew about the possibility of a "zstd" option...

@tueda tueda linked a pull request Jun 19, 2024 that will close this issue
@tueda tueda linked a pull request Jun 19, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants