
support for zopfli gzip archives #80

Open
jcpunk opened this issue Sep 5, 2017 · 9 comments
Labels
Priority: LOW RFE Request For Enhancement (as opposed to a bug)

Comments

@jcpunk

jcpunk commented Sep 5, 2017

Repodata is an ideal candidate for zopfli because it is compressed once and mirrored many times. Zopfli's output is a stream-compatible gzip file, so end users should only notice that the files are smaller.

Can an option be added to createrepo_c to use zopfli instead of gzip for .gz files?

https://github.com/google/zopfli
https://koji.fedoraproject.org/koji/packageinfo?packageID=16157

@dralley
Contributor

dralley commented Jul 25, 2022

I think switching to zlib-ng (which is much faster, so you can use a more aggressive compression level and still get better performance) or to an alternative format such as zstd would be far better options at this point.

Those libraries would also give some decompression speed benefits. Zopfli only helps with compression.

@jcpunk
Author

jcpunk commented Jul 25, 2022

That works for me

@dralley
Contributor

dralley commented Jan 11, 2023

Worth closing in favor of #82?

@jcpunk
Author

jcpunk commented Jan 11, 2023

The major advantage of zopfli is that it produces a gzip-compatible output stream, so existing tooling benefits from the better compression without any changes. The zstd format is a great one, but it requires client-side tooling updates.
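One way to see the compatibility argument concretely: a decompressor can tell formats apart by their leading magic bytes. Zopfli writes ordinary gzip members (magic `1f 8b`), so a reader that only knows gzip accepts its output unchanged, whereas a zstd frame starts with `28 b5 2f fd` and needs a zstd-aware client. The C sketch below uses a hypothetical helper, `detect_format()`; it is not code from createrepo_c, DNF or libsolv, which may instead pick a decompressor from the file name or from repomd.xml. The magic values are the ones defined by the gzip, zstd, bzip2 and xz formats.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Container formats a metadata client might need to recognise. */
enum meta_format { FMT_UNKNOWN, FMT_GZIP, FMT_ZSTD, FMT_BZIP2, FMT_XZ };

/* Nonzero if buf (len bytes) starts with the len_magic bytes in magic. */
static int has_prefix(const unsigned char *buf, size_t len,
                      const unsigned char *magic, size_t len_magic)
{
    return len >= len_magic && memcmp(buf, magic, len_magic) == 0;
}

/* Hypothetical helper: classify a file by its first bytes. */
enum meta_format detect_format(const unsigned char *buf, size_t len)
{
    static const unsigned char gzip_magic[]  = {0x1f, 0x8b};
    static const unsigned char zstd_magic[]  = {0x28, 0xb5, 0x2f, 0xfd};
    static const unsigned char bzip2_magic[] = {'B', 'Z', 'h'};
    static const unsigned char xz_magic[]    = {0xfd, '7', 'z', 'X', 'Z', 0x00};

    assert(buf != NULL || len == 0);

    if (has_prefix(buf, len, gzip_magic, sizeof gzip_magic))
        return FMT_GZIP;   /* gzip -9 output and zopfli output alike */
    if (has_prefix(buf, len, zstd_magic, sizeof zstd_magic))
        return FMT_ZSTD;   /* needs a zstd-aware client */
    if (has_prefix(buf, len, bzip2_magic, sizeof bzip2_magic))
        return FMT_BZIP2;
    if (has_prefix(buf, len, xz_magic, sizeof xz_magic))
        return FMT_XZ;
    return FMT_UNKNOWN;
}
```

A gzip-only client is effectively one that handles `FMT_GZIP` and nothing else, which is why zopfli output needs no client changes while zstd does.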

@dralley
Contributor

dralley commented Jan 11, 2023

I'm told that DNF has supported zstd-compressed metadata for a few years (since roughly EL 8.2 and Fedora 30), which means the only gap is on the metadata-generation side. Of course EL7 won't have that option, but looking forward it's less of a concern. In any case, EL7 prefers the sqlite metadata if available, and that uses BZ2 compression by default.

@dralley
Contributor

dralley commented Oct 3, 2024

Now that createrepo_c supports zstd, IMO this probably isn't worth it.

@ppisar
Contributor

ppisar commented Oct 4, 2024

Basically the only function provided by zopfli is:

/*
Compresses according to the given output format and appends the result to the
output.

options: global program options
output_type: the output format to use
out: pointer to the dynamic output array to which the result is appended. Must
  be freed after use
outsize: pointer to the dynamic output array size
*/
void ZopfliCompress(const ZopfliOptions* options, ZopfliFormat output_type,
                    const unsigned char* in, size_t insize,
                    unsigned char** out, size_t* outsize);

Provided the function supports stream-oriented operations, it shouldn't be a problem to use it from createrepo_c.
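As quoted, `ZopfliCompress()` takes the entire input as a single `in`/`insize` buffer and returns void, so a chunk-oriented writer would have to accumulate everything before calling it. A minimal C sketch of such an adapter, under stated assumptions: `struct chunk_buf`, `chunk_buf_append()`, `chunk_buf_finish()` and the `oneshot_fn` type are hypothetical names, and `copy_oneshot()` is a stand-in that copies instead of compressing, so the example builds without zopfli. A real backend would wrap `ZopfliCompress()` with `ZOPFLI_FORMAT_GZIP`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator for the chunks a streaming writer would otherwise
 * pass to the compressor one at a time. Zero-initialise before use. */
struct chunk_buf {
    unsigned char *data;
    size_t len;
    size_t cap;
};

/* Append one chunk, growing the buffer geometrically. Returns 0 on success
 * and -1 on size overflow or allocation failure; on failure the buffer is
 * left unchanged so the caller can report the error. */
int chunk_buf_append(struct chunk_buf *b, const void *chunk, size_t n)
{
    assert(b != NULL);
    if (n == 0)
        return 0;
    assert(chunk != NULL);
    if (n > SIZE_MAX - b->len)
        return -1;
    if (b->len + n > b->cap) {
        size_t need = b->len + n;
        size_t cap = b->cap ? b->cap : 4096;
        while (cap < need)
            cap = cap > SIZE_MAX / 2 ? need : cap * 2;
        unsigned char *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, chunk, n);
    b->len += n;
    return 0;
}

/* One-shot backend shaped like ZopfliCompress's in/insize/out/outsize
 * parameters (options and format omitted). Unlike the real function, which
 * returns void, this hypothetical type returns 0 or -1 so that allocation
 * failures can be reported. */
typedef int (*oneshot_fn)(const unsigned char *in, size_t insize,
                          unsigned char **out, size_t *outsize);

/* Stand-in backend: copies its input so the sketch runs without zopfli.
 * It does not compress anything. */
int copy_oneshot(const unsigned char *in, size_t insize,
                 unsigned char **out, size_t *outsize)
{
    *out = malloc(insize ? insize : 1);
    if (*out == NULL)
        return -1;
    if (insize > 0)
        memcpy(*out, in, insize);
    *outsize = insize;
    return 0;
}

/* Hand the whole accumulated input to the backend in a single call, then
 * release the input buffer. Peak memory is the entire uncompressed file
 * plus whatever the backend allocates. */
int chunk_buf_finish(struct chunk_buf *b, oneshot_fn compress,
                     unsigned char **out, size_t *outsize)
{
    int rc = compress(b->data, b->len, out, outsize);
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
    return rc;
}
```

The cost of this design is visible in `chunk_buf_finish()`: nothing can be compressed until the last chunk has arrived, so the whole uncompressed file must sit in memory first.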

However, I have not studied the implementation, and from the comment it's not clear how it reports memory allocation failures. Also, a glance at the queue of open issues at https://github.com/google/zopfli/issues does not suggest a project that is safe to depend on.

Regarding the compression ratio, I haven't seen any published results, so here is today's Fedora 42 primary.xml:

$ ls -l primary*
-rw-r--r--. 1 test test 169877809 Oct  4 13:00 primary
-rw-r--r--. 1 test test  18333236 Oct  4 13:01 primary.gzip
-rw-r--r--. 1 test test  17485321 Oct  4 13:10 primary.zopfli
-rw-r--r--. 1 test test  14950967 Oct  4 13:00 primary.zst

primary.zst was taken from the repository, primary.gzip was compressed with gzip's "-9" option, and primary.zopfli with the default number of iterations (15, according to the help output). The zopfli compression took an enormous amount of time, so probably nobody will use more iterations.

With these settings, zopfli output is about 5 % smaller than gzip's.
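The percentages can be rechecked from the byte counts in the `ls -l` listing above. A small C sketch; `savings_pct()` and `print_comparison()` are illustrative helpers, not part of any tool:

```c
#include <assert.h>
#include <stdio.h>

/* File sizes in bytes, copied from the ls -l listing of Fedora 42 primary.xml. */
static const double size_raw    = 169877809.0;
static const double size_gzip9  =  18333236.0;
static const double size_zopfli =  17485321.0;
static const double size_zstd   =  14950967.0;

/* Percentage by which `candidate` is smaller than `baseline`. */
double savings_pct(double baseline, double candidate)
{
    assert(baseline > 0.0);
    return (baseline - candidate) / baseline * 100.0;
}

/* Print each compressed size as a share of the raw file and relative to gzip -9. */
void print_comparison(void)
{
    printf("gzip -9: %.2f%% of raw\n", size_gzip9 / size_raw * 100.0);
    printf("zopfli:  %.2f%% of raw, %.1f%% smaller than gzip -9\n",
           size_zopfli / size_raw * 100.0, savings_pct(size_gzip9, size_zopfli));
    printf("zstd:    %.2f%% of raw, %.1f%% smaller than gzip -9\n",
           size_zstd / size_raw * 100.0, savings_pct(size_gzip9, size_zstd));
}
```

With these numbers zopfli comes out about 4.6 % smaller than `gzip -9`, while the zstd file is about 18.4 % smaller.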

I don't think implementing a zopfli backend for gzip compression is worth it. Pat, do you have a better comparison?

ppisar added the Priority: LOW and RFE (Request For Enhancement, as opposed to a bug) labels on Oct 4, 2024
@jcpunk
Author

jcpunk commented Oct 4, 2024

You're seeing similar results to what I'm getting.

For Scientific Linux's frozen content (the OS, not updates) I typically used -i1000; it took a significant amount of time to run and gave around a 10% reduction in size.

For things that are written once and never updated, the savings can be worth the trade-off.

That being said, if zstd is supported throughout the toolchain, it compresses better.

@ppisar
Contributor

ppisar commented Oct 4, 2024

10 per cent is an interesting number, though I worry it wouldn't carry over to createrepo_c with its current architecture. Currently createrepo_c sends the data for compression in chunks, probably small ones, whereas good compression needs to scan all the data at once. One would probably need to change how createrepo_c saves files. But then people with large repositories and not-so-large memory could object that createrepo_c cannot run at all. So the new architecture would need to support both approaches. That's getting awfully complicated just to save storage space for end-of-life systems.
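Supporting both approaches amounts to a dispatch between the current streaming path and a buffer-everything path. A hypothetical C sketch: `enum gz_backend` and `choose_gz_backend()` are invented names, and the memory rule (the whole uncompressed file plus the compressed output must fit within a budget, ignoring the compressor's own working memory) is an assumption, not a measured requirement of zopfli or createrepo_c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical backends for writing a .gz metadata file. */
enum gz_backend {
    GZ_STREAMING, /* feed the compressor chunk by chunk, as createrepo_c does today */
    GZ_ONESHOT    /* buffer the whole file, then call a whole-input compressor */
};

/* Choose a backend. Assumed rule (not measured): the one-shot path needs at
 * least the whole uncompressed file plus the compressed output in memory;
 * the compressor's own working memory comes on top and is ignored here.
 * Anything that does not fit falls back to streaming. */
enum gz_backend choose_gz_backend(size_t uncompressed_size,
                                  size_t expected_compressed_size,
                                  size_t memory_budget,
                                  int want_max_compression)
{
    assert(memory_budget > 0);
    if (!want_max_compression)
        return GZ_STREAMING;
    /* Overflow-safe form of: uncompressed + compressed > budget */
    if (expected_compressed_size > memory_budget ||
        uncompressed_size > memory_budget - expected_compressed_size)
        return GZ_STREAMING;
    return GZ_ONESHOT;
}
```

With the Fedora 42 primary.xml sizes from this thread (169877809 bytes raw, 17485321 bytes of zopfli output), a 256 MiB budget selects the one-shot path and a 128 MiB budget falls back to streaming.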


3 participants