createrepo_c zstd compression doesn't fill in the content size, in the frame header. Python API problems. #415

Open
james-antill opened this issue Dec 7, 2023 · 4 comments
Labels
Triaged Someone on the DNF team has read the issue and determined the next steps to take

@james-antill

createrepo_c's zstd compression doesn't fill in the content size in the frame header. This means you can't call the Python API to decompress in the simple, usable way:

data = zstandard.decompress(zstd_data)

...because you'll get an exception:

  File "/usr/lib64/python3.11/site-packages/zstandard/__init__.py", line 210, in decompress
    return dctx.decompress(data, max_output_size=max_output_size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
zstd.ZstdError: could not determine content size in frame header

...the only way to work around this is to guess at the output size and pass that guessed number to the decompress API call as max_output_size.

See the documentation on the python API, esp. the 7th paragraph, here: https://python-zstandard.readthedocs.io/en/latest/decompressor.html#zstandard.ZstdDecompressor.decompress

...a simple test case would be to generate a compressed file and then call zstandard.decompress() from the standard Python API.
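To check whether a given file is affected without depending on any Python bindings, one option is to read the frame header directly. The sketch below follows the frame header layout in RFC 8878 as I understand it (magic number, Frame_Header_Descriptor, optional Window_Descriptor and Dictionary_ID, then an optional Frame_Content_Size field). The helper name `read_frame_content_size` is hypothetical, and skippable frames are deliberately not handled.

```python
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # magic number 0xFD2FB528, stored little-endian


def read_frame_content_size(frame: bytes):
    """Return the content size stored in a zstd frame header, or None if absent.

    Hypothetical helper: a sketch of the RFC 8878 header layout, not part of
    createrepo_c or python-zstandard.
    """
    if len(frame) < 5 or frame[:4] != ZSTD_MAGIC:
        raise ValueError("not a zstd frame")

    descriptor = frame[4]
    fcs_flag = descriptor >> 6                # Frame_Content_Size_flag (bits 7-6)
    single_segment = bool(descriptor & 0x20)  # Single_Segment_flag (bit 5)
    dict_id_flag = descriptor & 0x03          # Dictionary_ID_flag (bits 1-0)

    # Width of the Frame_Content_Size field; flag 0 means "absent" unless
    # the frame is single-segment, in which case the field is one byte.
    fcs_size = (1 if single_segment else 0, 2, 4, 8)[fcs_flag]
    if fcs_size == 0:
        return None  # the situation described in this issue

    offset = 5
    if not single_segment:
        offset += 1                           # skip Window_Descriptor
    offset += (0, 1, 2, 4)[dict_id_flag]      # skip Dictionary_ID, if any

    field = frame[offset:offset + fcs_size]
    if len(field) < fcs_size:
        raise ValueError("truncated frame header")

    value = int.from_bytes(field, "little")
    if fcs_size == 2:
        value += 256                          # two-byte sizes are stored minus 256
    return value
```

Only the first few bytes of the file are needed, so passing something like the first 18 bytes of a repodata file is enough. A return value of `None` means the plain `zstandard.decompress()` call would have no size to work with.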

@jan-kolarik jan-kolarik added the Triaged Someone on the DNF team has read the issue and determined the next steps to take label Dec 19, 2023
@dralley
Contributor

dralley commented Feb 6, 2024

It kind of seems like a python-zstd issue?

sergey-dryabzhinsky/python-zstd#53 (comment)

Decompression fails where no content size is included in the frame (e.g. streaming)

...

(reply)

Yes, this module is simple and dumb. It never meant to support streaming compression. And I'll keep it this way.

createrepo_c uses streaming, so...

@dralley
Contributor

dralley commented Feb 6, 2024

Sidenote, I'll plug that it would be great to get zstd support into the Python standard library.

@james-antill
Author

If it's too hard to fix createrepo_c, then flag this as super hard, backlog it, or just close it.

It's just kind of annoying when the only Python API for that compression format is very hard to use correctly.

Does it significantly impact performance if you stream to a file and then compress?

@dralley
Contributor

dralley commented Feb 7, 2024

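Supposedly this works:

import io
import zstandard as zstd  # python-zstandard; the alias is assumed from the snippet

# `compressed` is the zstd frame; `data` is the expected original payload.
with zstd.ZstdDecompressor().stream_reader(io.BytesIO(compressed)) as r:
    decompressed = r.read()
assert decompressed == data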

indygreg/python-zstandard#150
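For anyone wanting to reproduce this without createrepo_c, here is a minimal round-trip sketch. It assumes the third-party python-zstandard package is installed and that its streaming `compressobj()` API, like createrepo_c's streaming output, leaves the content size out of the frame header. The function names are hypothetical.

```python
import io


def make_streamed_frame(payload: bytes) -> bytes:
    """Compress via the streaming API, so the total size is unknown up front."""
    # Imported lazily so this module loads even where python-zstandard is absent.
    import zstandard

    cobj = zstandard.ZstdCompressor().compressobj()
    return cobj.compress(payload) + cobj.flush()


def decompress_any_frame(compressed: bytes) -> bytes:
    """Decompress without relying on a content size in the frame header."""
    import zstandard

    # stream_reader decompresses incrementally, so it does not need to know
    # the output size before it starts.
    with zstandard.ZstdDecompressor().stream_reader(io.BytesIO(compressed)) as reader:
        return reader.read()
```

With a frame from `make_streamed_frame()`, a bare `zstandard.decompress(frame)` should raise the `ZstdError` shown in the original report, while `decompress_any_frame(frame)` and `zstandard.decompress(frame, max_output_size=N)` with a large enough `N` should both return the payload.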
