fix(decoder): decode multi-member gzip data #3270
Conversation
Force-pushed from 8cc8437 to d7e2e1b
tests/test_decoders.py (Outdated)

```diff
@@ -9,6 +9,7 @@
 import zstandard as zstd
 
 import httpx
+from httpx._decoders import GZipDecoder
```
Rather than using a private import, could we switch this to use the same style as the other test cases here?
I changed the style of the test case.
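For illustration, a test using the public API in the same style as the other cases in tests/test_decoders.py might look like the following. The test name and payload are assumptions for the sketch, not the PR's actual code:

```python
import gzip

import httpx


def test_multi_member_gzip():
    # Two gzip members concatenated into one body, as RFC 1952 permits.
    body = gzip.compress(b"Hello, ") + gzip.compress(b"world!")
    headers = [(b"Content-Encoding", b"gzip")]
    response = httpx.Response(200, headers=headers, content=body)
    assert response.content == b"Hello, world!"
```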
```python
self.decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
self.state = GzipDecoderState.FIRST_MEMBER
```
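For readers following along, here is a minimal sketch of how a zlib-based decoder can handle multiple gzip members, along the lines the snippet above hints at. The class and method names are illustrative, not the PR's actual implementation:

```python
import gzip
import zlib


class MultiMemberGzipDecoder:
    """Illustrative sketch only; the real decoder differs in detail."""

    def __init__(self) -> None:
        # wbits = MAX_WBITS | 16 selects the gzip wrapper rather than raw zlib.
        self.decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)

    def decode(self, data: bytes) -> bytes:
        output = []
        while data:
            output.append(self.decompressor.decompress(data))
            if not self.decompressor.eof:
                break  # Mid-member: wait for the next chunk of input.
            # This member is complete. Any leftover bytes belong to the
            # next member, so restart with a fresh decompressor.
            data = self.decompressor.unused_data
            self.decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
        return b"".join(output)


decoder = MultiMemberGzipDecoder()
payload = gzip.compress(b"foo") + gzip.compress(b"bar")
assert decoder.decode(payload) == b"foobar"
```

The key detail is that `decompressobj.eof` flips to `True` at the end of a member, and any bytes belonging to the next member land in `unused_data`, where a fresh decompressor can pick them up.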
I'm wondering if there's an implementation using something like...

```python
self._buffer = io.BytesIO()
self._reader = gzip.GzipFile(fileobj=self._buffer, mode='r')
```

Would using GzipFile be more robust here / catch other cases?
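For context, the stdlib gzip module does handle concatenated members when given the whole payload, which is presumably part of what "catch other cases" refers to. A quick illustrative check:

```python
import gzip

# gzip.GzipFile (and gzip.decompress, which wraps it) keeps reading
# past the end of the first member and decodes any members that follow.
data = gzip.compress(b"hello ") + gzip.compress(b"world")
assert gzip.decompress(data) == b"hello world"
```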
It seems that zlib is slightly faster than gzip:
```python
import gzip
import zlib
import time
import random
import string


def generate_random_data(size):
    return "".join(random.choices(string.ascii_letters + string.digits, k=size)).encode()


def test_gzip_decompress(data, iterations):
    compressed = gzip.compress(data)
    start_time = time.time()
    for _ in range(iterations):
        gzip.decompress(compressed)
    end_time = time.time()
    return end_time - start_time


def test_zlib_decompress(data, iterations):
    compressed = zlib.compress(data)
    start_time = time.time()
    for _ in range(iterations):
        zlib.decompress(compressed)
    end_time = time.time()
    return end_time - start_time


# Test parameters
data_sizes = [1000, 10000, 100000]
iterations = 10000

print("Testing gzip and zlib decompression performance:")
print("------------------------------------------------")

for size in data_sizes:
    print(f"\nTesting with data size: {size} bytes")
    data = generate_random_data(size)
    gzip_time = test_gzip_decompress(data, iterations)
    zlib_time = test_zlib_decompress(data, iterations)
    print(f"gzip decompression time: {gzip_time:.4f} seconds")
    print(f"zlib decompression time: {zlib_time:.4f} seconds")
    if gzip_time < zlib_time:
        print(f"gzip is faster by {(zlib_time / gzip_time - 1) * 100:.2f}%")
    else:
        print(f"zlib is faster by {(gzip_time / zlib_time - 1) * 100:.2f}%")

print("\nTest completed.")
```
Output:

```
Testing gzip and zlib decompression performance:
------------------------------------------------

Testing with data size: 1000 bytes
gzip decompression time: 0.0503 seconds
zlib decompression time: 0.0386 seconds
zlib is faster by 30.47%

Testing with data size: 10000 bytes
gzip decompression time: 0.2463 seconds
zlib decompression time: 0.2337 seconds
zlib is faster by 5.38%

Testing with data size: 100000 bytes
gzip decompression time: 2.3218 seconds
zlib decompression time: 2.3094 seconds
zlib is faster by 0.54%

Test completed.
```
Thanks for your time looking into this. We'll prefer following Chrome/Safari behavior here... #3269 (reply in thread)
Seriously, why?
Because introducing complexity to support a use-case that browsers don't support is a trade-off I'd rather not make. (More broadly speaking...
Summary
#3269
Allow GZipDecoder to decode multi-member gzip data.
Checklist