
fix(decoder): decode multi-member gzip data #3270

Closed · wants to merge 2 commits

Conversation

@lizeyan commented Aug 16, 2024

Summary

#3269

Allow GZipDecoder to decode multi-member gzip data
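For context, the usual streaming technique for multi-member gzip is to create a fresh zlib decompress object whenever one member ends and feed it the leftover bytes. A minimal, self-contained sketch of that idea (not the PR's actual diff; the helper name is illustrative):

```python
import zlib

def decompress_multi_member_gzip(data: bytes) -> bytes:
    """Decompress gzip data that may contain several concatenated members."""
    chunks = []
    while data:
        # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
        decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
        chunks.append(decompressor.decompress(data))
        # Any bytes after the current member's trailer belong to the next member.
        data = decompressor.unused_data
    return b"".join(chunks)
```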

Checklist

  • I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.

@lizeyan force-pushed the master branch 2 times, most recently from 8cc8437 to d7e2e1b on August 23, 2024
@@ -9,6 +9,7 @@
import zstandard as zstd

import httpx
from httpx._decoders import GZipDecoder
Member

Rather than using a private import, could we switch this to use the same style as the other test cases here?

Author

I changed the style of the test case.

Comment on lines 76 to +77
self.decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
self.state = GzipDecoderState.FIRST_MEMBER
Member

I'm wondering if there's an implementation using something like...

self._buffer = io.BytesIO()
self._reader = gzip.GzipFile(fileobj=self._buffer, mode='r')

Would using GzipFile be more robust here / catch other cases?
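For comparison, a minimal sketch of the GzipFile approach suggested above (the helper name is illustrative; httpx's streaming decoder API differs). GzipFile reads across member boundaries transparently, which is part of the robustness argument:

```python
import gzip
import io

def decompress_with_gzipfile(data: bytes) -> bytes:
    """Decompress gzip data via GzipFile, which handles multiple members."""
    # GzipFile keeps reading past each member's trailer until EOF.
    return gzip.GzipFile(fileobj=io.BytesIO(data), mode="rb").read()
```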

Author

It seems that zlib is slightly faster than gzip:

import gzip
import zlib
import time
import random
import string

def generate_random_data(size):
    return ''.join(random.choices(string.ascii_letters + string.digits, k=size)).encode()

def test_gzip_decompress(data, iterations):
    compressed = gzip.compress(data)
    start_time = time.time()
    for _ in range(iterations):
        gzip.decompress(compressed)
    end_time = time.time()
    return end_time - start_time

def test_zlib_decompress(data, iterations):
    compressed = zlib.compress(data)
    start_time = time.time()
    for _ in range(iterations):
        zlib.decompress(compressed)
    end_time = time.time()
    return end_time - start_time

# Test parameters
data_sizes = [1000, 10000, 100000]
iterations = 10000

print("Testing gzip and zlib decompression performance:")
print("------------------------------------------------")

for size in data_sizes:
    print(f"\nTesting with data size: {size} bytes")
    data = generate_random_data(size)
    
    gzip_time = test_gzip_decompress(data, iterations)
    zlib_time = test_zlib_decompress(data, iterations)
    
    print(f"gzip decompression time: {gzip_time:.4f} seconds")
    print(f"zlib decompression time: {zlib_time:.4f} seconds")
    
    if gzip_time < zlib_time:
        print(f"gzip is faster by {(zlib_time / gzip_time - 1) * 100:.2f}%")
    else:
        print(f"zlib is faster by {(gzip_time / zlib_time - 1) * 100:.2f}%")

print("\nTest completed.")

Output:
Testing gzip and zlib decompression performance:
------------------------------------------------

Testing with data size: 1000 bytes
gzip decompression time: 0.0503 seconds
zlib decompression time: 0.0386 seconds
zlib is faster by 30.47%

Testing with data size: 10000 bytes
gzip decompression time: 0.2463 seconds
zlib decompression time: 0.2337 seconds
zlib is faster by 5.38%

Testing with data size: 100000 bytes
gzip decompression time: 2.3218 seconds
zlib decompression time: 2.3094 seconds
zlib is faster by 0.54%

Test completed.

@tomchristie
Member

Thanks for your time looking into this.

We'll prefer following Chrome/Safari behavior here... #3269 (reply in thread)

@tomchristie tomchristie closed this Oct 8, 2024
@rafalkrupinski

We'll prefer following Chrome/Safari behavior here.

Seriously, why?

@tomchristie
Member

Because introducing complexity to support a use-case that browsers don't support is a trade-off I'd rather not make.

(More broadly speaking... httpx should prefer a design aesthetic of minimal complexity, where possible.)
