@fereidani

This PR includes two commits:

  1. Improves integer conversion performance in HeaderValue: it eliminates an extra heap allocation by formatting into itoa's stack-allocated buffer instead.
  2. Adds WordRegister for efficient word-sized byte operations: lowercase conversion and header name validation now process the input in word-sized chunks. Several tricks reduce the instruction count and enable batch processing, so an entire chunk is validated with just 3 assembly instructions instead of byte-by-byte with a branch in every loop iteration.
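
For reviewers unfamiliar with the word-at-a-time idea, here is a minimal sketch of the general technique. It is not the WordRegister code from this PR: the names `lower_word`, `word_is_ascii`, and `to_lower_chunked` are hypothetical, and the ASCII check only stands in for the stricter header name validation. It loads `size_of::<usize>()` bytes into one integer, computes an "is uppercase" flag in the top bit of every byte lane, and applies the case bit to all lanes at once.

```rust
use std::mem::size_of;

// Hypothetical constants for this sketch, not names from the PR.
const LANES: usize = size_of::<usize>();
const ONES: usize = usize::MAX / 0xFF; // 0x01 in every byte lane
const HIGH: usize = ONES * 0x80; // 0x80 in every byte lane

/// Lowercases every ASCII uppercase byte in `w` without branching.
fn lower_word(w: usize) -> usize {
    // Drop the top bit of each lane so the additions below cannot
    // carry into a neighbouring lane.
    let low7 = w & !HIGH;
    // Top bit of a lane is set when that lane is >= b'A'.
    let ge_a = low7 + ONES * (0x80 - b'A' as usize);
    // Top bit of a lane is set when that lane is > b'Z'.
    let gt_z = low7 + ONES * (0x80 - (b'Z' as usize + 1));
    // Uppercase lanes: in A..=Z and originally ASCII (top bit clear in `w`).
    let is_upper = ge_a & !gt_z & !w & HIGH;
    // 0x80 >> 2 == 0x20, the ASCII case bit, still inside the same lane.
    w | (is_upper >> 2)
}

/// Batch check standing in for validation: true when no lane has its top bit set.
fn word_is_ascii(w: usize) -> bool {
    (w & HIGH) == 0
}

/// Lowercases `buf` in place, one machine word per step, with a bytewise tail.
fn to_lower_chunked(buf: &mut [u8]) {
    let mut chunks = buf.chunks_exact_mut(LANES);
    for chunk in &mut chunks {
        let mut bytes = [0u8; LANES];
        bytes.copy_from_slice(chunk);
        // Load and store both use native byte order, and every operation in
        // `lower_word` stays inside its own lane.
        let w = usize::from_ne_bytes(bytes);
        chunk.copy_from_slice(&lower_word(w).to_ne_bytes());
    }
    // Fewer than LANES bytes remain; handle them one at a time.
    for b in chunks.into_remainder() {
        b.make_ascii_lowercase();
    }
}
```

Calling `to_lower_chunked` on a mutable byte buffer should give the same bytes as `make_ascii_lowercase` on a copy of it. In this sketch the result does not depend on endianness because the load and the store use the same byte order; whether that also holds for the PR's implementation is what the big-endian test would confirm.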

Please review the unsafe parts again, and it would be great to test this on a big-endian CPU as well if one is available.

I wrote a benchmark for these changes: https://github.com/fereidani/headernamebench

It’s debatable whether this change actually benefits 32-bit systems; if needed, we can disable it by checking the pointer size constant, so the optimization is not compiled for those targets.
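
One possible shape for that gating, assuming `cfg(target_pointer_width = "64")` is an acceptable way to check the pointer size. The function name `lowercase_in_place` is hypothetical, and the body of the 64-bit path is only a placeholder for the word-sized implementation:

```rust
use std::mem::size_of;

/// Compiled only on 64-bit targets. The loop is a placeholder that stands in
/// for the word-sized implementation; it walks the buffer one word-sized
/// chunk at a time.
#[cfg(target_pointer_width = "64")]
pub fn lowercase_in_place(buf: &mut [u8]) {
    for chunk in buf.chunks_mut(size_of::<usize>()) {
        chunk.make_ascii_lowercase();
    }
}

/// Compiled on every other target: the plain bytewise path, so the
/// optimized code is never built there.
#[cfg(not(target_pointer_width = "64"))]
pub fn lowercase_in_place(buf: &mut [u8]) {
    buf.make_ascii_lowercase();
}
```

Callers see a single `lowercase_in_place` either way, so the choice stays invisible outside the module.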

Here are my results for this benchmark. The optimized version takes roughly half the time on typical workloads (about 47% less for valid input and about 56% less for invalid input), with only a negligible slowdown (about 3%) for very small headers (like Host), where the optimization does not apply:

header_to_lower_vs_optimized/header_to_lower_valid
                        time:   [1.3416 µs 1.3446 µs 1.3483 µs]
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
header_to_lower_vs_optimized/header_to_lower_optimized_valid
                        time:   [717.29 ns 718.05 ns 718.81 ns]
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) low severe
  5 (5.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe
header_to_lower_vs_optimized/header_to_lower_invalid
                        time:   [575.18 ns 579.05 ns 584.14 ns]
header_to_lower_vs_optimized/header_to_lower_optimized_invalid
                        time:   [254.98 ns 255.65 ns 256.38 ns]
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe
header_to_lower_vs_optimized/header_to_lower_host
                        time:   [28.722 ns 28.789 ns 28.856 ns]
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low severe
  6 (6.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe
header_to_lower_vs_optimized/header_to_lower_optimized_host
                        time:   [29.522 ns 29.600 ns 29.672 ns]
