Header name to lower case conversion and validation performance optimization #804
+178
−18
This PR includes two commits:
1. `HeaderValue`: it eliminates an extra heap allocation by using `itoa`'s stack-allocated buffer instead.
2. `WordRegister` for efficient word-sized byte operations: this introduces chunked processing and validation for lowercase conversion and header name validation. It uses several tricks to reduce instruction count and enable batch processing, validating an entire chunk with just 3 assembly instructions instead of processing byte-by-byte with branching in every loop iteration.

Please review the `unsafe` parts again, and it would be great to test this on a big-endian CPU as well if one is available.

I wrote a benchmark for these changes: https://github.com/fereidani/headernamebench
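
For reviewers unfamiliar with the word-at-a-time idea behind the second commit, here is a minimal safe sketch of chunked ASCII lowercasing. It is not the code in this PR: `lower_word` and `lower_in_place` are hypothetical names, it covers only lowercase conversion (not header name validation), and it trades the PR's `unsafe` loads for `from_ne_bytes`/`to_ne_bytes`. Every operation is lane-wise within a byte, so the result should not depend on endianness.

```rust
const LANES: usize = core::mem::size_of::<usize>();
const ONES: usize = usize::MAX / 0xFF; // 0x01 in every byte lane
const HIGH: usize = ONES * 0x80; // 0x80 in every byte lane

/// Lowercases every ASCII uppercase byte packed in `w` (hypothetical helper).
fn lower_word(w: usize) -> usize {
    // Work on the low 7 bits of each lane so additions cannot carry across lanes.
    let low7 = w & (ONES * 0x7F);
    // Bit 7 of a lane is set iff its low 7 bits are >= b'A'.
    let ge_a = low7 + ONES * (0x80 - b'A' as usize);
    // Bit 7 of a lane is set iff its low 7 bits are > b'Z'.
    let gt_z = low7 + ONES * (0x80 - b'Z' as usize - 1);
    // Uppercase lanes: in 'A'..='Z' and original high bit clear (true ASCII).
    let is_upper = ge_a & !gt_z & !w & HIGH;
    // Move the 0x80 marker down to 0x20 and set that bit to lowercase the lane.
    w | (is_upper >> 2)
}

/// Lowercases `buf` one machine word at a time, then handles the tail bytewise.
fn lower_in_place(buf: &mut [u8]) {
    let mut chunks = buf.chunks_exact_mut(LANES);
    for chunk in &mut chunks {
        let mut bytes = [0u8; LANES];
        bytes.copy_from_slice(chunk);
        let w = lower_word(usize::from_ne_bytes(bytes));
        chunk.copy_from_slice(&w.to_ne_bytes());
    }
    // Fewer than LANES bytes remain; fall back to the per-byte path.
    for b in chunks.into_remainder() {
        *b = b.to_ascii_lowercase();
    }
}
```

The inner loop has no per-byte branch, which is the property the chunked approach relies on; short inputs such as `Host` never fill a whole word and go straight to the bytewise tail.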
It’s debatable whether this change actually benefits 32-bit systems. If needed, we can disable it by checking the pointer size constant, which skips compiling the optimization for those targets.
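
One way such a gate could look, assuming it is done with `cfg(target_pointer_width)` (the PR may instead test a size constant). The function names are hypothetical, and the 64-bit body is only a stand-in for the real word-sized path:

```rust
/// Byte-at-a-time fallback (hypothetical name).
fn lower_bytewise(buf: &mut [u8]) {
    for b in buf {
        *b = b.to_ascii_lowercase();
    }
}

/// Compiled only on 64-bit targets. Stand-in for the word-sized path:
/// it walks 8-byte chunks but still lowercases each chunk bytewise.
#[cfg(target_pointer_width = "64")]
fn lower_header_name(buf: &mut [u8]) {
    for chunk in buf.chunks_mut(8) {
        lower_bytewise(chunk);
    }
}

/// On every other target the optimized path is never compiled.
#[cfg(not(target_pointer_width = "64"))]
fn lower_header_name(buf: &mut [u8]) {
    lower_bytewise(buf);
}
```

Because the selection happens at compile time, 32-bit builds would carry no extra code or runtime check for the optimization.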
Here are my results for this benchmark, showing roughly 50% performance improvement on typical workloads and only a negligible slowdown for very small headers (like `Host`) when the optimization does not apply: