Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use const array for unrolled Huffman table #339

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Lukasa
Copy link
Contributor

@Lukasa Lukasa commented Apr 25, 2022

Motivation

We've spent a long time with an unrolled Huffman-table to optimize the
Huffman decoding of HPACK headers. This works well, but because of some
limitations in Swift it's been impossible to embed this structure in the
binary directly: we've always had to alloc-and-copy on startup.
Additionally, due to type checker performance limitations, we've been
Base64 decoding it on startup, which isn't great either.

The introduction of const allows us to embed the knowledge of the fact
that this structure is compile-time constant in to the binary, improving
performance.

Modifications

Revert to the old layout, constify, and remove anything that blocks
that.

Result

Hopefully better performance and less dirty memory.

This is a bit of a WIP. In particular, we have a few issues to resolve. The biggest one is that this still drastically slows down compiles due to the type checker overhead. The type checking here takes 25s on my machine to check this one statement, which is simply not practical for something shipped as source code.

@Lukasa Lukasa force-pushed the cb-investigating-const branch from 2a90318 to 6f76284 Compare April 25, 2022 15:46
Motivation

We've spent a long time with an unrolled Huffman-table to optimize the
Huffman decoding of HPACK headers. This works well, but because of some
limitations in Swift it's been impossible to embed this structure in the
binary directly: we've always had to alloc-and-copy on startup.
Additionally, due to type checker performance limitations, we've been
Base64 decoding it on startup, which isn't great either.

The introduction of const allows us to embed the knowledge of the fact
that this structure is compile-time constant in to the binary, improving
performance.

Modifications

Revert to the old layout, constify, and remove anything that blocks
that.

Result

Hopefully better performance and less dirty memory.
@Lukasa Lukasa force-pushed the cb-investigating-const branch from 6f76284 to 5e4646c Compare April 25, 2022 15:51
@Lukasa
Copy link
Contributor Author

Lukasa commented Apr 25, 2022

@swift-nio-bot test perf please

@swift-server-bot
Copy link

performance report

build id: 15

timestamp: Wed Sep 29 12:55:11 UTC 2021

results

nameminmaxmeanstd
1_conn_10k_reqs 1.6213979721069336 1.6442910432815552 1.632042407989502 0.006699844032143084
encode_100k_header_blocks_indexable 0.14471304416656494 0.14747297763824463 0.14572598934173583 0.0008046368602636831
encode_100k_header_blocks_nonindexable 0.3090909719467163 0.3160330057144165 0.3103432893753052 0.00211925033167702
encode_100k_header_blocks_neverIndexed 0.31516897678375244 0.31856203079223633 0.31606420278549197 0.0011066502001272586
decode_100k_header_blocks_indexable 0.11055600643157959 0.11164200305938721 0.11097080707550049 0.0003699525350451268
decode_100k_header_blocks_nonindexable 0.13858795166015625 0.13942408561706543 0.13902969360351564 0.0002693567257094209
decode_100k_header_blocks_neverIndexed 0.13593900203704834 0.13771402835845947 0.13646239042282104 0.00048347271333613114
hpackheaders_canonical_form 0.031477928161621094 0.032000064849853516 0.03155630826950073 0.00015699760597511416
hpackheaders_canonical_form_trimming_whitespace 0.04422605037689209 0.044754981994628906 0.04440640211105347 0.00023535089672761531
hpackheaders_canonical_form_trimming_whitespace_short_strings 0.04088902473449707 0.04139399528503418 0.04100780487060547 0.00019896661103882608
hpackheaders_canonical_form_trimming_whitespace_long_strings 0.04970097541809082 0.05019199848175049 0.04985220432281494 0.00021980457652302524
huffman_encode_basic 0.7676070928573608 0.7695299386978149 0.7681055784225463 0.0005364138489963227
huffman_encode_complex 0.5194940567016602 0.5202770233154297 0.5198963046073913 0.0002579057159458666
huffman_decode_basic 0.056532979011535645 0.05703103542327881 0.056725692749023435 0.0002025397769603034
huffman_decode_complex 0.04661297798156738 0.04776501655578613 0.046978187561035153 0.0004299907412052386
server_only_10k_requests_1_concurrent 0.3618440628051758 0.36552298069000244 0.3626688003540039 0.001039615633965984
server_only_10k_requests_100_concurrent 0.2846330404281616 0.28559303283691406 0.2850819110870361 0.00035980189089376255
stream_teardown_10k_requests_100_concurrent 0.15219998359680176 0.15268898010253906 0.152490496635437 0.00019064078063663374

comparison

@Lukasa
Copy link
Contributor Author

Lukasa commented Apr 25, 2022

Huh, no comparison there. Weird. I don't expect substantial performance improvements regardless: this only cleans up the dirty memory situation and our benchmark doesn't really test memory pressure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants