-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
In Intern::new compute the hash before acquiring the mutex #28
base: master
Are you sure you want to change the base?
Conversation
So there's smaller lock contention. This also unlocks an option to do partitioning by hash, which should reduce contention greatly.
Just a check: can you confirm that you've run benchmarks and this makes a difference? How big a difference does it make, and under what kind of workload? |
No I didn't do benchmark, sorry. It is not trivial to do testing with unreleased versions in our repo setup. But there are more reasons to do this change. Another one is this: current implementation relies on
from
because it is not possible to borrow |
I reimplemented the library in our project with two PRs I submitted and more changes. This is the implementation. https://gist.github.com/stepancheg/9d2ebac23b27ff8d21a1bcf494b5ac7c The speedup in our program (not just internment) is about 1% (edit). However, the speedup is against version 0.4.0. |
I'm wondering why https://docs.rs/hashbrown/0.11.2/hashbrown/hash_map/struct.HashMap.html#method.raw_entry_mut wouldn't have worked for this purpose? I don't like that the raw API describes itself as "unsafe and experimental". It seems like raw_entry_mut would give the advantages you describe. Am I missing something? |
|
Okay I'm now thinking that this implementation could open users of internment up to hash collision attacks, since it uses a hash with DefaultHasher rather than the hasher for the HashMap, which means the RandomState is predictable and an attached who can generate values that are interned could generate many values with collisions. There are various use cases for internment that I could imagine where this attack path might be open. On the other hand, one might point out that since Intern leaks memory, it's not safe to let attackers generate unlimited Intern values in any case... Not sure if this matters, but I'm reluctant to sidestep the protections built into the std HashMap. |
I've confirmed from the docs that DefaultHasher::new() generates a SipHasher with zero for the keys, which means we would be vulnerable to DOS attacks. Probably your shouldn't let potential attackers provide data for internment anyways, but I'm also not comfortable forgoing the standard protection provided by the standard library. I'll think about how to store a RandomState outside a Mutex so we can compute hashes securely before taking the lock. |
|
Probably, and that's probably how I'd do this. If you want to give it a
shot it might happen sooner.
David Roundy
…On Wed, Nov 24, 2021, 10:20 PM Stepan Koltsov ***@***.***> wrote:
I'll think about how to store a RandomState outside a Mutex
static RANDOM_STATE: OnceCell<RandomState> = ... should not have
noticeable performance overhead, no?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#28 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABBSKPMG27D62EUC7ER4TLUNXISNANCNFSM5IA542FA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
BTW, the reason I'm picky about optimization changes is that I've been bitten in the past by "optimizations" which ended up introducing serious bugs (only discovered months later after they had corrupted many people's data), without even a demonstrated benefit. I doubt that's the case here (among other things, internment is unlikely to affect anyone's on-disk format), but want to ensure that I understand the trade-offs, and that we don't sacrifice my ability to fix any bugs either in this pull request, or that arise in the future. |
So there's smaller lock contention.
This also unlocks an option to do partitioning by hash, which should
reduce contention greatly.