Safeguard against link overflows in ConcurrentHashMap #2107

Open
wants to merge 2 commits into
base: main
from


@instw commented Nov 30, 2023

Summary:
PROBLEM
Folly's `ConcurrentHashMap` uses hazard pointers to ensure that recently removed map entries (removed via `erase`, `insert_or_assign`, etc.) aren't cleaned up while readers may still be using them. Instead, they are removed as part of a reclamation process, which typically happens asynchronously. Moreover, within `ConcurrentHashMap`, entries are linked to one another, and the hazard pointer logic needs to know about this linkage so that it doesn't clean up an object that has no direct hazard pointers of its own but is referenced by another object that might. That logic lives in `HazptrObjLinked`.
Under high contention (see #2097), the link-counting logic can overflow, because a single object accumulates too many dangling links. For example, consider a map with 2 entries with the same hash code: `(A,0)` and `(B,0)`. Assume that `A` is stored before `B` within the `ConcurrentHashMap`'s `BucketTable`. Initially, `B` records a link count of 1 (the link between `A` and `B`). Now assume we replace `(A,0)` with `(A,1)`. Although `(A,0)` is erased from the `ConcurrentHashMap`, it is not immediately reclaimed/deleted. In the interim, `B` has a link count of 2, one for each of the 2 versions of `A`. This link count is stored as a 16-bit unsigned integer. If such replacements happen quickly enough, `B`'s link count overflows past 65535 and wraps around.
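A minimal sketch of the wraparound itself, assuming nothing about folly beyond the 16-bit unsigned width mentioned above. `stored_link_count` is a hypothetical helper for illustration: it shows that unchecked increments of a `std::uint16_t` are reduced modulo 65536, so 65537 true links are stored as 1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of folly): the value a 16-bit link counter
// holds after `true_links` unchecked increments starting from zero.
std::uint16_t stored_link_count(std::uint32_t true_links) {
  std::uint16_t count = 0;
  for (std::uint32_t i = 0; i < true_links; ++i) {
    ++count;  // unsigned arithmetic: 65535 + 1 silently wraps to 0
  }
  return count;  // equals true_links % 65536
}
```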
This situation is caught in debug builds (by a `DCHECK`), but in opt builds it results in bad retirements. For example, if `B`'s link count goes past 65535 to 65537 (stored as `1`), then when one object of `A` is reclaimed, `B`'s link count is decremented from `1` to `0`, causing `B` to be incorrectly retired. If we then go on to remove all of `A`, the link count underflows from `0` back to `65535` and eventually reaches `0` again, causing a double retirement, which is a sign of corruption.
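The retirement sequence above can be replayed with a toy model. This is a sketch under stated assumptions: `ToyLinkedObj` is hypothetical (not folly's `hazptr_obj_base_linked`), it has no overflow checks (as in an opt build where `DCHECK`s are compiled out), and it treats "link count reaches zero" as the retirement trigger, as the paragraph describes.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (hypothetical, not folly's class): a 16-bit link count with no
// overflow checks, mimicking an opt build where DCHECKs are compiled out.
struct ToyLinkedObj {
  std::uint16_t link_count = 0;  // wraps modulo 65536
  int retirements = 0;           // how many times the object was retired

  void acquire_link() { ++link_count; }  // 65535 -> 0 on overflow
  void release_link() {
    --link_count;                         // 0 -> 65535 on underflow
    if (link_count == 0) {
      ++retirements;                      // count hit zero: object is retired
    }
  }
};

// Replays the scenario for B: 65537 true links (the stored count reads 1),
// then every link is released. A correct counter would retire B exactly once.
int replay_overflow_scenario() {
  ToyLinkedObj b;
  for (int i = 0; i < 65537; ++i) b.acquire_link();
  b.release_link();  // stored 1 -> 0: premature retirement, 65536 links remain
  for (int i = 0; i < 65536; ++i) b.release_link();  // 0 -> 65535 -> ... -> 0
  return b.retirements;
}
```

The first retirement happens while 65536 links still exist, and the second is the double retirement described above.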

SOLUTION
While the situation is rare, it can arise for skewed data with a lot of contention. There are 3 options to "solve" this:

  1. Increase the link count from 16 bits to something wider - Simple, but a workaround. Eventually, high-enough contention would cause the bug to show up there as well.
  2. Crash the process under very high contention - Maintains the current performance guarantees, and raises a fatal error when `ConcurrentHashMap` cannot meet them.
  3. Slow down `ConcurrentHashMap` erasures under high contention (this diff) - Very high contention causes `ConcurrentHashMap` to slow down, giving reclamation time to act. Functionally, `ConcurrentHashMap` stays the same, but it exhibits different performance characteristics.

In this change, the `HazptrObjLinked` code is changed to disallow overflows, since they lead to corruption, and callers are responsible for handling cases where a link cannot be created. For `ConcurrentHashMap`, we keep waiting (retrying) until we can acquire a link, which means erasures under high contention are lock-free but not wait-free.
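A minimal sketch of a link counter that refuses to overflow, assuming a 16-bit atomic count. `BoundedLinkCount`, `try_acquire_link`, and `release_link` are hypothetical names, not necessarily what `HazptrObjLinked` uses. The increment is a CAS loop that gives up instead of wrapping, so the caller learns that no link is currently available and must handle it.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical sketch (not folly's API): a link counter that never wraps.
class BoundedLinkCount {
 public:
  // Tries to add one link. Returns false, leaving the count unchanged,
  // if the counter is already at its maximum.
  bool try_acquire_link() {
    std::uint16_t cur = count_.load(std::memory_order_relaxed);
    do {
      if (cur == std::numeric_limits<std::uint16_t>::max()) {
        return false;  // incrementing would wrap to 0
      }
      // On CAS failure, `cur` is refreshed and the bound is re-checked.
    } while (!count_.compare_exchange_weak(
        cur, static_cast<std::uint16_t>(cur + 1), std::memory_order_acq_rel,
        std::memory_order_relaxed));
    return true;
  }

  // Removes one link; returns true if it was the last one.
  bool release_link() {
    std::uint16_t prev = count_.fetch_sub(1, std::memory_order_acq_rel);
    assert(prev > 0);  // releasing a link that was never acquired is a bug
    return prev == 1;
  }

  std::uint16_t count() const { return count_.load(std::memory_order_acquire); }

 private:
  std::atomic<std::uint16_t> count_{0};
};
```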
For reclamation, the cohort has buffers for both retired objects (aka `list`) and reclaimed objects (aka `safe list`). When `ConcurrentHashMap` is unable to acquire a link, it must try to initiate a reclamation cycle to make progress. I therefore added a `cleanup()` method to the cohort that flushes any existing retired objects to the hazard pointer domain for retirement evaluation, kicks off a reclamation cycle, and also retires any retired objects still pending within the cohort.
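The caller-side loop can be illustrated with a single-threaded toy model. This is a sketch under loud assumptions: `ToyNode`, `ToyCohort`, and `replace_predecessor` are hypothetical, and "reclamation" is modeled simply as retired predecessors releasing the link they hold on their successor. It shows why forcing a cleanup when no link is available lets the replacement eventually succeed; it does not model concurrency.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical toy node: a 16-bit link count that refuses to overflow.
struct ToyNode {
  std::uint16_t links = 0;
  bool try_acquire_link() {
    if (links == std::numeric_limits<std::uint16_t>::max()) return false;
    ++links;
    return true;
  }
  void release_link() { --links; }
};

// Hypothetical toy cohort: remembers which nodes still have a link held by a
// retired (not yet reclaimed) predecessor.
struct ToyCohort {
  std::vector<ToyNode*> retired;
  int cleanups = 0;

  void retire_predecessor_of(ToyNode* target) { retired.push_back(target); }

  // Stands in for cleanup(): flush retired objects and run reclamation, so
  // reclaimed predecessors release their links.
  void cleanup() {
    ++cleanups;
    for (ToyNode* target : retired) target->release_link();
    retired.clear();
  }
};

// Replaces the node in front of `target`: the new node must link to `target`,
// and the old one is retired but keeps its link until it is reclaimed.
void replace_predecessor(ToyNode& target, ToyCohort& cohort) {
  while (!target.try_acquire_link()) {
    cohort.cleanup();  // no link available: help reclamation make progress
  }
  cohort.retire_predecessor_of(&target);
}
```

Because a thread may have to loop until reclamation frees links, the operation can stall under contention, which is the lock-free but not wait-free behavior described above.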

Differential Revision: D51647789

Summary:
PROBLEM
During reclamation of tagged objects, we take a lock on the hazard pointer domain's tagged list (`tagged_`). This list always allows pushing objects, but only allows popping while the lock is held.
There is an opportunity to improve reclamation performance by holding the `tagged_` list's lock for less time.

SOLUTION
The list is locked to prevent multiple threads from reclaiming the same objects. During reclamation, we take all objects that could potentially be cleaned up and divide them into 2 parts: those that must be reclaimed and those that must not. The latter are pushed back to the `tagged_` list, and the list is unlocked. The objects to be reclaimed are then freed by pushing them to the cohort.
The lock only guards the `tagged_` list itself, so once the not-to-be-reclaimed objects have been pushed back, we can release the lock without waiting for the reclaimed objects to finish being processed.
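A simplified sketch of the shortened critical section, assuming a mutex-guarded `std::vector` as a stand-in for the `tagged_` list (in the described code, pushes do not need the lock; here they do, for simplicity). `ToyTaggedList` and the `is_protected` predicate are hypothetical. The point is the shape: partition under the lock, push survivors back, unlock, then do the freeing work outside the lock.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical sketch (not folly's hazptr_domain code).
struct ToyTaggedList {
  std::mutex mu;
  std::vector<int> objs;  // stand-ins for retired tagged objects

  // Returns the objects that are safe to reclaim. `is_protected` reports which
  // objects are still guarded by hazard pointers and must be kept.
  std::vector<int> reclaim(const std::function<bool(int)>& is_protected) {
    std::vector<int> to_reclaim;
    {
      std::lock_guard<std::mutex> guard(mu);
      std::vector<int> all = std::move(objs);  // take everything
      objs.clear();                            // moved-from: make it empty
      for (int obj : all) {
        if (is_protected(obj)) {
          objs.push_back(obj);  // not reclaimable: push back under the lock
        } else {
          to_reclaim.push_back(obj);
        }
      }
    }  // lock released here, before any freeing work
    // Freeing (pushing to the cohort in the described code) happens unlocked.
    return to_reclaim;
  }
};
```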
I also validated that the `nomatch` handling is thread-safe: the cohort's `push_safe_obj` uses a CAS loop to safely add to the top of the list.
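For readers unfamiliar with the pattern, here is a generic lock-free stack push (a Treiber-stack push), shown only to illustrate the kind of CAS loop attributed to `push_safe_obj`. It is not folly's code; `ToySafeNode` and `ToySafeList` are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical node type for illustration.
struct ToySafeNode {
  int value;
  ToySafeNode* next = nullptr;
};

class ToySafeList {
 public:
  // Concurrent pushes are safe: each thread retries until its CAS installs
  // its node on top of the head it observed.
  void push(ToySafeNode* node) {
    ToySafeNode* head = head_.load(std::memory_order_relaxed);
    do {
      node->next = head;  // link the new node in front of the current top
    } while (!head_.compare_exchange_weak(head, node, std::memory_order_release,
                                          std::memory_order_relaxed));
  }

  ToySafeNode* top() const { return head_.load(std::memory_order_acquire); }

 private:
  std::atomic<ToySafeNode*> head_{nullptr};
};
```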

Differential Revision: D51638674
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D51647789

@instw (Author) commented Nov 30, 2023

CC @magedm - Does this look like a reasonable fix?

@DanielLiamAnderson

I ran this against our benchmark suite in which we initially observed the bug (ParlayHash) and it does indeed appear to be fixed! Thanks for working on it.

facebook-github-bot pushed a commit that referenced this pull request Dec 7, 2023
Summary:
PROBLEM
For linked objects, the ref count and link count are stored in 16-bit integers. Either count can easily overflow.

SOLUTION
Increase both counts to 32 bits. While this theoretically increases the footprint of the `hazptr_obj_base_linked` struct from 28 bytes to 32 bytes, the increase is small, and in many cases struct padding already makes `hazptr_obj_base_linked` 32 bytes.
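To illustrate the padding argument, here is a sketch with a hypothetical layout (not the real `hazptr_obj_base_linked` fields), chosen so the member sizes add up to the 28 and 32 bytes quoted above. It assumes a typical 64-bit ABI with 8-byte, 8-byte-aligned pointers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layouts: three pointer-sized members plus a ref count and a
// link count. These are NOT the real hazptr_obj_base_linked members.
struct NarrowCounts {
  void* a;
  void* b;
  void* c;
  std::uint16_t refs;   // 16-bit counts: members sum to 28 bytes on 64-bit
  std::uint16_t links;  // tail padding to 8-byte alignment gives 32 bytes
};

struct WideCounts {
  void* a;
  void* b;
  void* c;
  std::uint32_t refs;   // 32-bit counts: members sum to exactly 32 bytes
  std::uint32_t links;
};
```

On such a target both structs occupy 32 bytes, so widening the counts costs nothing; on targets with 4-byte pointers the wider layout would be larger.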

Note: This is a mitigation for #2097. Theoretically, it's possible to increase contention further and make the issue recur, but in practice that's hard to do. See #2107 for a potential fix to the underlying issue.

Reviewed By: ot

Differential Revision: D51829892

fbshipit-source-id: f3ef8f7cf245dd7ff0e1ba6f6ee7bb15ead532ef
facebook-github-bot pushed a commit to facebook/hhvm that referenced this pull request Dec 7, 2023