[release/9.0-staging] Backport Reliability fixes for the Thread Pool #122362
base: release/9.0-staging
Tagging subscribers to this area: @mangod9
Pull request overview
This PR backports reliability fixes for the ThreadPool that address a subtle regression in the enqueuer/worker handshake algorithm. With the regression, an enqueued work item was not guaranteed to execute unless more work items were enqueued; when that item had to run before any further work could be enqueued, the result was a deadlock. The fix reverts from a three-state machine (NotScheduled/Determining/Scheduled) to a simpler two-state flag (0/1) for worker thread coordination.
Key Changes:
- Simplified the worker thread request mechanism from a three-state enum to a binary flag in three ThreadPool-like components
- Updated the handshake algorithm to ensure workers clear the outstanding request flag before checking queues, preventing race conditions
- Ensured that when a worker processes an item, it always requests another worker if more items exist, preventing deadlocks
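The handshake summarized in the key changes can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class `TwoStateWorkQueue` and the `requestWorker` callback are hypothetical, and only the names `_hasOutstandingThreadRequest`, `EnsureThreadRequested` and `Dispatch` are taken from the review summary. The sketch also assumes one work item per `Dispatch` call, which is a simplification.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical sketch of a two-state (0/1) enqueuer/worker handshake.
public sealed class TwoStateWorkQueue
{
    private readonly ConcurrentQueue<Action> _workItems = new ConcurrentQueue<Action>();
    private readonly Action<Action> _requestWorker; // assumption: schedules Dispatch on some worker

    // 0 = no worker request outstanding, 1 = a worker has been requested.
    private int _hasOutstandingThreadRequest;

    public TwoStateWorkQueue(Action<Action> requestWorker)
    {
        _requestWorker = requestWorker;
    }

    public void Enqueue(Action item)
    {
        _workItems.Enqueue(item);
        EnsureThreadRequested();
    }

    private void EnsureThreadRequested()
    {
        // Only the caller that flips the flag from 0 to 1 requests a worker.
        if (Interlocked.CompareExchange(ref _hasOutstandingThreadRequest, 1, 0) == 0)
        {
            _requestWorker(Dispatch);
        }
    }

    public void Dispatch()
    {
        // Clear the flag before checking the queue, so that an item enqueued
        // from here on sees 0 and requests a worker of its own.
        Interlocked.Exchange(ref _hasOutstandingThreadRequest, 0);

        if (!_workItems.TryDequeue(out Action item))
        {
            return;
        }

        // An item was taken and more may remain: request another worker
        // before running it, so no queued item is left without a pending request.
        EnsureThreadRequested();
        item();
    }
}
```

Because the flag is cleared before the queue is inspected, an enqueue that races with a worker either has its item seen by that worker or observes a cleared flag and issues its own request.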
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs | Reverted ThreadPoolWorkQueue and ThreadPoolTypedWorkItemQueue from three-state QueueProcessingStage enum to simple _hasOutstandingThreadRequest flag; simplified EnsureThreadRequested() and Dispatch() methods; updated worker execution logic to prevent deadlocks |
| src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEngine.Unix.cs | Applied same algorithm revert to SocketAsyncEngine; replaced EventQueueProcessingStage enum with _hasOutstandingThreadRequest flag; updated EnsureWorkerScheduled() and Execute() methods for consistency with ThreadPool changes |
Comments suppressed due to low confidence (3)
src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEngine.Unix.cs:124
- Incorrect indentation: The closing brace is indented too far. It should align with the `if` statement on line 119.
}
src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEngine.Unix.cs:138
- Incorrect indentation: The opening brace should align with the `public` keyword on line 137.
{
src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:1114
- Incorrect indentation: These lines should be aligned with the `else` block at the same level. The closing brace on line 1114 should also align with the `else` keyword on line 1110.
Unsafe.As<IThreadPoolWorkItem>(workItem).Execute();
}
if (!_workItems.TryDequeue(out var workItem))
{
// Discount a work item here to avoid counting this queue processing work item
ThreadInt64PersistentCounter.Decrement(
Copilot AI, Dec 10, 2025
Incorrect indentation: This line should align with the comment above it, not be indented an extra level.
Suggested change (indentation only):
ThreadInt64PersistentCounter.Decrement(
Fixes: #121608
Backport of #121887
In rare cases, the ThreadPool does not guarantee that an enqueued work item will execute unless more work items are enqueued. In some scenarios a particular work item must execute before any further work is enqueued, which leads to a deadlock.
This is a subtle regression introduced by a change to the enqueuer/worker handshake algorithm.
The same pattern is used in two other ThreadPool-like internal features in addition to the ThreadPool.
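The dependency described above, where one work item must run before the producer enqueues anything else, might look like the following sketch. It is only an illustration of the pattern, not the stress application used for testing: the class `DependentEnqueueDemo`, the method `Run` and the timeout parameter are hypothetical. On a runtime with a correct handshake the method is expected to return `true`; a lost worker request for the first item would show up as the first wait timing out.

```csharp
using System;
using System.Threading;

// Hypothetical illustration of a producer that depends on an earlier work item.
public static class DependentEnqueueDemo
{
    // Returns true if both work items ran within the given timeout.
    public static bool Run(TimeSpan timeout)
    {
        var firstItemRan = new ManualResetEventSlim(false);
        var followUpRan = new ManualResetEventSlim(false);

        // The first work item signals the producer when it runs.
        ThreadPool.QueueUserWorkItem(_ => firstItemRan.Set());

        // The producer enqueues nothing else until the first item has run.
        // If that item were never picked up, no further enqueue would ever
        // happen to trigger a new worker request, and this wait would hang.
        if (!firstItemRan.Wait(timeout))
        {
            return false;
        }

        // Only now is the follow-up work enqueued.
        ThreadPool.QueueUserWorkItem(_ => followUpRan.Set());
        return followUpRan.Wait(timeout);
    }
}
```

A timeout is used here instead of an unbounded wait so that a hang is reported as a `false` result rather than blocking the caller forever.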
Customer Impact
Regression
Testing
Standard test pass for deterministic regressions.
A targeted stress application that demonstrates the issue: without the fix it hangs within 1-2 minutes; with the fix it runs for 20+ minutes on the same system.
Risk
Low. This is a revert to the preexisting enqueuer/worker handshake algorithm in all three places.