Fix race conditions in the connection pool code. #18447
base: main
Conversation
Signed-off-by: Arthur Schreiber <[email protected]>
Review Checklist
Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.
General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
Unfortunately, this does have an impact on pool performance (though whether that's noticeable in real-world scenarios is unclear to me).
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##              main    #18447      +/-  ##
===========================================
- Coverage    67.51%    67.48%    -0.03%
===========================================
  Files         1607      1607
  Lines       262687    262854      +167
===========================================
+ Hits        177340    177374       +34
- Misses       85347     85480      +133
☔ View full report in Codecov by Sentry.
I have a test over at https://gist.github.com/arthurschreiber/fffe5c2ecf550e75fdee6e97cc08f9df that can reproduce the problem. Unfortunately, the test is flaky, so I've not added it to this PR.
} else {
	stack := connSetting.bucket & stackMask
	pool.settings[stack].Push(conn)
	pool.freshSettingsStack.Store(int64(stack))
I wonder, would an alternative approach work to avoid having to lock? What if we still optimistically put the connection onto the stack, but then check afterwards whether the pool has since been closed, and remove it again if it has?
That way you'd mostly pay the price only during closing, without needing any locking or other synchronization on the common path?
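A minimal, self-contained sketch of that idea, using a trivial stack and a closed flag as stand-ins for the real ConnPool internals (all names here are illustrative, not the actual API):

```go
package pool

import (
	"sync"
	"sync/atomic"
)

// Pooled is a stand-in for a pooled connection; Close is all the sketch needs.
type Pooled struct{ closed bool }

func (c *Pooled) Close() { c.closed = true }

// connStack is a trivial mutex-based stack standing in for the pool's
// lock-free setting stacks.
type connStack struct {
	mu    sync.Mutex
	conns []*Pooled
}

func (s *connStack) Push(c *Pooled) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.conns = append(s.conns, c)
}

func (s *connStack) Pop() (*Pooled, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.conns) == 0 {
		return nil, false
	}
	c := s.conns[len(s.conns)-1]
	s.conns = s.conns[:len(s.conns)-1]
	return c, true
}

type sketchPool struct {
	closed atomic.Bool
	stack  connStack
}

// tryReturnConn pushes the connection back without holding any lock, then
// re-checks whether the pool was closed in the meantime. If it was, it pops
// a connection back off the stack and closes it, so nothing stays stranded
// in a closed pool. It reports whether a connection stayed in the pool.
func (p *sketchPool) tryReturnConn(conn *Pooled) bool {
	p.stack.Push(conn)
	if p.closed.Load() {
		if c, ok := p.stack.Pop(); ok {
			c.Close()
		}
		return false
	}
	return true
}
```

Whether this fully avoids the race depends on who wins the push/drain interleaving, which is essentially what the rest of this thread discusses.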
I'm definitely open to alternative approaches. For your suggestion, don't we have to take a lock on capacityMu to correctly decide whether the pool is closed or not?
We could think about an optimistic concurrency approach with epoch validation.
ConnPool could have an epoch field of type atomic.Uint64 that we invalidate by incrementing it whenever there is a change like Close or Capacity.
In tryReturnConn we read the epoch value and then read the capacity. Before returning the connection to the stack we check the epoch again: if it has changed we retry, otherwise we return the connection to the stack.
After returning we check the epoch once more, and if it has changed we try to get the connection back; if we receive it, we go through the decision again.
We do not expect capacity to change that frequently.
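To make that concrete, here is a rough sketch of the epoch check, reusing the Pooled/connStack stand-ins from the earlier sketch; the epoch and capacity fields are assumptions for illustration, not the real ConnPool layout:

```go
// epochPool sketches the epoch-validated return. Close() and SetCapacity()
// are assumed to increment epoch after updating capacity.
type epochPool struct {
	epoch    atomic.Uint64
	capacity atomic.Int64
	stack    connStack
}

func (p *epochPool) tryReturnConn(conn *Pooled) bool {
	for {
		// Read the epoch first, then the state derived from it.
		epoch := p.epoch.Load()
		if p.capacity.Load() == 0 {
			return false // pool is closed; the caller closes the connection
		}

		// Optimistically return the connection to the stack.
		p.stack.Push(conn)

		// Re-validate: if no Close()/SetCapacity() ran in between, we're done.
		if p.epoch.Load() == epoch {
			return true
		}

		// The epoch moved while we were pushing. Try to take a connection
		// back; if the stack is already empty, the closer drained it for us.
		c, ok := p.stack.Pop()
		if !ok {
			return true
		}
		conn = c // re-run the whole decision with the recovered connection
	}
}
```

This matches the structure described above: check, return, re-check, and only loop in the rare case where the epoch changed underneath us.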
I am working on an additional change that uses a pool generation number to allow Close() to return instantly and basically break the link between checked-out connections and the pool.
I think that could be used for the optimistic concurrency check. I'll try this out and see if I can make this work.
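For what it's worth, one possible shape of a generation-number check on the return path, again using the stand-in types from the sketches above; this is only a guess at the idea, not the actual change being worked on:

```go
// genPool sketches the idea: Close() only bumps the generation, and a
// connection whose checkout generation no longer matches is closed on
// return instead of going back into the pool. Draining of already-idle
// connections is omitted here.
type genPool struct {
	generation atomic.Uint64
	stack      connStack
}

func (p *genPool) Close() {
	p.generation.Add(1) // returns instantly; stale returns notice the bump
}

// get hands out a connection together with the generation it was checked
// out under.
func (p *genPool) get() (*Pooled, uint64) {
	c, _ := p.stack.Pop() // nil if the pool is empty; the creation path is omitted
	return c, p.generation.Load()
}

// put returns a connection, unless the pool has moved to a newer generation
// since checkout, in which case the connection is simply closed.
func (p *genPool) put(conn *Pooled, gen uint64) {
	if gen != p.generation.Load() {
		conn.Close()
		return
	}
	p.stack.Push(conn)
}
```

The put path here still has the same push-vs-close window as before, so it would presumably be combined with the epoch-style re-check sketched earlier.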
Description
This fixes the two race conditions described in #18202 (comment)
Related Issue(s)
#18202
Checklist
Deployment Notes