Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make pgconn_connect_poll close the socket prior to calling PQconnectPoll #564

Merged
merged 1 commit into from
Apr 9, 2024

Conversation

jcalvert
Copy link
Contributor

@jcalvert jcalvert commented Apr 8, 2024

Make pgconn_connect_poll close the socket prior to calling PQconnectPoll

Since PQconnectPoll can change the underlying socket and thus the file descriptor, closing the socket after can manifest a race condition where libpq has closed the socket and another thread has opened a socket with the recycled file descriptor. When the original socket is closed from Ruby, the VM will notify the new thread that the FD has closed (erroneously) resulting in an exception.

This is an exception I encountered in a production environment (C Ruby 3.3), somewhat sporadically It is quite misleading as the way this manifests is that the exception raises while waiting on the socket, so it seems as if connections are getting shared between threads.

The reason this happens is because when the IO object is closed on the Ruby side, the VM calls rb_notify_fd_close which in turn sends an exception to any threads waiting on the file descriptor. On the Ruby 2.x path does not have this problem as IO.select will return the exception array, although I believe the VM still calls close()

`PQconnectPoll`

Since `PQconnectPoll` can change the underlying socket and thus the file
descriptor, closing the socket after can manifest a race condition
where libpq has closed the socket and another thread has opened a socket
with the recycled file descriptor. When the original socket is closed
from Ruby, the VM will notify the new thread that the FD has closed
(erroneously) resulting in an exception.

The same for `PQresetPoll`.
@larskanis larskanis force-pushed the fix_connect_poll_fd_reuse branch from 0214687 to 21a064b Compare April 9, 2024 13:30
@larskanis larskanis merged commit 8eec744 into ged:master Apr 9, 2024
9 of 17 checks passed
@larskanis
Copy link
Collaborator

Thank you for doing the analysis and the fix!

I changed PQresetPoll equally and moved the test to its own it example. The test still shows the issue, without running the code of the previous test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants