Query execution stuck until idle-in-transaction timeout #2119
Comments
I recently switched a service to pgx from lib/pq because of a suspiciously similar issue. I'm only able to reproduce it by hammering my service with a large number of requests. A connection which is
Postgres 14.12
Coming back to this, we've now seen the issue with pgx as well:
Something in the Postgres protocol is not being handled correctly. Note that this is a rather severe bug: when it happens, it essentially invalidates a connection to Postgres, and that connection will never close unless Postgres (or in our case, pgpool) is restarted. In our application, we use advisory locks to serialize writes to an entity with many constraints that need to be checked. This bug causes our entire application to lock up, because all of the connections are waiting on advisory locks, and those waits cannot proceed once one connection waiting for a lock has been invalidated by this bug.
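For readers unfamiliar with the pattern described above, here is a minimal sketch of serializing writes to one entity with a transaction-scoped advisory lock through database/sql. It is not the code from our service: the names `advisoryKey` and `withEntityLock`, the FNV-1a key derivation, and the use of a context deadline to bound the lock wait are all assumptions made for illustration. `pg_advisory_xact_lock` is the standard Postgres function, and the lock it takes is released when the transaction ends.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"hash/fnv"
	"time"
)

// advisoryKey maps an entity identifier to the 64-bit key that
// pg_advisory_xact_lock expects. Hashing with FNV-1a is an assumption of
// this sketch; any stable mapping from entity to bigint would do.
func advisoryKey(entityID string) int64 {
	h := fnv.New64a()
	h.Write([]byte(entityID))
	return int64(h.Sum64())
}

// withEntityLock (hypothetical helper) runs fn inside a transaction that
// holds a transaction-scoped advisory lock for entityID. The context
// deadline bounds the whole transaction, including the wait for the lock.
func withEntityLock(ctx context.Context, db *sql.DB, entityID string,
	timeout time.Duration, fn func(*sql.Tx) error) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return fmt.Errorf("begin: %w", err)
	}
	// After a successful Commit this returns sql.ErrTxDone, which is ignored.
	defer tx.Rollback()

	if _, err := tx.ExecContext(ctx,
		"SELECT pg_advisory_xact_lock($1)", advisoryKey(entityID)); err != nil {
		return fmt.Errorf("acquire advisory lock: %w", err)
	}
	if err := fn(tx); err != nil {
		return err
	}
	return tx.Commit()
}

func main() {
	// No database is needed to see that the key derivation is stable.
	fmt.Println(advisoryKey("entity-42") == advisoryKey("entity-42"))
	fmt.Println(advisoryKey("entity-42") != advisoryKey("entity-43"))
}
```

With this shape, every writer to the same entity queues on the same key, which is why a single connection that never finishes its transaction can stall all the others. The deadline may limit how long each waiter blocks, but it does not fix the underlying hang.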
@rf Given that the same problem occurs in both pgx and pq, I doubt the problem is in pgx. Is a load balancer or connection pooler being used? Another commonality is database/sql. I don't suppose the problem can be reproduced with the native pgx interface?
We are using pgpool-II. The issue isn't easy to reproduce, so it isn't trivial to try the native interface. I actually was not able to synthetically reproduce this issue in pgx after switching from pq to pgx. The stack trace above is from a production process that had locked up; we just saw this issue, and it caused an outage for our product. It's wholly possible that pgpool-II is somehow munging the protocol. I tend to doubt that, since it should mostly be a dumb pipe. I'm going to try reproducing this with full TCP dumping and a debugger on the process.
OK, after further investigation I'm pretty sure we are seeing this error because of a bug in pgpool-II. I don't know whether the other people who have seen this are hitting the same bug, however. The bug presents when reads at the beginning of a transaction are routed to a replica by pgpool-II, and the reads are killed because the replica can't keep up with the WAL ( I didn't end up doing any low-level TCP debugging, so I can't say whether a message is successfully sent back through pgpool-II and then misunderstood by pgx. But I think it's likely that the bug lies outside of pgx.
Query execution is stuck until the idle-in-transaction timeout is reached and the application receives an error.
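As a possible mitigation while the root cause is unknown, each statement can be given its own context deadline so the application gets an error without waiting for the server-side timeout. The sketch below is an assumption, not something confirmed in this thread to resolve the hang. `execWithTimeout`, `stuckExecer`, and `runDemo` are hypothetical names, the INSERT text is a placeholder, and `stuckExecer` only simulates a connection that never answers.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
	"time"
)

// execer is the subset of *sql.DB, *sql.Tx and *sql.Conn used here.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// execWithTimeout (hypothetical helper) bounds a single statement with a
// client-side deadline instead of relying on a server-side timeout.
func execWithTimeout(ctx context.Context, e execer, timeout time.Duration,
	query string, args ...any) (sql.Result, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	return e.ExecContext(ctx, query, args...)
}

// stuckExecer is a stand-in for a hung connection: it never produces a
// result and returns only once the context is done.
type stuckExecer struct{}

func (stuckExecer) ExecContext(ctx context.Context, _ string, _ ...any) (sql.Result, error) {
	<-ctx.Done()
	return nil, ctx.Err()
}

// runDemo executes a placeholder statement against the simulated hung
// connection and reports how the call ended.
func runDemo() string {
	_, err := execWithTimeout(context.Background(), stuckExecer{},
		50*time.Millisecond, "INSERT INTO changelog DEFAULT VALUES")
	if errors.Is(err, context.DeadlineExceeded) {
		return "timed out"
	}
	return fmt.Sprintf("unexpected: %v", err)
}

func main() {
	fmt.Println(runDemo())
}
```

Because `*sql.DB`, `*sql.Tx` and `*sql.Conn` all provide `ExecContext`, the same helper could wrap real calls. Whether a real driver unblocks promptly on cancellation depends on the driver and on what the connection is actually waiting for.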
Library stack trace:
pg_stat_activity shows the previous query within the transaction. The query that actually hangs is an insert into a different table. We execute the query INSERT INTO changelog; search_path is configured via the connection string.
pg_locks
Version