arbitrary stuck queries #827

Open · JuHwon opened this issue Mar 15, 2024 · 16 comments

Comments

@JuHwon commented Mar 15, 2024

We have an issue where the sql client just does not respond; in other words, the promise neither resolves nor rejects.
The application has peaks with lots of concurrent requests, but the issue also happens when there is not much load. It just arbitrarily stops responding. When I restart the application, everything works fine again.

The client configuration:

    return postgres(options.connectionString, {
      idle_timeout: 0,      // in seconds
      connect_timeout: 30,  // in seconds
      max: 50,              // max connections in the pool
    });

I have a feeling that it happens after the service has been running for some time.

I also had a feeling that it often occurs with sql.begin(...), but I have also seen it once with a single sql`SELECT * FROM x WHERE id = ${id}`.

Not sure where to start looking here, since we are currently not able to reproduce it. Does anyone have any ideas why this could happen? I would also be happy to contribute to the project if we are able to track down the issue.

UPDATE
Here is a chart from the postgres DB for one occurrence; the needle marks when the incident happened.
[chart image]

@Miraj98 commented Mar 15, 2024

Facing same issue

@JuHwon (Author) commented Mar 15, 2024

Facing same issue

Hi @Miraj98, thanks for responding. Do you have any hints about where and when you are facing this problem?
Is it correlated with transactions (sql.begin)? Does it correlate with how long the process has been running?
What are your connection settings? Are you able to reproduce it?
What are your postgres client options?

Maybe @porsager also has some hints on where we should look?

@Miraj98 commented Mar 15, 2024

(quoting @JuHwon's questions above)

Hey @JuHwon, I just did some debugging and I am facing connection issues specifically when using bun, so I'm not sure if my case will help. Sorry.

@JuHwon (Author) commented Mar 15, 2024

(quoting @Miraj98's note about bun above)

I would not exclude the possibility that the issues are related when the only difference is using bun vs node, right?

@AdamAkiva commented Mar 16, 2024

TLDR:
Every transaction has to run on a single connection, so for every transaction a connection has to stay open and dedicated to it until the transaction ends.
PostgreSQL's default connection limit is 100, hence you can only run about 100 transactions concurrently (unless you changed it at the database level).
I'm not sure why the queries get stuck when that limit is reached, and I advise you to use the suggestions below to debug and research the issue.

In depth:
A few days ago a co-worker and I were presented with a similar issue when using Drizzle ORM with postgres.js as the driver.
We found a few interesting things.
I'll start by stating that PostgreSQL's default number of concurrent clients is 100.
Every transaction by definition has to be done on the same connection, which means if you run them concurrently, e.g. using Promise.all, and you have a lot of concurrent requests, you may reach said limit.
Regarding the requests getting "stuck": this is the interesting/weird part; in essence (from our understanding) we end up in a so-called "deadlock".
What may happen is that if the requests are made concurrently, you may have a lot of transaction start statements before even starting to process any of them.
The expected behavior in this case is just getting an error message for too many clients, but for some reason that does not happen.
There was another issue with Drizzle, but that does not affect this specific case.
What I advise you to do is run PostgreSQL with statement logging enabled, e.g.: postgres -c log_statement=all
Then follow the logs given by the database and check whether there are any errors regarding the transactions.
Consider adding this option as well.
If you find any notes and results, I'll be happy if you share them so we can learn from them as well.

P.S. I have an assumption that the connection pool has an issue when too many concurrent transactions are run at once, and that's why you had this issue while doing SELECT * FROM x WHERE id = ${id} as well, but I can't be sure without specific logs from the database itself.
In addition, if possible, try to increase the max connection option to any value > 100 and see whether you receive an error message which says that too many clients are already active. This may indicate the problem is connected to too many concurrent active transactions.
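
To illustrate the scenario described above, here is a minimal sketch (not from this thread; the connection string, pool size and transaction count are made up) of how a burst of concurrent sql.begin calls can try to hold more connections than PostgreSQL's default max_connections of 100:

import postgres from 'postgres';

// Hypothetical setup: the client pool allows more connections than the
// server's default max_connections of 100.
const sql = postgres(process.env.DATABASE_URL, { max: 200 });

// 200 concurrent transactions: each sql.begin reserves its own connection
// until its callback finishes, so this tries to hold 200 connections at once.
await Promise.all(
  Array.from({ length: 200 }, () =>
    sql.begin(async sql => {
      await sql`select pg_sleep(1)`; // keep the transaction (and its connection) open
    })
  )
);

await sql.end();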

@porsager (Owner) commented

Do you think your descriptions match the one in #738? That change has been released in v3.4.4 today. It would be nice if you all could give that a try and report back.

@kam-st commented Jun 9, 2024

The issue still persists, even when pulling directly from GitHub.

@JuHwon (Author) commented Jun 9, 2024

The issue still persists, even when pulling directly from GitHub.

I have not had time to investigate this issue further yet. I would really love to have a test that reproduces this.
@kam-st, on which Node.js version are you experiencing this issue?

We got better results when reducing the concurrency peaks for the client.
But we still had one occurrence with Node.js 20 and v3.4.3 two weeks ago. It also seems to correlate with how long the process has been running.
We have updated to Node.js 21 and v3.4.4 now and will observe further. I hope we find time in the future to investigate this issue in detail and come up with some reproducible tests.

@kam-st commented Jun 9, 2024

Using Node 20.14 LTS.

Does your app make queries on page refresh, such as auth? If yes, then try refreshing the page multiple times. I have been able to reproduce this issue every single time.

@JuHwon (Author) commented Jun 9, 2024

Does your app make queries on page refresh, such as auth? If yes, then try refreshing the page multiple times. I have been able to reproduce this issue every single time.

If you can reproduce it every single time, I am not sure our issues are correlated. In our case it is really arbitrary. It's a REST API service with some QPS peaks after specific endpoints are invoked; flattening the QPS curve helped make the issue less likely.

@kam-st commented Jun 9, 2024

It is more arbitrary if I have max connections set to more than 100. I feel like a new connection is made for every single query. Based on idle connections, if requests are consistent then there is no problem. But if there is a large amount of queries in a short period of time, then max connections get exhausted. Once the request load dies down, the idle connections are closed and requests can be made again.

I know idle_timeout is set to 0 for you, but is it really 0 if you have a high number of concurrent queries? Even if it takes 0.1 sec to close a connection, the issue can persist.

@kam-st commented Jun 9, 2024

(quoting @AdamAkiva's comment above)

Hi Adam,

There is a great amount of information in your post. Just curious, what approach would you recommend to make sure not too many connections are opened, besides using idle_timeout etc.?

@chanon commented Jun 15, 2024

Hello everyone. I have consistently reproduced a bug that causes queries to hang/become stuck. Please see #893. I am curious whether this is what you have been experiencing (check your code!), or if there is some other bug.

The TLDR of #893 is: if you do sql.begin and then run queries inside it without correctly using the provided sql argument, then it becomes likely that your connection will hang.
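
For anyone checking their code against this, here is a hedged sketch of the pattern described (the table, column and values are made up for illustration; only the shape of the calls matters):

await sql.begin(async tx => {
  // Wrong: this uses the outer `sql`, so the query runs on a separate pooled
  // connection outside the transaction -- the pattern reported to cause hangs.
  await sql`UPDATE accounts SET balance = balance - 100 WHERE id = ${1}`;

  // Right: use the transaction-scoped instance passed into the callback.
  await tx`UPDATE accounts SET balance = balance - 100 WHERE id = ${1}`;
});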

@displayErrors commented
For me, setting max: 1 solved a similar issue, but I don't know why.
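
For reference, a minimal sketch of that configuration (the connection string here is a placeholder, not from this thread):

import postgres from 'postgres';

// max: 1 limits the pool to a single connection, so queries are effectively
// serialized instead of being spread across multiple connections.
const sql = postgres(process.env.DATABASE_URL, { max: 1 });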

@odian42 commented Dec 3, 2024

We have a similar issue, but I've boiled it down to a simpler problem: query promises are neither resolved nor rejected if a connection gets re-used for another query.

For this example I've set max connections to 1.

import postgres from 'postgres';

const debug = (cid: unknown, query: unknown, params: unknown) => console.debug('SQL', { cid, query, params });

// `url` is the connection string (omitted here).
const sql = postgres(url, { prepare: false, max: 1, debug });

async function test() {
  await Promise.all([selectSomething(), selectSomething()]);
}

async function selectSomething() {
  await sql`select 1`;
  console.log('Got data');
}

When the test() function is invoked, the second promise never resolves or rejects; it's just lost.
From the debug function I can see both queries targeting the same connection without waiting.

SQL { cid: 2, query: 'select 1', params: [] }
SQL { cid: 2, query: 'select 1', params: [] }
Got data

(queries are sent one right after another and the second "Got data" output never appears)

It would be great if the maintainer could find time to fix this critical issue, or correct my findings if I'm wrong.
Thank you so much in advance!

@henryzhang03 commented
Facing the same issue as above; it does not work nicely with many promises in a Promise.all ^ @porsager
