sessionQueues is full #871
Also, MoquetteIdleTimeoutHandler is not triggered automatically to kick out the timed-out client. I tested it: new clients can still connect.
The session queue being full is very bad: the dropped actions are never handled by the broker. So if the dropped action is CONN LOST, the broker never processes the fact that a connection was lost. The default queue size of 1024 is not large enough for busy brokers, but there is no harm in significantly increasing it. Try setting it to 5000 or 10000 or so. @andsel Maybe we should try adding OpenMetrics/Prometheus support to be able to monitor the queue status?
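To illustrate why a dropped action is so harmful, here is a minimal, hypothetical sketch (these are not Moquette's actual classes; names and the tiny capacity are invented): a bounded session command queue whose offer() fails once the queue is full, silently losing the action.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch, not Moquette's real implementation: a bounded
// per-session command queue. offer() returns false when the queue is
// full, so the action is silently dropped. If that action is CONN_LOST,
// the broker keeps treating the client as connected.
public class SessionQueueDropDemo {
    enum Action { CONN, PUB, CONN_LOST }

    public static void main(String[] args) {
        // Tiny capacity on purpose; the thread mentions a default of 1024.
        BlockingQueue<Action> sessionQueue = new ArrayBlockingQueue<>(2);

        sessionQueue.offer(Action.CONN); // accepted
        sessionQueue.offer(Action.PUB);  // accepted, queue is now full

        boolean accepted = sessionQueue.offer(Action.CONN_LOST);
        System.out.println("CONN_LOST accepted: " + accepted); // prints false
    }
}
```

This is why the maintainer suggests a larger queue: offer() only fails when the bound is hit, so raising the bound makes drops correspondingly rarer.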
I am now testing 3w client connections and disconnections. The queue size has been set to 10000, but the error still appears:
Session command queue 0 is full executing action CONN
Session command queue 0 is full executing action flushQueues
3w is 30000, right?
yes
The inability to grow the queues dynamically is a problem that can't be solved if you don't know how much data the clients will send.
A single client should never be able to cause a problem here, since each client is limited by the in-flight-message limit. When testing, this is different, since there is no network stack, and many clients take the same action at the same time in a very unnatural way. Dynamic queues have their own problems, especially when it comes to performance.
The broker is certainly not just for a single client; multiple busy clients will fill up SESSION_QUEUE_SIZE, especially if the number of CPU cores is low. Would it be better to handle the logic directly, as in the older version of the PostOffice?
The results of the broker tests we have can vary from one hardware configuration to another.
Correct, but if a command throws an exception, we have an error state that can't be recovered from, so trying to continue will just make things worse.
The older version was worse in every way, especially when it comes to behaviour under load. Plus, it had memory leaks and race conditions... Like I stated, in a real-world scenario it is very unlikely that all clients take the same action at exactly the same time. If you run into queue overruns in a real-world scenario (which tests very much are not) then you can try to increase the queue size. If that doesn't help, then your broker needs better hardware to handle the load you're putting on it. There may be other places in the code that need optimisation (like #841), but they are not caused by the queueing system; the queueing system may just be the first spot that notices it.
I get to 4w (40,000) and I still get "Session command queue 0 is full".
The simulated clients may be performing the same operation in batches.
Could the queue be adapted to NoSQL, so that it has persistence?
If you can make a minimal demonstrator we can have a look, but with the little information you've given we can't say more than we have.
Sorry, other than the word nosql I can't read that.
The main problem is that once the queue is full, we can't process the operations that were dropped.
Could the queue be adapted to NoSQL, so that it has persistence?
Of course not. If the broker is overloaded, data is lost; there is no way around that in any architecture. This is the internal command processing queue; it cannot be persisted. If it overruns, you are putting more load on your broker than your hardware can handle.
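The arithmetic behind this point is simple; the rates below are invented purely for illustration:

```java
// Illustrative arithmetic only (the rates are made up): if commands
// arrive faster than the broker can execute them, the backlog grows
// without bound no matter where the queue lives. A persistent (NoSQL)
// queue only turns "out of memory" into "out of disk".
public class OverloadArithmetic {
    public static void main(String[] args) {
        long arrivalsPerSec = 12_000; // assumed incoming command rate
        long executedPerSec = 10_000; // assumed processing capacity
        long growthPerSec = arrivalsPerSec - executedPerSec;

        long backlogAfterOneHour = growthPerSec * 3_600;
        System.out.println("backlog after 1h: " + backlogAfterOneHour); // prints 7200000
    }
}
```

No storage backend changes this picture; only more processing capacity (or less load) does.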
Do you have test results showing the hardware configuration and the number of clients?
In my opinion, client connect and disconnect events should be separated from the SessionEvent queue used for pushing content; connects and disconnects are more important.
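As a sketch of that suggestion (this is not how Moquette is structured; all names here are invented): keep lifecycle events in their own queue and always drain it before the content queue, so a flood of publishes cannot starve connect/disconnect handling.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Invented sketch of the suggestion above, not Moquette's actual design:
// lifecycle events (CONN, CONN LOST) get a dedicated queue that is always
// drained before the content queue.
public class SplitSessionQueues {
    private final Queue<String> lifecycle = new ArrayDeque<>();
    private final Queue<String> content = new ArrayDeque<>();

    public void submitLifecycle(String event) { lifecycle.add(event); }
    public void submitContent(String event) { content.add(event); }

    // Returns the next event to process, lifecycle events first.
    public String poll() {
        String event = lifecycle.poll();
        return event != null ? event : content.poll();
    }

    public static void main(String[] args) {
        SplitSessionQueues queues = new SplitSessionQueues();
        queues.submitContent("PUB payload-1");
        queues.submitLifecycle("CONN LOST client-42");
        // The CONN LOST jumps ahead of the earlier publish.
        System.out.println(queues.poll()); // prints CONN LOST client-42
    }
}
```

The trade-off is that strict priority can starve content processing during a connect storm, so a real implementation would need some fairness policy.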
I would like to ask if the memory leak problem of the old version refers to the following code.
If you run into exceptions, you should make separate issues for those, with the full stack trace. Otherwise we can't fix the underlying problem. So in your case the queue overrun is most likely a symptom of this exception.
There were many issues with the reference counting of message buffers. If you search the older issues you'll find the related ones.
I changed the test server CPU to 4 cores, set sessionQueueSize to 50,000, and had 30,000 clients connect rapidly; so far it seems stable. I will watch it overnight.
When will the next release be published? I see a lot of code has been committed since version 0.17.
@Yunustt version 0.18.0 has been released; you can grab it from JitPack (https://github.com/moquette-io/moquette/?tab=readme-ov-file#embedding-in-other-projects), which is a lot easier than publishing on the Maven Central Repository.
When "Session command queue 1 is full executing action CONN LOST" happens, the connection status is no longer accurate: the client is actually disconnected, but listConnectedClients still reports it as connected. How should this be dealt with? Is increasing SESSION_QUEUE_SIZE the only option?