Connections to rabbit queues not re-created after rabbit restart #937

Open
misha-p opened this issue Jun 17, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@misha-p
misha-p commented Jun 17, 2024

A Wolverine service listening to a Rabbit queue permanently loses its connection channel after a Rabbit restart.

To Reproduce
Steps to reproduce the behavior:

  1. Run the Pinger and Ponger projects in Samples/PingPongWithRabbitMq
  2. In the RMQ management UI, see that the queues pings and pongs have 1 consumer/channel each
  3. Restart the rabbitmq container
  4. In the RMQ management UI, see that the queues pings and pongs have lost their consumers and they are not restored. Pinger keeps publishing, but the messages are not consumed.

Expected behavior
After the rabbitmq container has fully restarted, channels are re-created for both queues, pings and pongs.

Additional context
Latest codebase, 2.13.0.
When I run against the empty RMQ from the Wolverine docker compose, everything works as expected.
The above happens when the RabbitMQ server contains many queues used by a bunch of other (NServiceBus) services.

@misha-p misha-p changed the title Connections to rabbit queues not re-created after reabbit restart Connections to rabbit queues not re-created after rabbit restart Jun 17, 2024
@jeremydmiller
Member

Michael, I've tested this pretty extensively, and it's been able to reconnect at least locally.

@jeremydmiller
Member

Worst case scenario, I'll add some kind of "watcher" that kickstarts the listener.
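
To make the "watcher" idea concrete, here is a minimal, language-neutral sketch in Python. It is not Wolverine code. It assumes two hypothetical callables, `is_alive` and `restart`, that stand in for whatever the listener would expose; neither name comes from Wolverine or the RabbitMQ client.

```python
import threading


class ListenerWatcher:
    """Periodically checks a listener and restarts it when it is not alive.

    `is_alive` and `restart` are hypothetical callables supplied by the
    caller; they are illustration only, not Wolverine or RabbitMQ APIs.
    """

    def __init__(self, is_alive, restart, interval_seconds=5.0):
        self._is_alive = is_alive
        self._restart = restart
        self._interval = interval_seconds
        self._stop = threading.Event()
        self._thread = None
        self.restart_count = 0

    def check_once(self):
        """Run one health check; return True if the listener was restarted."""
        if self._is_alive():
            return False
        try:
            self._restart()
        except Exception:
            # The broker may still be down; try again on the next tick.
            return False
        self.restart_count += 1
        return True

    def start(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()

    def _run(self):
        # Event.wait returns True once stop() has been called, ending the loop.
        while not self._stop.wait(self._interval):
            self.check_once()
```

The point of the sketch is that the watcher does not rely on a shutdown callback arriving: it polls the listener's actual state, so a missed or late callback only delays recovery by one interval.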

@misha-p
Author

misha-p commented Jun 18, 2024

Thanks Jeremy - I understand, this one is hard to reproduce. As I mentioned, it all works great with an "empty" Rabbit server (such as the one in the docker compose in the Wolverine test harness). We see this issue when the Rabbit server is part of a large infrastructure, with 100+ queues and exchanges. And I see nothing suspicious in the Rabbit logs at restart time.
(BTW, all NServiceBus services reconnect fine, but the Wolverine ones don't, or only "sometimes" or partially, e.g. only one of two listeners gets channels.)
I'll look at it more too; maybe it really is a matter of making listener startup more aggressive.

@jeremydmiller
Member

Any issues in the logs? Wolverine would be logging interruptions from the Rabbit MQ client

@jeremydmiller
Member

Is it only the listeners that are the problem, or does it flip out trying to send too?

@jeremydmiller
Member

Sorry @misha-p, one more thing, there's a Rabbit MQ client V7 coming soon that's async all the way down. Not sure about the timing of that, we've got a PR to add that to Wolverine though.

Also, any log messages like:

"Unexpected Rabbit MQ connection shutdown"

or

"Rabbit MQ connection is blocked because of {Reason}"

or

"Rabbit MQ connection error on callback"

@jeremydmiller jeremydmiller modified the milestone: 3.0 Jun 18, 2024
@misha-p
Author

misha-p commented Jun 18, 2024

In the logs I see "Unexpected channel shutdown for Rabbit MQ. Wolverine will attempt to restart..." when the Rabbit server is down.
Also, I don't think the issue is with senders, only with listeners.

And btw - I experimented a little with the channel agent and, surprisingly, reconnect got fixed with this line:

[image: screenshot of the changed line, not preserved as text]

I figured that in the case of a Rabbit server with tons of queues, the teardownChannel() callback doesn't happen at the right time and the agent state stays Connected, so I forced it. I don't think it's a proper fix, but maybe it will help you reason about our case.
Anyway, with that change, channels for listeners now get created after the Rabbit server is back up.
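
For readers following along, the hypothesis above can be modeled with a toy state machine. This is only a sketch in Python under stated assumptions, not the Wolverine channel agent and not the line from the screenshot: `ChannelAgent`, `ensure_connected`, `on_shutdown` and the `open_channel` factory are hypothetical names. It shows how a reconnect guarded by a "Connected" check becomes a no-op when teardown has not run yet.

```python
from enum import Enum


class State(Enum):
    DISCONNECTED = "disconnected"
    CONNECTED = "connected"


class ChannelAgent:
    """Toy model of an agent that owns one channel (hypothetical class)."""

    def __init__(self, open_channel):
        # `open_channel` is a hypothetical factory returning a channel object.
        self._open_channel = open_channel
        self.state = State.DISCONNECTED
        self.channel = None

    def ensure_connected(self):
        """Open a channel unless the agent believes it already has one."""
        if self.state is State.CONNECTED:
            return False  # guard: nothing is re-created
        self.channel = self._open_channel()
        self.state = State.CONNECTED
        return True

    def teardown_channel(self):
        self.channel = None
        self.state = State.DISCONNECTED

    def on_shutdown(self, force_teardown):
        """Handle a broker shutdown and try to reconnect.

        force_teardown=False models the teardown callback arriving late or
        not at all, so the state is still CONNECTED when we reconnect.
        """
        if force_teardown:
            self.teardown_channel()
        return self.ensure_connected()
```

In this model, `on_shutdown(force_teardown=False)` returns `False` and no new channel is opened, while `on_shutdown(force_teardown=True)` resets the state first and a channel is re-created. Whether this is what actually happens inside Wolverine is not established in this thread.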

@misha-p
Author

misha-p commented Jun 18, 2024

Did more tests with the above: still not stable, it fully reconnects only sometimes (((

@jeremydmiller
Member

@misha-p & I spoke today. The problem seems to be only on listeners, and not senders (we think). I'm going to make a change where a Rabbit MQ listener immediately tries to send a Ping message when it starts up, to see if that helps.

@jeremydmiller jeremydmiller added the bug Something isn't working label Oct 10, 2024