Automatically rebalancing connections of realtime repl after recovering node failure. [JIRA: RIAK-2380] #724
Comments
Hi Kaz, realtime replication should rebalance automatically - did it not work? It was added around 2.0.3 as PR #651. Unfortunately there aren't many log messages to show whether it is working - we should at least modify it to let us know when it is reconnecting and why. The only way to know is if you see reconnections from the same Pid in the logs, as the realtime process is re-used.
I might have misunderstood it. Please let me check again.
I checked the behavior and the implementation of rebalancing connections to the sink node. I confirmed that reconnecting starts with some delay after ring_update is triggered. That works fine! 😊
Sorry for reopening. It seems that restarting a sink node doesn't trigger ring_update on a source node. I had assumed the list of stopped nodes was transferred to the source cluster, but it looks like the ring data doesn't carry that as metadata. In the following case, rebalancing doesn't start.
I guess another trigger is needed to start the rebalancing in addition to ring_update.
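As a rough illustration of the idea (a minimal sketch, not riak_repl code): a periodic trigger that re-runs rebalancing independently of ring_update. `rebalance_connections/0` is a hypothetical placeholder for whatever hook the source-side realtime connection manager would expose.

```erlang
%% Sketch only: a gen_server that periodically asks the (hypothetical)
%% realtime connection manager to re-evaluate sink connections, so a
%% recovered sink node can receive connections again even when no
%% ring_update reaches the source cluster.
-module(rt_rebalance_tick).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(INTERVAL_MS, 60000).  %% re-check once a minute

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    erlang:send_after(?INTERVAL_MS, self(), tick),
    {ok, #{}}.

handle_call(_Req, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(tick, State) ->
    rebalance_connections(),
    erlang:send_after(?INTERVAL_MS, self(), tick),
    {noreply, State};
handle_info(_Other, State) ->
    {noreply, State}.

rebalance_connections() ->
    %% Placeholder: a real patch would call into the realtime source
    %% connection manager here instead of returning ok.
    ok.
```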
With realtime replication, connections from a source cluster are re-established to other healthy nodes after a node goes down in the sink cluster. However, no source node reconnects to that node even after it recovers. This causes an imbalanced realtime-replication load in the sink cluster.
A workaround is to restart realtime replication with
riak-repl realtime stop/start <clustername>
but it would be better to rebalance connections automatically when a failed node comes back or a new node is added to the sink cluster. Related to #350.
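For example, with a hypothetical sink cluster name `sink_b`, the workaround would be run on a source-cluster node as:

```
riak-repl realtime stop sink_b
riak-repl realtime start sink_b
```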