Make cluster controller highly-available #1815

Closed · 2 tasks done · Tracked by #1675
tillrohrmann opened this issue Aug 9, 2024 · 3 comments

tillrohrmann (Contributor) commented Aug 9, 2024

In order to tolerate the loss of a cluster controller, we need another cluster controller to take over. Otherwise, we risk the Restate cluster becoming unavailable, because the cluster controller is responsible for electing new Restate leaders. One high-level idea could be that the nodes running a cluster controller gossip among each other to notice when a cluster controller goes down. Additionally, the cluster controllers could obtain a leader epoch from the metadata store to decide who the current leader is. Only after the current leader is believed to be dead would another cluster controller start campaigning for leadership by obtaining a higher leader epoch and announcing it to the others.
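
To make the leader-epoch idea concrete, here is a minimal sketch in Rust. The `MetadataStore` trait, the `LeaderState` type and the key name are hypothetical and only illustrate the pattern of claiming a higher epoch with a conditional (versioned) write; they are not Restate's actual APIs.

```rust
use std::time::SystemTime;

#[derive(Clone, Debug)]
struct LeaderState {
    node_id: String,        // generational node id of the leader
    epoch: u64,             // monotonically increasing leader epoch
    elected_at: SystemTime, // when this leader claimed the epoch
}

trait MetadataStore {
    /// Returns the current value together with a version usable for conditional writes.
    fn read(&self, key: &str) -> Option<(LeaderState, u64)>;
    /// Writes `value` only if the stored version still equals `expected_version`.
    fn write_if_version(&self, key: &str, value: LeaderState, expected_version: u64) -> bool;
}

/// Campaign for leadership by claiming a higher epoch; returns the new state on success.
fn campaign(store: &dyn MetadataStore, my_node_id: &str) -> Option<LeaderState> {
    // Treat a missing key as epoch 0 / version 0 in this sketch.
    let (current_epoch, version) = match store.read("cluster_controller/leader") {
        Some((state, version)) => (state.epoch, version),
        None => (0, 0),
    };
    let candidate = LeaderState {
        node_id: my_node_id.to_owned(),
        epoch: current_epoch + 1,
        elected_at: SystemTime::now(),
    };
    // At most one candidate wins this conditional write; losers observe the new
    // leader on their next read and go back to following it.
    store
        .write_if_version("cluster_controller/leader", candidate.clone(), version)
        .then_some(candidate)
}
```

The conditional write is what prevents two candidates from both successfully claiming the same epoch.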

Tasks

  1. muhamadazmy
  2. muhamadazmy
tillrohrmann (Contributor, Author) commented Nov 6, 2024

One idea could be the following:

All nodes that run the Admin role will be cluster controller candidates. The cluster controller candidates heartbeat each other and discover each other through the NodesConfiguration. When the current cluster controller leader does not respond to heartbeats, the other candidates can try to become the leader.
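
A rough sketch of the failure-detection side, assuming a fixed heartbeat interval and timeout (the values and the `LeaderMonitor` type are illustrative; the actual candidate set would come from the NodesConfiguration):

```rust
use std::time::{Duration, Instant};

const HEARTBEAT_INTERVAL: Duration = Duration::from_millis(500); // assumed value
const LEADER_TIMEOUT: Duration = Duration::from_secs(3);         // assumed value

/// Tracks when the current leader last answered a heartbeat.
struct LeaderMonitor {
    last_ack: Instant,
}

impl LeaderMonitor {
    fn new() -> Self {
        Self { last_ack: Instant::now() }
    }

    /// Called by the heartbeat loop whenever the leader responds.
    fn record_ack(&mut self) {
        self.last_ack = Instant::now();
    }

    /// The leader is suspected dead once it has missed heartbeats for longer
    /// than the timeout; only then does this candidate start campaigning.
    fn leader_suspected_dead(&self) -> bool {
        self.last_ack.elapsed() > LEADER_TIMEOUT
    }
}
```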

The way a candidate becomes leader is to write its generational node id, an incremented epoch and a timestamp to the NodesConfiguration. The candidate that wins the race to update the NodesConfiguration with its leader information becomes the new leader. The other candidates start heartbeating the new leader, waiting for the chance to step up.

While acting as the leading cluster controller, the leader monitors the NodesConfiguration for changes in order to step down if it is no longer the leading cluster controller.

Control messages sent to a node can contain the leader epoch. The recipient can use it to filter out messages from outdated leaders.
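
On the receiving side, the fencing could look like the following sketch (the `ControlMessage` shape is an assumption; only the epoch comparison matters):

```rust
/// Hypothetical control message carrying the sender's leader epoch.
struct ControlMessage {
    leader_epoch: u64,
    // payload fields elided
}

/// Drops messages whose epoch is lower than the highest epoch seen so far.
struct EpochFilter {
    highest_seen_epoch: u64,
}

impl EpochFilter {
    fn accept(&mut self, msg: &ControlMessage) -> bool {
        if msg.leader_epoch < self.highest_seen_epoch {
            // Message from an outdated leader; ignore it.
            return false;
        }
        self.highest_seen_epoch = msg.leader_epoch;
        true
    }
}
```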

With this approach, there is the problem of a possible leadership ping-pong if two admin nodes are partitioned from each other. Ways to mitigate this problem are to introduce a grace period before a candidate runs again for leadership, or to share liveness information with other nodes to obtain a more robust liveness mechanism. The leader selection would also benefit from a more refined heartbeat mechanism that generates fewer false positives.
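
The grace-period mitigation could be as simple as the following sketch (the duration is an assumed tuning knob, not a value from this discussion):

```rust
use std::time::{Duration, Instant};

const RECAMPAIGN_GRACE_PERIOD: Duration = Duration::from_secs(10); // assumed value

/// After losing or giving up leadership, a candidate waits before running again.
#[derive(Default)]
struct CampaignThrottle {
    lost_leadership_at: Option<Instant>,
}

impl CampaignThrottle {
    fn on_leadership_lost(&mut self) {
        self.lost_leadership_at = Some(Instant::now());
    }

    /// A candidate may only campaign again once the grace period has elapsed.
    fn may_campaign(&self) -> bool {
        match self.lost_leadership_at {
            Some(at) => at.elapsed() >= RECAMPAIGN_GRACE_PERIOD,
            None => true,
        }
    }
}
```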

The leadership information does not have to be written to the NodesConfiguration. Writing it to the NodesConfiguration has the advantage that this information is automatically spread throughout the cluster.

muhamadazmy (Contributor) commented:

Discussion Summary

Phase 1

  • Cluster Controller(s) continue to collect all node state (observed state) by means of heartbeats
  • If a CC (cluster controller) holds the lowest node id of all nodes with the admin role, the CC assumes itself to be the leader (see the sketch after this list)
    • Otherwise it takes no action on the current cluster state
  • The collected state is verified to be a valid state. This avoids unnecessary movements or changes even if it doesn't match the CC's target state, so as not to conflict with other possibly running CCs that think they are leaders
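
A minimal sketch of the Phase 1 rule (the `NodeEntry` type and the admin-role flag are stand-ins for the real NodesConfiguration entries):

```rust
type NodeId = u32; // stand-in for the real node id type

struct NodeEntry {
    id: NodeId,
    has_admin_role: bool,
}

/// A CC considers itself the leader iff its node id is the lowest among admin nodes.
fn is_leader(my_id: NodeId, nodes: &[NodeEntry]) -> bool {
    nodes
        .iter()
        .filter(|n| n.has_admin_role)
        .map(|n| n.id)
        .min()
        .map_or(false, |lowest| lowest == my_id)
}
```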

Phase 2

  • Implement a gossip protocol. All nodes will have an observed view of the cluster; this view can then be used by other components in the system, including the CC on that node (if it runs the admin role)
  • Drop the Scheduler Plan from the metadata store? (As far as I understand, this won't be needed since all nodes will already have the observed plan.)
  • Drop AttachRequest (not sure what it does yet)

tillrohrmann (Contributor, Author) commented:

For the preview version we have done everything we need.
