Enable unattended recovery from zero replicas #30
Conversation
mariadb-0 crashlooped while there were no mariadb-1 or mariadb-2: `WSREP: gcs/src/gcs.cpp:gcs_open():1458: Failed to open channel 'my_wsrep_cluster' at 'gcomm://mariadb-0.mariadb,mariadb-1.mariadb,mariadb-2.mariadb': -110 (Connection timed out)`
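For anyone hitting the same thing, a quick way to pull that WSREP error out of a crashlooping pod is to read the previous container instance's logs; a minimal sketch, assuming the MariaDB container in the pod is named `mariadb`:

```sh
# Inspect the previous (crashed) container instance and filter for WSREP errors.
# The container name "mariadb" is an assumption about this pod spec.
kubectl logs mariadb-0 -c mariadb --previous | grep -iE 'wsrep|error'
```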
We've been running in very stable and very ephemeral clusters - nothing in between - for a long time, so I had forgotten about the DNS-based peer detection in the init script. Because the mariadb service tolerates unready endpoints, that logic will actually only trigger if mariadb-0 is the only scheduled pod. This PR now takes a more optimistic approach to recovery: it assumes that partition is rare, and that if it does happen we're OK with selecting a random replica as the recovery point. If the MariaDB cluster goes down without partition, the replicas all have sufficiently accurate state, so we can bootstrap on the first pod. If there was a partition, some data is likely lost. The branch name is misleading now.
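In rough terms, the optimistic recovery described above could look like the sketch below. It is only an illustration: the hostname convention, the `mariadb-N.mariadb` peer names (taken from the log line above) and how the bootstrap flag is handed to mysqld are assumptions, not the actual init script.

```sh
#!/bin/sh
# Optimistic recovery sketch: the first pod bootstraps a new Galera cluster
# when none of its peers resolve in DNS. Accepts possible data loss after a partition.
ordinal="$(hostname | awk -F- '{print $NF}')"
extra_args=""
if [ "$ordinal" = "0" ]; then
  if ! getent ahosts mariadb-1.mariadb >/dev/null 2>&1 && \
     ! getent ahosts mariadb-2.mariadb >/dev/null 2>&1; then
    echo "No peers in DNS; bootstrapping a new cluster from mariadb-0"
    extra_args="--wsrep-new-cluster"
  fi
fi
# Hand the flag over to the mysqld container by whatever mechanism the real setup uses.
echo "$extra_args" > /tmp/mysqld-extra-args
```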
I suspect that the reliability of the
At fresh apply, and when scaling from 0 directly to >1, we'll handle pod churn better now. To reduce Init:Error events we could (see the sketch after this list):

- Try to reduce time to readiness
- Add waits upon detection of missing peers, before exiting
- Retry within the script for pod indexes >0
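A sketch of the wait/retry ideas for higher-ordinal pods; not the actual init script, and the hostname convention and service name (`mariadb`) are assumptions:

```sh
# Higher-ordinal pods wait for mariadb-0 to show up in DNS before giving up,
# which should cut down on Init:Error events during pod churn.
ordinal="$(hostname | awk -F- '{print $NF}')"
if [ "$ordinal" != "0" ]; then
  tries=0
  until getent ahosts mariadb-0.mariadb >/dev/null 2>&1; do
    tries=$((tries + 1))
    if [ "$tries" -ge 30 ]; then
      echo "mariadb-0.mariadb still not resolvable after $tries attempts, giving up" >&2
      exit 1
    fi
    echo "Waiting for mariadb-0.mariadb to resolve ($tries/30)..."
    sleep 5
  done
fi
```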
The init script actually depended on finding Ready members of the statefulset, not all existing members. With a new service that reports such entries we can now use
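To make the distinction concrete: a headless Service that tolerates unready endpoints publishes DNS records for every scheduled pod, while one that does not publishes only Ready pods. A minimal sketch of counting both, where `mariadb` is the existing unready-tolerating service and `mariadb-ready` is a placeholder name for the new Ready-only service:

```sh
# Count all scheduled peers vs. Ready peers via DNS.
# "mariadb-ready" is a hypothetical name for the new Ready-only headless service.
all_peers=$(getent ahosts mariadb | awk '{print $1}' | sort -u | wc -l)
ready_peers=$(getent ahosts mariadb-ready 2>/dev/null | awk '{print $1}' | sort -u | wc -l)
echo "scheduled peers: $all_peers, ready peers: $ready_peers"
```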
There's a slightly outdated piece of guidance now at kubernetes-mysql-cluster/10conf-d.yml line 137 in fedf5f0, regarding `AUTO_RECOVERY_MODE`.
Running a cluster on preemptible nodes now. It's not a production-critical cluster, so when mariadb-0 goes unready for some reason and we don't attend to it and fix it immediately, `OrderedReady` means any evictions of the higher-ordinal pods leave them unscheduled. In this case mariadb-0 crashlooped on:
Unfortunately I didn't investigate why the init container completed, when normally it should have complained about having data but not knowing how to bootstrap. Instead I deleted the statefulset and recreated it with `Parallel`.

When scaling up from 0 to 3 replicas now, mariadb-0 waits in the init container as expected. The other two crashloop after failing to join WSREP. I think that's pretty useful behavior. I simply ran the `kubectl exec -c init-config mariadb-0 -- touch /tmp/confirm-force-bootstrap` option logged by the init script and the cluster recovered.

Theory: to automate recovery, while accepting a risk of lost writes, we could default to "confirm-force-bootstrap" on mariadb-0. Under the assumption that partitioned clusters are rare, the risk of lost writes should be low. Alternatively we could introduce a manual start mode for all pods, so that admins can follow the MariaDB recovery docs and decide to bootstrap the one with the newest data.
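A minimal sketch of that theory, assuming a hypothetical opt-in environment variable (here called `UNATTENDED_BOOTSTRAP`) rather than the repo's actual configuration:

```sh
# Sketch only: automatically confirm force-bootstrap on the first pod when the
# operator has opted in, accepting that writes on a partitioned minority may be lost.
# UNATTENDED_BOOTSTRAP is a hypothetical variable, not an existing option in this repo.
ordinal="$(hostname | awk -F- '{print $NF}')"
if [ "$ordinal" = "0" ] && [ "${UNATTENDED_BOOTSTRAP:-false}" = "true" ]; then
  echo "Unattended recovery enabled; confirming force bootstrap on mariadb-0"
  touch /tmp/confirm-force-bootstrap
fi
```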
While mariadb-0 was waiting for my choice of start mode, the other two pods logged the following: