
Retire this repo in favor of the Bitnami mariadb-galera chart #35

Open · wants to merge 7 commits into main

Conversation

solsson (Contributor) commented Jun 20, 2021

Breaking. Existing PVCs won't work.

As with our own setup, recovery from zero replicas is prone to data loss. With OrderedReady pod management it shouldn't be prone to split brain. See https://github.com/bitnami/charts/tree/master/bitnami/mariadb-galera#bootstraping-a-node-other-than-0.
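As a sanity check that ordered startup is actually in effect on a running install, something like this should do (the statefulset name is a guess based on the chart's default naming):

  # Assumed statefulset name; OrderedReady is also the StatefulSet default
  kubectl get statefulset mariadb-galera -o jsonpath='{.spec.podManagementPolicy}'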

TODO

  • Compare character set configuration between old and new
    • There's a lot of config in the Helm configmap whose provenance we don't know, but let's consider it validated by the community. Hence I chose to sed the configmap instead of overriding my.cnf; it now matches the choice we made here (see the sketch after this list).
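Roughly what the sed approach looks like, as a sketch only (the file path and the exact character-set line are placeholders, not the real chart output):

  # Sketch: keep the chart's configmap as-is, only align the server character set with our previous setup
  sed -i 's/character-set-server=.*/character-set-server=utf8mb4/' base/mariadb-galera-configmap.yaml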

solsson added a commit to Yolean/unhelm that referenced this pull request Jun 20, 2021
solsson (Contributor, Author) commented Jun 21, 2021

When starting a new cluster from empty volumes, you'll get a split-brain situation unless replicas is 1. Bootstrap can be tweaked using repeated helm runs, but it's actually easier to apply the regular manifests with replicas set to 1 and then scale up.
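Concretely, something like this (resource names are illustrative; the same effect can be had by applying manifests with replicas: 1 first):

  # Let pod 0 bootstrap the cluster alone, then grow to the intended size
  kubectl scale statefulset mariadb-galera --replicas=1
  kubectl rollout status statefulset mariadb-galera
  kubectl scale statefulset mariadb-galera --replicas=3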

This stack is most certainly more robust than what we intend to replace, but there are more lines of bash to understand, and source and issues live in separate repositories for the image and the chart.

I see no evidence of automated tests in recent commits to the Docker image. This means it's as ad hoc as our script, but baked into the custom image and combined with snippets mixed into the Helm chart.

I've seen the following outcomes of scaling down to zero, then back up to three again:

  • First replica starts and the other two keep crash looping
  • First replica fails with
2021-06-21  5:12:06 0 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 
  • First replica fails earlier, in the Bitnami scripts:
mariadb 05:15:39.31 ERROR ==> It is not safe to bootstrap form this node ('safe_to_bootstrap=0' is set in 'grastate.dat'). If you want to force bootstrap, set the environment variable MARIADB_GALERA_FORCE_SAFETOBOOTSTRAP=yes

None of these are split brain. I'll try to capture both initial apply and some failure modes in the test script.
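For the record, the manual way out of the two bootstrap failures above, going by the error messages and the Bitnami docs (pod name and data path are guesses, and pod 0 has to be running for the exec variant):

  # Mark pod 0's volume as bootstrappable, as the WSREP error suggests...
  kubectl exec mariadb-galera-0 -- sed -i 's/safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /bitnami/mariadb/data/grastate.dat
  # ...or set the env var the Bitnami script points at and let pod 0 restart
  kubectl set env statefulset/mariadb-galera MARIADB_GALERA_FORCE_SAFETOBOOTSTRAP=yes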

solsson (Contributor, Author) commented Jun 21, 2021

[Screenshot from 2021-06-21 17-28-43]
From test.sh, demonstrating recovery from zero replicas with the guess that pod 0's volume is up to date. For actual disaster recovery see https://github.com/bitnami/charts/tree/70822dd1aea385cad908462be5fc1004ef8b5e07/bitnami/mariadb-galera#bootstraping-a-node-other-than-0

I think this stack is OK. It must be initialized by applying ./base-bootstrap, waiting for readiness, and then applying ./base. I'm also not convinced that base/bootstrap-no.yaml actually prevents split brain, but after init with base-bootstrap + base I've been unable to produce a state with ready pods that aren't part of the Galera cluster.
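In shell terms the init sequence is roughly the following (the label selector and timeout are assumptions; see test.sh for the real thing):

  kubectl apply -f ./base-bootstrap
  kubectl wait --for=condition=Ready pod -l app=mariadb-galera --timeout=600s
  kubectl apply -f ./base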
