disaster recovery of etcd cluster #104
We didn't use etcd-operator; we used the built-in kstone-etcd-operator. It has more complete support for persistent storage and better disaster tolerance. In the next version, we will support rebuilding lost-quorum clusters from snapshots, and kstone-dashboard will also support these operations visually.
Hi tangcong, thanks for your reply. Based on the official docs (https://etcd.io/docs/v3.5/op-guide/runtime-configuration/#restart-cluster-from-majority-failure), when we deploy etcd clusters dynamically (which is how it works in K8s), it appears that the only way to rebuild a lost-quorum cluster is to restore from a snapshot. The problem with snapshots is that a restore will almost certainly lose some data: writes that landed in etcd after the snapshot was taken are gone once the cluster goes down and is restored.
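As a way to quantify that loss, one option (a sketch, with placeholder paths and endpoints) is to compare the revision recorded in the snapshot against the revision a surviving member reports; the gap is what a restore would drop. `etcdutl` ships with etcd v3.5+; older releases expose the same subcommand under `etcdctl`:

```sh
# Revision captured in the snapshot file (path is a placeholder)
etcdutl snapshot status /backup/snap.db -w table

# Current revision on a still-reachable member; the revision is in the
# response header of the JSON output (endpoint is a placeholder)
etcdctl --endpoints=https://etcd-0:2379 endpoint status -w json
```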
We support storing data on persistent data disks. If the data is not persistent, we can also force a fresh snapshot from a healthy node and rebuild the cluster from it, so the probability of losing data is very small. @rayliu419
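For reference, a minimal sketch of that flow based on the standard etcdctl/etcdutl snapshot workflow, with placeholder endpoints and paths: save a fresh snapshot from the healthy member, then seed a new cluster from it.

```sh
# Take a fresh snapshot from a member that is still serving
etcdctl --endpoints=https://etcd-0:2379 snapshot save /backup/snap.db

# Seed a new one-member cluster from the snapshot (etcd v3.5+;
# older releases use `etcdctl snapshot restore`)
etcdutl snapshot restore /backup/snap.db \
  --name etcd-0 \
  --initial-cluster etcd-0=https://etcd-0:2380 \
  --initial-advertise-peer-urls https://etcd-0:2380 \
  --data-dir /var/lib/etcd-restored

# Point the member at the restored data dir, then re-add the remaining
# members one at a time with `etcdctl member add`
```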
"We support storing data on persistent data disks. If it is not persistent" "we can also force a snapshot from the healthy node to rebuild the cluster, so that the probability of losing data is very small."
Actually, I don't know why etcd community doesn't have a solution to recover it without using snapshot(maybe they have, but I don't find it). In cloudnative enviorment, it's a critical case. |
I think I found a way to recover without losing data. Etcd supports --force-new-cluster to reconfigure a cluster: we can pick the member with the latest data and restart it to form a new cluster. If more than half of the PVCs survive, we can recover this way.
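A sketch of that recovery path, with hypothetical member names, endpoints, and paths: find the surviving member with the highest raft index, restart it alone with --force-new-cluster (which discards the old membership and forms a one-member cluster from its existing data dir), then grow the cluster back.

```sh
# Compare surviving members; the one with the highest raft index has
# the latest data (works without quorum, since status is per-endpoint)
etcdctl --endpoints=https://etcd-0:2379,https://etcd-1:2379 \
  endpoint status -w table

# Restart the chosen member alone on its existing data dir
etcd --name etcd-0 \
  --data-dir /var/lib/etcd \
  --force-new-cluster

# Re-add the remaining members one at a time
etcdctl --endpoints=https://etcd-0:2379 \
  member add etcd-1 --peer-urls=https://etcd-1:2380
```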
Yes, I submitted an issue (kstone-io/kstone-etcd-operator#2) at the time, and the plan was implemented this way. Thank you.
Hi guys,
In the current etcd-operator implementation, when we use K8s to deploy an etcd cluster and the cluster loses quorum, the operator does nothing to recover it:
https://github.com/tkestack/kstone/blob/master/third_party/etcd-operator/pkg/cluster/reconcile.go#L95-L97
Do you have any plans to recover a "lost quorum" etcd cluster?
Thanks,