Description
The kubelet and the server should be resilient to pod restarts.
The kubelet has issues restarting due to the webhook cert.
The server has issues restarting due to the bootstrap secret.
Issue encountered while using persistence.type: dynamic to persist etcd data:
If the server pod IP changes (pod killed, rescheduled, node restarted, ...), the server fails to come up again:
time="2025-01-03T15:29:35Z" level=info msg="Failed to test data store connection: this server is a not a member of the etcd cluster. Found [k3k-mycluster-server-0-dc22a3d8=https://10.244.3.241:2380], expect: k3k-mycluster-server-0-dc22a3d8=https://10.244.3.243:2380"
Shared mode uses the embedded etcd, which registers the member using the local IP (the Pod IP). When the Pod IP changes, etcd fails to start.
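To make the failure concrete, here is a minimal Python sketch of the kind of membership check the log message describes. It is only an illustration, not the actual k3s code: the helper names `parse_member` and `is_member` are hypothetical, and the values are copied from the log line above.

```python
def parse_member(entry: str) -> tuple[str, str]:
    """Split a 'name=peer-url' entry, as printed in the log line."""
    name, _, url = entry.partition("=")
    return name, url


def is_member(registered: list[str], expected: str) -> bool:
    """Return True if the expected name/peer-URL pair is already registered."""
    return parse_member(expected) in [parse_member(m) for m in registered]


# Member stored in the persisted etcd data (old Pod IP).
registered = ["k3k-mycluster-server-0-dc22a3d8=https://10.244.3.241:2380"]
# Member the restarted server expects to find (new Pod IP).
expected = "k3k-mycluster-server-0-dc22a3d8=https://10.244.3.243:2380"

# The name matches but the peer URL does not, so the check fails.
print(is_member(registered, expected))
```

The member name is unchanged; only the peer URL differs, which is why an address that survives restarts would avoid the mismatch.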
Potential solutions
Set the proper etcd startup config using the headless service or services:
etcd-arg:
- --initial-cluster=...
- --advertise-client-urls=...
- --initial-advertise-peer-urls=...
This remains challenging due to the embedded nature of etcd. The server pod is not considered running until etcd is running, and etcd won't start because DNS resolution will fail while the pod is not ready.
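As a rough sketch of what the first option could look like, the Python snippet below builds the three etcd flags from a stable DNS name instead of the Pod IP. Everything here is an assumption for illustration: the function `etcd_args` is hypothetical, the headless service name and namespace are made up, the DNS name follows the usual `<pod>.<service>.<namespace>.svc.cluster.local` pattern with the default cluster domain, and the ports are the standard etcd client (2379) and peer (2380) ports.

```python
def etcd_args(member: str, pod: str, service: str, namespace: str) -> list[str]:
    """Build etcd flags that advertise a stable DNS name instead of the Pod IP."""
    # Assumed DNS pattern for a pod behind a headless service (default cluster domain).
    host = f"{pod}.{service}.{namespace}.svc.cluster.local"
    peer_url = f"https://{host}:2380"    # standard etcd peer port
    client_url = f"https://{host}:2379"  # standard etcd client port
    return [
        f"--initial-cluster={member}={peer_url}",
        f"--advertise-client-urls={client_url}",
        f"--initial-advertise-peer-urls={peer_url}",
    ]


# Hypothetical service name and namespace; the member name is taken from the log above.
for arg in etcd_args(
    member="k3k-mycluster-server-0-dc22a3d8",
    pod="k3k-mycluster-server-0",
    service="k3k-mycluster-server-headless",
    namespace="default",
):
    print(arg)
```

The resulting strings would go under etcd-arg. This does not address the startup ordering problem described above; it only shows the shape of the values.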
Use SQLite in shared mode instead of etcd
This will require some rework of the bootstrap part, which the kubelet currently relies on to connect to the cluster.
We could use --write-kubeconfig to directly store the kubeconfig for the kubelet.