Cluster creation fails while running Sieve with kapp-controller #117

Open
jerrinsg opened this issue Mar 10, 2023 · 5 comments

Comments

@jerrinsg
Contributor

I am hitting issues when trying to run Sieve with kapp-controller.

I am able to build the controller image successfully:

$ python3 build.py -c examples/kapp-controller -m all
...

Succeeded
kapp-controller-sha256-47c5a7b5df0fc9142e825b6ce5d767760db91b7d381bd0c2ce4b7fc05256c8ee
Untagged: kbld:kapp-controller-sha256-47c5a7b5df0fc9142e825b6ce5d767760db91b7d381bd0c2ce4b7fc05256c8ee

But running Sieve with kapp-controller in learn mode fails:

$ python3 sieve.py -c examples/kapp-controller -w create -m learn --build-oracle
...
ERROR: failed to create cluster: failed to init node with kubeadm: command "docker exec --privileged kind-control-plane kubeadm init --skip-phases=preflight --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1
Command Output: I0309 00:17:00.337861     217 initconfiguration.go:255] loading configuration from "/kind/kubeadm.conf"
...

[FAIL] kind create cluster --image ghcr.io/sieve-project/action/node:v1.24.10-learn --config kind_configs/kind-1a-2w.yaml
Traceback (most recent call last):
  File "/Users/jshajigeorge/work/sieve/sieve.py", line 264, in setup_kind_cluster
    os_system(
  File "/Users/jshajigeorge/work/sieve/sieve_common/common.py", line 181, in os_system
    raise Exception(
Exception: Failed to execute kind create cluster --image ghcr.io/sieve-project/action/node:v1.24.10-learn --config kind_configs/kind-1a-2w.yaml with return code 1

(full logs attached in kapp-learn.err.txt)

See kubelet-log.txt for the logs exported by kind (kind export logs).

I'm trying this on a Mac:

$ sw_vers
ProductName:		macOS
ProductVersion:		13.0.1
BuildVersion:		22A400
@jerrinsg
Contributor Author

jerrinsg commented Mar 11, 2023

Hitting the same issue on an Ubuntu VM as well:

$ python3 sieve.py -c examples/kapp-controller -w create -m learn --build-oracle
...
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

kapp-learn-err.txt

Kubelet log:

Mar 10 20:19:55 kind-control-plane kubelet[283]: I0310 20:19:55.315958     283 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt"
Mar 10 20:19:55 kind-control-plane kubelet[283]: E0310 20:19:55.317086     283 certificate_manager.go:471] kubernetes.io/kube-apiserver-client-kubelet: Failed while requesting a signed certificate from the control plane: cannot create certificate signing request: Post "https://kind-control-plane:6443/apis/certificates.k8s.io/v1/certificatesigningrequests": dial tcp 172.18.0.3:6443: connect: connection refused
Mar 10 20:19:55 kind-control-plane kubelet[283]: W0310 20:19:55.320524     283 sysinfo.go:203] Nodes topology is not available, providing CPU topology
Mar 10 20:19:55 kind-control-plane kubelet[283]: Error: failed to run Kubelet: invalid configuration: cgroup ["kubelet"] has some missing paths: /sys/fs/cgroup/cpuacct/kubelet.slice, /sys/fs/cgroup/hugetlb/kubelet.slice, /sys/fs/cgroup/pids/kubelet.slice, /sys/fs/cgroup/cpuset/kubelet.slice, /sys/fs/cgroup/memory/kubelet.slice, /sys/fs/cgroup/cpu/kubelet.slice, /sys/fs/cgroup/systemd/kubelet.slice
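To make the last kubelet line easier to read, here is a minimal sketch that pulls the missing paths out of such a log line and lists the controller directories they refer to. Per-controller directories such as /sys/fs/cgroup/cpuacct are the cgroup v1 layout, while a host on the unified (v2) hierarchy exposes a cgroup.controllers file at the cgroup root. The helper names are hypothetical, and nothing in this thread confirms that a cgroup version mismatch is the actual cause; this only helps gather the facts.

```python
import os
import re
from pathlib import PurePosixPath


def missing_cgroup_paths(log_line: str) -> list[str]:
    """Return the paths listed after 'has some missing paths:' in a kubelet error line."""
    match = re.search(r"has some missing paths:\s*(.+)$", log_line.strip())
    if match is None:
        return []
    return [p.strip() for p in match.group(1).split(",") if p.strip()]


def controllers_from_paths(paths: list[str], root: str = "/sys/fs/cgroup") -> list[str]:
    """Return the first directory under `root` for each path (the v1 controller name)."""
    names = []
    for path in paths:
        try:
            relative = PurePosixPath(path).relative_to(root)
        except ValueError:
            continue  # path is not under the cgroup root, skip it
        if relative.parts:
            names.append(relative.parts[0])
    return names


def host_uses_cgroup_v2(root: str = "/sys/fs/cgroup") -> bool:
    """A unified (v2) hierarchy has a cgroup.controllers file at its root."""
    return os.path.isfile(os.path.join(root, "cgroup.controllers"))


if __name__ == "__main__":
    # Shortened copy of the kubelet error line from the log above.
    line = (
        'Error: failed to run Kubelet: invalid configuration: cgroup ["kubelet"] '
        "has some missing paths: /sys/fs/cgroup/cpuacct/kubelet.slice, "
        "/sys/fs/cgroup/hugetlb/kubelet.slice"
    )
    print("controllers kubelet expected:", controllers_from_paths(missing_cgroup_paths(line)))
    print("this host uses cgroup v2:", host_uses_cgroup_v2())
```

Run it on the machine that hosts the kind containers and paste the full kubelet line in place of the shortened one.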

Host details:

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.2 LTS
Release:	22.04
Codename:	jammy

$ uname -a
Linux jerrin-virtual-machine 5.19.0-35-generic #36~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb 17 15:17:25 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

@lalithsuresh
Collaborator

It's worth sharing a note about the workaround here too (rebuilding the Kind image locally).

@jerrinsg
Contributor Author

On Mac, building the Kind image locally and running Sieve again fixed this issue:

$ python3 build.py -v v1.24.10 -m learn
..
Image "kindest/node:latest" build completed.

$ python3 build.py -v v1.24.10 -m test
..
Image "kindest/node:latest" build completed.

$ python3 sieve.py -c examples/kapp-controller -w create -m learn --build-oracle
...
Generated 8 intermediate-state test plan(s) in sieve_learn_results/kapp-controller/create/learn/intermediate-state
Total time: 410.3174147605896 seconds
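The workaround above can be sketched as a small script that runs the same three commands in order. The commands and flags are copied from this comment; the function names and the dry-run switch are hypothetical additions for illustration, and the script assumes it is started from the root of the Sieve checkout.

```python
import shlex
import subprocess


def workaround_commands(k8s_version="v1.24.10",
                        controller="examples/kapp-controller",
                        workload="create"):
    """Build the list of commands: two local Kind image builds, then the learn run."""
    builds = [
        ["python3", "build.py", "-v", k8s_version, "-m", mode]
        for mode in ("learn", "test")
    ]
    learn = ["python3", "sieve.py", "-c", controller, "-w", workload,
             "-m", "learn", "--build-oracle"]
    return builds + [learn]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print("$", shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run by default so nothing is built until you opt in.
    run_all(workaround_commands(), dry_run=True)
```

Switch `dry_run` to `False` to actually execute the steps. The image builds can take a while, so a failure in the first step is reported before the later ones start.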

@kapilagrawal95

When I run the command python3 sieve.py -c examples/kapp-controller -w create -m learn --build-oracle, I get the following error:

ERROR: image: "ghcr.io/sieve-project/action/kapp-controller:learn" not present locally
Cannot load image ghcr.io/sieve-project/action/kapp-controller:learn locally, try to pull from remote
Error response from daemon: Head "https://ghcr.io/v2/sieve-project/action/kapp-controller/manifests/learn": denied
[FAIL] docker pull ghcr.io/sieve-project/action/kapp-controller:learn

@marshtompsxd
Member

@kapilagrawal95 The kapp-controller image is not in our GitHub repo, so you need to build it and push it to your own repo first. You can configure the repo name here: https://github.com/sieve-project/sieve/blob/main/config.json#L2
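As a rough illustration of what changing the repo name affects, the sketch below rebuilds the image reference seen in the error above from a registry prefix, a controller name, and a mode. The key name `container_registry` and the registry `ghcr.io/example-user/action` are assumptions made for this example; check line 2 of config.json for the real key before editing it. The helper names are hypothetical and are not part of Sieve.

```python
import json


def image_ref(registry: str, controller: str, mode: str) -> str:
    """Compose an image reference of the form <registry>/<controller>:<mode>."""
    return f"{registry.rstrip('/')}/{controller}:{mode}"


def with_registry(config: dict, registry: str, key: str) -> dict:
    """Return a copy of the config with `key` pointing at a different registry."""
    updated = dict(config)
    updated[key] = registry
    return updated


def rewrite_config(path: str, registry: str, key: str) -> None:
    """Load a JSON config file, replace the registry entry, and write it back."""
    with open(path) as f:
        config = json.load(f)
    with open(path, "w") as f:
        json.dump(with_registry(config, registry, key), f, indent=4)
        f.write("\n")


if __name__ == "__main__":
    # "container_registry" is an assumed key name, not confirmed in this thread.
    current = {"container_registry": "ghcr.io/sieve-project/action"}
    mine = with_registry(current, "ghcr.io/example-user/action", key="container_registry")
    print("before:", image_ref(current["container_registry"], "kapp-controller", "learn"))
    print("after: ", image_ref(mine["container_registry"], "kapp-controller", "learn"))
```

After pointing the config at your own registry, the controller image still has to be built and pushed there before Sieve can pull it.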
