4.11 IPI deploy on vSphere stuck on machine-config-daemon-pull.service and node-valid-hostname.service #1603
davidecelano
started this conversation in
General
-
Please attach log bundle
-
Is this reproducible in OKD 4.12?
-
Describe the bug
Deploying OKD 4.11 in IPI mode on vSphere gets stuck because the following two services fail on both the master and worker nodes:
[systemd]
Failed Units: 2
machine-config-daemon-pull.service
node-valid-hostname.service
Logging in to the master nodes (and later the worker nodes) via SSH and running the following commands as root
podman system reset
hostnamectl set-hostname nodename
reboot
solves the issue: the services start and the deployment continues.
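The manual steps above could be wrapped in a small script, sketched here under a few assumptions: it runs as root on the affected node, the intended node name is passed in by the caller, and the function names `is_localhost_name` and `apply_workaround` are hypothetical helpers, not part of OKD. `--force` only skips the confirmation prompt of `podman system reset`.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the name is still a default localhost name.
is_localhost_name() {
    case "$1" in
        localhost|localhost.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper around the three manual steps from the report.
# Must be run as root; refuses to continue without a usable node name.
apply_workaround() {
    new_name="$1"
    if [ -z "$new_name" ] || is_localhost_name "$new_name"; then
        echo "usage: apply_workaround <non-localhost node name>" >&2
        return 1
    fi
    podman system reset --force            # destructive: wipes local container storage
    hostnamectl set-hostname "$new_name"   # replace the localhost.localdomain default
    systemctl reboot                       # the services are retried on the next boot
}

# Nothing runs until the function is called explicitly, for example:
#   apply_workaround nodename
```

This only automates the workaround; it does not explain why the node never received a hostname in the first place.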
Version
OKD 4.11.0-0.okd-2023-01-14-152430 x86_64
Fedora CoreOS 36.20220716.3.1
vSphere 7.0.3, ESXi 7.0.3
IPI install
How reproducible
My test deployment has 3 master nodes, 1 worker node, static DNS records, 1 DHCP server, and 1 ESXi host. Every deployment hit the same issue. In the same environment, multiple 4.10 clusters were deployed without issues.
Log bundle
[core@localhost ~]$ sudo systemctl status -l machine-config-daemon-pull.service
× machine-config-daemon-pull.service - Machine Config Daemon Pull
Loaded: loaded (/etc/systemd/system/machine-config-daemon-pull.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sun 2023-05-14 15:01:26 UTC; 1min 2s ago
Process: 2490 ExecStart=/bin/sh -c while ! /usr/bin/podman pull --authfile=/var/lib/kubelet/config.json --quiet >
Process: 2625 ExecStart=/usr/bin/podman run --rm --quiet --net=host -v /run/bin:/host/run/bin:z --entrypoint=cp >
Main PID: 2625 (code=exited, status=127)
CPU: 13.062s
May 14 15:01:26 localhost.localdomain podman[2625]: 2023-05-14 15:01:26.856367902 +0000 UTC m=+0.135907757 container>
May 14 15:01:26 localhost.localdomain podman[2625]: /usr/bin/coreutils: error while loading shared libraries: librt.>
May 14 15:01:26 localhost.localdomain podman[2625]: 2023-05-14 15:01:26.856651916 +0000 UTC m=+0.136191834 container>
May 14 15:01:26 localhost.localdomain podman[2625]: 2023-05-14 15:01:26.760291796 +0000 UTC m=+0.039831652 image pul>
May 14 15:01:26 localhost.localdomain podman[2625]: 2023-05-14 15:01:26.886320508 +0000 UTC m=+0.165860327 container>
May 14 15:01:26 localhost.localdomain podman[2625]: 2023-05-14 15:01:26.915591405 +0000 UTC m=+0.195131244 container>
May 14 15:01:26 localhost.localdomain systemd[1]: machine-config-daemon-pull.service: Main process exited, code=exit>
May 14 15:01:26 localhost.localdomain systemd[1]: machine-config-daemon-pull.service: Failed with result 'exit-code'.
May 14 15:01:26 localhost.localdomain systemd[1]: Failed to start machine-config-daemon-pull.service - Machine Confi>
May 14 15:01:26 localhost.localdomain systemd[1]: machine-config-daemon-pull.service: Consumed 13.062s CPU time.
lines 1-18/18 (END)
[core@localhost ~]$ sudo systemctl status -l node-valid-hostname.service
× node-valid-hostname.service - Wait for a non-localhost hostname
Loaded: loaded (/etc/systemd/system/node-valid-hostname.service; enabled; vendor preset: enabled)
Active: failed (Result: timeout) since Sun 2023-05-14 15:01:18 UTC; 1min 57s ago
Main PID: 1690 (code=killed, signal=TERM)
CPU: 1.265s
May 14 14:56:18 localhost.localdomain systemd[1]: Starting node-valid-hostname.service - Wait for a non-localhost ho>
May 14 14:56:18 localhost.localdomain mco-hostname[1690]: waiting for non-localhost hostname to be assigned
May 14 15:01:18 localhost.localdomain systemd[1]: node-valid-hostname.service: start operation timed out. Terminatin>
May 14 15:01:18 localhost.localdomain systemd[1]: node-valid-hostname.service: Main process exited, code=killed, sta>
May 14 15:01:18 localhost.localdomain systemd[1]: node-valid-hostname.service: Failed with result 'timeout'.
May 14 15:01:18 localhost.localdomain systemd[1]: Failed to start node-valid-hostname.service - Wait for a non-local>
May 14 15:01:18 localhost.localdomain systemd[1]: node-valid-hostname.service: Consumed 1.265s CPU time.
lines 1-13/13 (END)
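The status output above is cut off at the terminal width (the lines ending in `>`), which hides details such as the full name of the missing shared library. A minimal sketch for capturing the complete journal of both units, assuming root access on the node; `log_path`, `collect_unit_logs`, and the output directory are hypothetical names chosen for illustration:

```shell
#!/bin/sh
# The two failed units named in the report.
UNITS="machine-config-daemon-pull.service node-valid-hostname.service"

# Hypothetical helper: map a unit name to an output file inside a directory.
log_path() {
    printf '%s/%s.log\n' "$1" "$2"
}

# Write the current-boot journal of each unit to its own file.
collect_unit_logs() {
    out_dir="$1"
    mkdir -p "$out_dir" || return 1
    for unit in $UNITS; do
        # -b limits output to the current boot; --no-pager avoids the pager,
        # so redirected lines are written in full instead of being cut at the terminal width.
        journalctl -b -u "$unit" --no-pager > "$(log_path "$out_dir" "$unit")" 2>&1
    done
}

# Example call (run as root):
#   collect_unit_logs /tmp/okd-unit-logs
```

The resulting files can be attached to the discussion alongside the regular log bundle.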