ABI UPI baremetal install problem and workaround for the latest (2024-09-26/27) version (v4.17, v4.16) of OKD-SCOS #2035
titou10 started this conversation in Pre-Release Testing
-
Would the above also work for BM UPI (without ABI)?
-
Thank you!
-
I was able to install a v4.17.0-0.okd-scos-2024-09-26-224828 cluster with the ABI (agent-based installer) on UPI/baremetal.
[UPDATE]
This procedure also works with the current latest version of OKD: v4.17.0-okd-scos.ec.4
First, none of the FCOS images work as bootstrap images, including the one embedded in the latest releases (FCOS v39.20231101.3.0). This is addressed in #2019 (reply in thread): "We are currently discussing a plan for that".
This also applies to the latest stable FCOS version as of today (2024-09-28): v40.20240906.3.0
The problem is described here: #2015
So to install with the ABI, it is necessary to use another image for bootstrapping. This is done by overriding the bootstrap image with an RHCOS image: set the "OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE" environment variable before building the image.
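A minimal sketch of that step, under some assumptions: the image URL is a hypothetical placeholder (substitute the RHCOS live ISO you actually want to use), and "./cluster" is an assumed directory that already contains your install-config.yaml and agent-config.yaml.

```shell
# Hypothetical placeholder: replace with the URL or path of a real RHCOS live ISO.
RHCOS_IMAGE="${RHCOS_IMAGE:-https://example.com/path/to/rhcos-live.x86_64.iso}"

# Assumed directory holding install-config.yaml and agent-config.yaml.
CLUSTER_DIR="${CLUSTER_DIR:-./cluster}"

# Override the bootstrap OS image before the agent image is built.
export OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE="$RHCOS_IMAGE"

if command -v openshift-install >/dev/null 2>&1; then
  # Build the agent ISO with the override in effect.
  openshift-install agent create image --dir "$CLUSTER_DIR"
else
  echo "openshift-install not found in PATH; nothing was built" >&2
fi
```

The variable has to be exported in the same shell that runs the installer, otherwise the embedded FCOS image is used again.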
Then boot the nodes.
On the rendezvous/bootstrap node, the "assisted-installer-db" service fails and restarts in a loop. Its role is to start a PostgreSQL database, but it fails for the following reason:
The "/var/run/postgresql/" directory that PostgreSQL should use for its Unix socket does not exist. This is most likely a bug in the "quay.io/okd/scos-content@sha256:47640b92fc5aaa0e05049256c4309bd27d81c2c002fc8643d0133a56a7a3a39a" image used for the service, or a problem with the definition of the systemd service.
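To confirm the symptom on your own node, something like the following could be run on the rendezvous node. It is only a sketch: the unit name is taken from this post and may be spelled differently on your system, so list the units first if it is not found.

```shell
# Assumption: unit name as written in this post; adjust if your node differs.
UNIT="assisted-installer-db"

if command -v systemctl >/dev/null 2>&1; then
  # Show the current state of the unit (non-zero exit is expected while it fails).
  systemctl status "$UNIT" --no-pager || true
  # Show the most recent log lines; look for errors about the socket directory.
  journalctl -u "$UNIT" --no-pager -n 50 || true
else
  echo "systemctl not available on this host" >&2
fi
```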
The solution is to tell PostgreSQL to use another directory for its Unix socket. To do that, log in to the bootstrap node.
Basically, this means replacing the command "/bin/bash start_db.sh" run in the container with the content of "start_db.sh", and adding a parameter that changes the location of the socket ("-k /tmp") to each "pg_ctl" command.
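As an illustration of the parameter change only, here is a small sketch. The sample lines are hypothetical and are NOT the real content of "start_db.sh"; the helper name "add_socket_dir" is made up as well. As far as I know, "pg_ctl" passes server options to "postgres" through "-o", so the socket option is written as -o "-k /tmp" here.

```shell
# Append the server option -k <dir> (Unix socket directory) to every line
# that invokes pg_ctl; all other lines are passed through unchanged.
add_socket_dir() {
  sed -E '/pg_ctl[[:space:]]/ s|$| -o "-k '"$1"'"|'
}

# Hypothetical sample lines, not the real start_db.sh.
sample='mkdir -p /tmp/postgres/data
pg_ctl -D /tmp/postgres/data -l /tmp/postgres/logfile start'

printf '%s\n' "$sample" | add_socket_dir /tmp
```

This naive rewrite only handles simple one-command lines (no line continuations or chained commands), so check the result by hand. If the script also runs client tools such as psql, they may need "-h /tmp" to find the socket in its new location.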
Then the installation continues, and after a long time (>1h30 for me) you can enjoy your freshly installed v4.17.0-0.okd-scos-2024-09-26-224828 cluster.
The problem/workaround is the same for an assisted-installer install, and for installing v4.16.0-0.okd-scos-2024-09-27-110344