ABI UPI baremetal install problem and workaround for the latest (2024-09-26/27) version (v4.17, v4.16) of OKD-SCOS #2035

titou10titou10 · 2024-09-28T17:14:25Z

I was able to install a v4.17.0-0.okd-scos-2024-09-26-224828 cluster with the ABI (Agent-based Installer) on UPI/bare metal.

[UPDATE]
This procedure also works with the current latest version of OKD: v4.17.0-okd-scos.ec.4

First, none of the FCOS images work as bootstrap images. This includes the one embedded in the latest releases (FCOS v39.20231101.3.0) and the latest stable FCOS version as of today (2024-09-28), v40.20240906.3.0.
The problem is described in #2015, and a fix is being discussed in #2019 (reply in thread): "We are currently discussing a plan for that".

So, to install with the ABI, another image must be used for bootstrapping: override the bootstrap image with an RHCOS image by setting the "OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE" environment variable before building the image:

export OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE=https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/pre-release/4.17.0-ec.3/rhcos-4.17.0-ec.3-x86_64-live.x86_64.iso
oc adm release extract --command=openshift-install quay.io/okd/scos-release:4.17.0-0.okd-scos-2024-09-26-224828
#  prepare the install directory with agent-config.yaml and install-config.yaml...
./openshift-install agent create image --dir install --log-level=debug
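The two commands above can be wrapped so that a missing override or an incomplete install directory is caught before the ISO is built. Without the override, the image would fall back to the embedded FCOS bootstrap image that does not work. This is a minimal sketch: the function names are hypothetical, and the install directory defaults to "install" as in the commands above.

```shell
#!/usr/bin/env bash
# Sketch only: these function names are hypothetical, not part of openshift-install.

# Succeeds if the RHCOS override is set to an https URL ending in .iso.
check_os_override() {
  case "${OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE:-}" in
    https://*.iso) return 0 ;;
    *)
      echo "OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE is unset or not an .iso URL" >&2
      return 1
      ;;
  esac
}

# Succeeds if both files the agent installer reads are in the install directory.
check_install_dir() {
  local dir=$1 f
  for f in install-config.yaml agent-config.yaml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing $dir/$f" >&2
      return 1
    fi
  done
  return 0
}

# Extracts openshift-install from the given release image, then builds the agent ISO.
build_agent_image() {
  local release=$1 dir=${2:-install}
  check_os_override || return 1
  check_install_dir "$dir" || return 1
  oc adm release extract --command=openshift-install "$release" || return 1
  # Pass the override explicitly so it reaches the installer even if it was not exported.
  OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE="$OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE" \
    ./openshift-install agent create image --dir "$dir" --log-level=debug
}
```

For example, after the export above: build_agent_image quay.io/okd/scos-release:4.17.0-0.okd-scos-2024-09-26-224828 install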

Then boot the nodes.
On the rendezvous (bootstrap) node, the "assisted-service-db" service fails and restarts in an endless loop. Its role is to start a PostgreSQL database, but the database fails to start with:

FATAL:  could not create lock file "/var/run/postgresql/.s.PGSQL.5432.lock": No such file or directory

The "/var/run/postgresql/" directory, which PostgreSQL uses for its Unix socket, does not exist. This is most likely a bug in the "quay.io/okd/scos-content@sha256:47640b92fc5aaa0e05049256c4309bd27d81c2c002fc8643d0133a56a7a3a39a" image used by the service, or a problem with the definition of the systemd service.
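To check that a node is hitting this specific failure rather than some other database problem, the journal can be searched for the lock-file message quoted above. This is a minimal sketch; has_socket_lock_error is a hypothetical helper name.

```shell
# Sketch only: has_socket_lock_error is a hypothetical helper.
# Reads log text on stdin and succeeds if the PostgreSQL lock-file error is present.
has_socket_lock_error() {
  # -F matches the message literally; -q means only the exit status matters.
  grep -qF 'could not create lock file "/var/run/postgresql/.s.PGSQL.5432.lock"'
}
```

For example, as root on the rendezvous node: journalctl -b --no-pager | has_socket_lock_error && echo "missing /var/run/postgresql directory"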

The workaround is to tell PostgreSQL to use another directory for its Unix socket. To do that, log in to the bootstrap node:

ssh core@<bootstrap/rendez-vous node>
sudo su -

vi /etc/systemd/system/assisted-service-db.service
> replace the ExecStart= line with this one:
ExecStart=/usr/bin/podman run --net host --user=postgres --cidfile=%t/%n.ctr-id --cgroups=no-conmon --log-driver=journald --rm --pod-id-file=%t/assisted-service-pod.pod-id --sdnotify=conmon --replace -d --name=assisted-db --env-file=/usr/local/share/assisted-service/assisted-db.env $SERVICE_IMAGE /bin/bash -c '/usr/bin/pg_ctl -D /tmp/postgres/data/ -l /tmp/postgres/logfile start -w -o "-k /tmp"; createuser -s admin -h localhost; createdb installer -h localhost; /usr/bin/pg_ctl -D /tmp/postgres/data/ -l /tmp/postgres/logfile stop -w -o "-k /tmp"; exec postgres -D /tmp/postgres/data/ -k /tmp'
> save and exit (:wq)

systemctl daemon-reload
systemctl restart assisted-service-db

Basically, this replaces the command "/bin/bash start_db.sh" run in the container with the content of "start_db.sh", and adds a parameter that moves the socket to /tmp: -o "-k /tmp" on each "pg_ctl" command and -k /tmp on the final "postgres" command.
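Instead of editing the unit in vi, the same ExecStart= replacement can be scripted, which is handy if the fix has to be applied more than once. This is a minimal sketch: the function names are hypothetical, it keeps a .orig backup next to the unit, and it assumes the unit has a single ExecStart= entry (possibly split over backslash-continued lines). The replacement line is the one from the post, unchanged.

```shell
# Sketch only: function names are hypothetical.

# Prints the replacement ExecStart= line from the post, verbatim.
new_execstart() {
  cat <<'EOF'
ExecStart=/usr/bin/podman run --net host --user=postgres --cidfile=%t/%n.ctr-id --cgroups=no-conmon --log-driver=journald --rm --pod-id-file=%t/assisted-service-pod.pod-id --sdnotify=conmon --replace -d --name=assisted-db --env-file=/usr/local/share/assisted-service/assisted-db.env $SERVICE_IMAGE /bin/bash -c '/usr/bin/pg_ctl -D /tmp/postgres/data/ -l /tmp/postgres/logfile start -w -o "-k /tmp"; createuser -s admin -h localhost; createdb installer -h localhost; /usr/bin/pg_ctl -D /tmp/postgres/data/ -l /tmp/postgres/logfile stop -w -o "-k /tmp"; exec postgres -D /tmp/postgres/data/ -k /tmp'
EOF
}

# Rewrites the ExecStart= entry of the given unit file (default: the
# assisted-service-db unit on the rendezvous node), keeping a .orig backup.
patch_db_unit() {
  local unit=${1:-/etc/systemd/system/assisted-service-db.service}
  cp -p "$unit" "$unit.orig" || return 1
  # ENVIRON avoids the escape processing that awk applies to -v values.
  NEW_EXEC="$(new_execstart)" awk '
    skip { if ($0 !~ /\\$/) skip = 0; next }   # drop continuation lines of the old entry
    /^ExecStart=/ {
      print ENVIRON["NEW_EXEC"]
      if ($0 ~ /\\$/) skip = 1                 # old entry continues on the next line
      next
    }
    { print }
  ' "$unit.orig" > "$unit"
}
```

Usage, as root on the rendezvous node: run patch_db_unit, then the same systemctl daemon-reload and systemctl restart assisted-service-db commands as above.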

The installation then continues, and after a long time (more than 1h30 for me) you can enjoy your freshly installed v4.17.0-0.okd-scos-2024-09-26-224828 cluster.
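While waiting, progress can be followed from the machine where the image was built, using the agent wait-for subcommands of openshift-install. This is a minimal sketch: wait_for_agent_install is a hypothetical wrapper, and it assumes it runs next to the extracted openshift-install binary with the same install directory used for "agent create image".

```shell
# Sketch only: wait_for_agent_install is a hypothetical wrapper.
wait_for_agent_install() {
  local dir=${1:-install}
  # Wait for the bootstrap phase on the rendezvous node to complete.
  ./openshift-install agent wait-for bootstrap-complete --dir "$dir" --log-level=debug || return 1
  # Then wait until the installation is reported complete.
  ./openshift-install agent wait-for install-complete --dir "$dir" --log-level=debug
}
```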

The problem and workaround are the same for an Assisted Installer install, and for installing v4.16.0-0.okd-scos-2024-09-27-110344.

dkoci · 2024-09-29T11:53:01Z

Would the above also work for BM UPI (without ABI)?

titou10titou10 Sep 29, 2024
Author

> Would the above also work for BM UPI (without ABI)?

No idea. I guess you will not encounter the problem with the assisted-service-db service, as it is a component of the agent installer that is also used by the Assisted Installer. I do not know whether this component is used underneath by the "classic" installation method (i.e. iPXE + a dedicated bootstrap node...).

ArthurVardevanyan · 2024-10-09T15:51:27Z

Thank you!
I was able to follow along and get this to work as well.

Image: https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/4.17/4.17.0/rhcos-4.17.0-x86_64-live.x86_64.iso
Version: 4.17.0-0.okd-scos-2024-09-30-101739

ArthurVardevanyan Nov 15, 2024

Same behavior with 4.18 builds:

export OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE=https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/pre-release/4.18.0-ec.3/rhcos-4.18.0-ec.3-x86_64-live.x86_64.iso
export OKD_VERSION="4.18.0-0.okd-scos-2024-11-15-020955"
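With these two variables, the only other change to the original procedure is the release image passed to oc adm release extract. This is a minimal sketch, assuming the release is published as quay.io/okd/scos-release:<version> as in the original post; extract_installer is a hypothetical helper name.

```shell
# Sketch only: extract_installer is a hypothetical helper.
# Uses the first argument as the OKD version, else $OKD_VERSION.
extract_installer() {
  local version=${1:-${OKD_VERSION:-}}
  if [ -z "$version" ]; then
    echo "usage: extract_installer VERSION (or export OKD_VERSION)" >&2
    return 1
  fi
  oc adm release extract --command=openshift-install "quay.io/okd/scos-release:${version}"
}
```

After extract_installer "$OKD_VERSION", the image build and the assisted-service-db workaround are the same as in the original post.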
