chore(ci): fix error handling & add timeout #3835

zdrapela · 2025-12-11T13:31:04Z

Description

This PR fixes CI/CD pipeline error handling issues that were causing:

- Scripts to exit prematurely when Playwright tests failed
- The cleanup function to run multiple times
- `kubectl logs` commands to hang indefinitely when pods were unresponsive

See this log, where the issues happened: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/redhat-developer_rhdh/3830/pull-ci-redhat-developer-rhdh-main-e2e-ocp-helm/1998787878628364288/artifacts/e2e-ocp-helm/redhat-developer-rhdh-ocp-helm/build-log.txt

Root Cause Analysis

The CI script was configured with `set -o errexit` and `trap cleanup EXIT INT ERR`. Combined with `pipefail` (enabled as a side effect of calling `configure_external_postgres_db()`), this caused:

- Pipeline failures propagating: with `pipefail` on, `yarn playwright test | tee` returned the test failure instead of `tee`'s status, so `errexit` aborted the entire script
- Double cleanup execution: both the `ERR` and `EXIT` traps fired on a failure
- Hanging log collection: `kubectl logs` had no timeout, causing 40+ minute hangs when pods were stuck
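
The first two effects can be reproduced in isolation. The following is a minimal sketch, not the actual CI script: `false` stands in for a failing `yarn playwright test`, and the `cleanup` body is a placeholder. Each case runs in its own `bash -c` so the shell options do not affect the calling shell.

```shell
# Case 1: errexit without pipefail. The pipeline's status is tee's (0),
# so the script keeps going after the "test" fails.
no_pipefail=$(bash -c '
  set -o errexit
  false | tee /dev/null
  echo "reached"
' || true)

# Case 2: errexit with pipefail. The pipeline now reports the failure
# of `false`, and errexit aborts before the echo.
with_pipefail=$(bash -c '
  set -o errexit -o pipefail
  false | tee /dev/null
  echo "reached"
' || true)

# Case 3: the same trap list as in the description. A single failure
# triggers the ERR trap and then the EXIT trap.
double_cleanup=$(bash -c '
  cleanup() { echo "cleanup"; }   # placeholder body
  set -o errexit
  trap cleanup EXIT INT ERR
  false
' || true)

echo "no pipefail:   ${no_pipefail:-<nothing>}"
echo "with pipefail: ${with_pipefail:-<nothing>}"
echo "cleanup calls: $(printf '%s\n' "$double_cleanup" | grep -c cleanup)"
```

In bash, only the first case should reach the `echo`, and the third case should report two cleanup calls for one failure.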

Changes

`openshift-ci-tests.sh`

- Simplified the trap to `EXIT` only (removes `INT` and `ERR`)
- The `EXIT` trap fires exactly once when the script terminates, whether it exits normally or on an error, preventing duplicate cleanup
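
As a rough illustration of the simplified trap (a sketch with a placeholder `cleanup` function, not the code from `openshift-ci-tests.sh`), a failing command under `errexit` now triggers cleanup a single time:

```shell
exit_only=$(bash -c '
  cleanup() {
    local rc=$?                      # exit status the script is ending with
    echo "cleanup (exit code $rc)"   # placeholder for the real cleanup work
  }
  set -o errexit
  trap cleanup EXIT                  # EXIT only: no INT, no ERR
  false                              # stands in for a failing CI step
' || true)

echo "$exit_only"
echo "cleanup calls: $(printf '%s\n' "$exit_only" | grep -c cleanup)"
```

Capturing `$?` on the first line of `cleanup` keeps the original exit status available, which a cleanup function may want when deciding what to report.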

`utils.sh`

- `retrieve_pod_logs()`: Added a 30-second timeout to `kubectl logs` commands to prevent hanging
- `configure_external_postgres_db()`: Removed an unnecessary `set -euo pipefail` that was leaking globally
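
Both changes can be sketched as follows. This is an illustration under assumptions, not the code in `utils.sh`: `fetch_logs_with_timeout` is a hypothetical helper name, and the sketch relies on the `timeout` utility from GNU coreutils, which exits with status 124 when the limit is hit. The second part shows why a `set` call inside a function leaks: shell options are global to the shell, not local to the function.

```shell
# Hypothetical helper: run a log command under a time limit and never
# fail the caller, so one stuck pod cannot block the rest of the run.
fetch_logs_with_timeout() {
  local limit="$1" outfile="$2" rc=0
  shift 2
  timeout "$limit" "$@" > "$outfile" 2>&1 || rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "timed out after ${limit}s: $*" >> "$outfile"
  elif [ "$rc" -ne 0 ]; then
    echo "failed with exit code $rc: $*" >> "$outfile"
  fi
  return 0
}

# Intended use (requires a cluster, so it is not run here):
#   fetch_logs_with_timeout 30 pod.log kubectl logs "$pod" -n "$namespace"

# Stand-in demo: `sleep 5` plays the role of a hanging `kubectl logs`.
demo_log=$(mktemp)
fetch_logs_with_timeout 1 "$demo_log" sleep 5
cat "$demo_log"

# Option leak: after calling a function that runs `set -euo pipefail`,
# pipefail is still enabled in the caller.
leak_state=$(bash -c '
  leaky() { set -euo pipefail; }   # placeholder for a function that sets options
  leaky
  if [[ -o pipefail ]]; then echo "pipefail still on"; else echo "pipefail off"; fi
')
echo "$leak_state"
```

If a function really needs stricter options, running its body in a subshell `( ... )` is one way to keep them from affecting the rest of the script.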

Expected Behavior

| Scenario | Before | After |
|---|---|---|
| Playwright tests fail | Script exits immediately | Script continues, records failure |
| `kubectl logs` hangs | Waits indefinitely (40+ min) | Times out after 30 seconds |
| Cleanup on error | Runs 2-3 times | Runs exactly once |
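
One way to get the "continues, records failure" behavior is to capture the test command's status explicitly instead of letting `errexit` act on it. The sketch below is an assumption about how that could look, not the PR's implementation: `run_and_record` and `OVERALL_RESULT` are hypothetical names, and a small `bash -c` command stands in for `yarn playwright test`.

```shell
OVERALL_RESULT=0   # hypothetical global: 0 = all passed, 1 = something failed

run_and_record() {
  local logfile="$1" rc=0
  shift
  # pipefail is enabled only inside the subshell, so it cannot leak.
  # The `||` keeps errexit (if enabled by the caller) from aborting here.
  ( set -o pipefail; "$@" 2>&1 | tee "$logfile" ) || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "tests failed with exit code $rc; continuing"
    OVERALL_RESULT=1
  fi
}

# Stand-in for a failing test run.
test_log=$(mktemp)
run_and_record "$test_log" bash -c 'echo "1 failed"; exit 1'
echo "script continues; OVERALL_RESULT=$OVERALL_RESULT"
```

The recorded value can then be used as the script's final exit code after log collection and cleanup have run.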

Which issue(s) does this PR fix

- Fixes intermittent CI failures where scripts would hang or exit prematurely
- Addresses the issues seen in the build log linked in the Description

PR acceptance criteria

GitHub Actions are completed and successful
Unit Tests are updated and passing
E2E Tests are updated and passing
Documentation is updated if necessary (requirement for new features)
Add a screenshot if the change is UX/UI related

How to test changes / Special notes to the reviewer

- Run the e2e tests and verify that the script continues even when some tests fail
- Verify in the logs that cleanup runs only once
- Verify there are no 40+ minute hangs on log collection

Sometimes the log collection is stuck

gustavolira · 2025-12-11T13:43:35Z

/approve
/lgtm

sonarqubecloud · 2025-12-11T13:53:28Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

gustavolira · 2025-12-11T13:58:00Z

/approve
/lgtm

openshift-ci · 2025-12-11T13:58:09Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gustavolira

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~.claude/OWNERS~~ [gustavolira]
~~.cursor/OWNERS~~ [gustavolira]
~~.ibm/OWNERS~~ [gustavolira]
~~.rulesync/OWNERS~~ [gustavolira]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

github-actions · 2025-12-11T14:47:01Z

The image is available at:

zdrapela · 2025-12-11T15:19:52Z

/test e2e-ocp-helm

zdrapela · 2025-12-11T15:24:58Z

failed on a flaky test: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/redhat-developer_rhdh/3835/pull-ci-redhat-developer-rhdh-main-e2e-ocp-helm/1999115193510006784

zdrapela added 3 commits December 11, 2025 14:28

chore(ci): fix error handling

ea1f005

add timeout for log collection

7325598

Sometimes the log collection is stuck

Fix cleanup to run only once at the exit

4b0c11e

openshift-ci bot requested review from gustavolira and subhashkhileri December 11, 2025 13:31

zdrapela changed the title ~~chore(ci): fix error handling~~ chore(ci): fix error handling & add timeout Dec 11, 2025

zdrapela temporarily deployed to internal December 11, 2025 13:31 — with GitHub Actions Inactive

Retest

0269b72

zdrapela temporarily deployed to internal December 11, 2025 13:35 — with GitHub Actions Inactive

Retest

093b725

zdrapela temporarily deployed to internal December 11, 2025 13:40 — with GitHub Actions Inactive

openshift-ci bot assigned gustavolira Dec 11, 2025

openshift-ci bot added lgtm approved labels Dec 11, 2025

Add AI rule for set

585829b

openshift-ci bot removed the lgtm label Dec 11, 2025

zdrapela temporarily deployed to internal December 11, 2025 13:53 — with GitHub Actions Inactive

openshift-ci bot added the lgtm label Dec 11, 2025

openshift-merge-bot bot merged commit 8db5ff1 into redhat-developer:main Dec 11, 2025
20 checks passed
