
Conversation

CryptoRodeo
Contributor

Add the KITE bridge operator manifests so it can be deployed.


Signed-off-by: Bryan Ramos <[email protected]>

openshift-ci bot commented Sep 5, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


openshift-ci bot commented Sep 5, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CryptoRodeo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the approved label Sep 5, 2025
Contributor

github-actions bot commented Sep 5, 2025

Code Review by Gemini

## Code Review

The pull request introduces changes primarily related to adding KITE bridge operator manifests, as stated in the commit title. However, it also includes several unrelated changes and a critical issue with an image tag.

### Issues and Suggestions:

1.  **Unrelated Changes (Major Issue)**
    The pull request contains significant changes that are unrelated to the stated purpose of adding KITE bridge operator manifests. This makes the review process difficult, obscures the intent of the changes, and complicates potential rollbacks. It is highly recommended to split these changes into separate pull requests.

    *   **`components/konflux-info/production/.../banner-content.yaml`**: Adding maintenance banners.
    *   **`components/kueue/production/kflux-rhel-p01/queue-config/cluster-queue.yaml`**: Modifying Kueue cluster queue resources and quotas.
    *   **`components/multi-platform-controller/production-downstream/kflux-rhel-p01/host-config.yaml`**: Removing and adding dynamic/static hosts for the multi-platform controller.
    *   **`components/smee-client/staging/gosmee-v0.28.0-patch.yaml` and `components/smee-client/staging/kustomization.yaml`**: Deleting a patch for `gosmee-client`.
    *   **`components/smee/staging/stone-stg-host/gosmee-v0.28.0-patch.yaml` and `components/smee/staging/stone-stg-host/kustomization.yaml`**: Deleting a patch for `gosmee`.

    **Suggestion:** Revert all changes not directly related to the KITE bridge operator and submit them in separate, appropriately titled pull requests.

2.  **Hardcoded `:latest` Image Tag (Critical Issue)**
    In `components/konflux-kite/base/operator/manager.yaml`, the `kite-operator` image is specified with the `:latest` tag. Using `:latest` tags in deployments is highly discouraged as it can lead to unpredictable behavior and makes it difficult to track which version of the image is actually running. The `kustomization.yaml` already defines a specific `newTag` for `kite-operator`.

    **File: `components/konflux-kite/base/operator/manager.yaml`**
    ```diff
    --- a/components/konflux-kite/base/operator/manager.yaml
    +++ b/components/konflux-kite/base/operator/manager.yaml
    @@ -22,7 +22,7 @@
         # Use service discovery via DNS to automatically resolve within the cluster
         - name: KITE_API_URL
           value: "http://konflux-kite.konflux-kite.svc.cluster.local"
    -    image: quay.io/konflux-ci/kite-operator:latest
    +    image: quay.io/konflux-ci/kite-operator:b93ea4b
         livenessProbe:
           httpGet:
             path: /healthz
    ```
    **Suggestion:** Replace `:latest` with the specific tag `b93ea4b` (or the full commit hash if preferred for better traceability) as defined in `components/konflux-kite/base/kustomization.yaml` (see the transformer sketch after this list).

3.  **Maintenance Banner Date (Minor/Clarification)**
    The maintenance banners added in `components/konflux-info/production/.../banner-content.yaml` are set for the year `2025`. While not a bug, it's unusual to set a production maintenance banner so far in advance. If this is a placeholder or a long-term planned event, it should be clarified. If the maintenance is intended to be sooner, the date should be adjusted.

    **File: `components/konflux-info/production/kflux-prd-rh03/banner-content.yaml`**
    ```diff
    --- a/components/konflux-info/production/kflux-prd-rh03/banner-content.yaml
    +++ b/components/konflux-info/production/kflux-prd-rh03/banner-content.yaml
    @@ -1,2 +1,8 @@
     # Only the first banner will be displayed. Put the one to display at the top and remove any which are no longer relevant
    -[]
    +- type: info
    +  summary: "🚨 This cluster is currently undergoing maintenance until 15:00 UTC and will remain usable during this mainenance period. 🚨"
    +  startTime: "13:00"
    +  endTime: "15:00"
    +  year: 2025
    +  month: 9
    +  dayOfMonth: 4
    ```
    **Suggestion:** If these banners are not for immediate use, consider removing them from this PR and adding them closer to the actual maintenance date, or clarify their purpose in the commit message if they are indeed long-term placeholders. (Preferably, remove them as per Issue 1).
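
For reference on Issue 2: below is a minimal sketch of the kustomize `images:` transformer, using only the `kite-operator` entry that this PR's `components/konflux-kite/base/kustomization.yaml` defines (the layout and comments here are illustrative, not copied from the file). Assuming the standard transformer semantics, kustomize rewrites the tag of any matching image reference at render time, so pinning the tag in `manager.yaml` as suggested above mainly removes ambiguity when the manifest is read or applied outside of kustomize.

```yaml
# Minimal sketch of the kustomize images transformer (illustrative layout;
# only the kite-operator name and newTag are taken from this PR).
images:
  - name: quay.io/konflux-ci/kite-operator
    newTag: b93ea4b   # kustomize replaces the tag on any matching image reference at render time
```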

### General Observations:

*   **Version Bumps:** The various version bumps for `build-service`, `image-controller`, `integration-service`, and `release-service` appear to be routine updates. Assuming these are intended, they don't present issues on their own, but they also contribute to the "unrelated changes" problem if not part of a broader, coordinated update.
*   **KITE Bridge Operator Manifests:** The new files for the KITE bridge operator (`operator/service-account.yaml`, `operator/role.yaml`, `operator/role-binding.yaml`, `operator/cluster-role.yaml`, `operator/cluster-role-binding.yaml`, `operator/manager.yaml`) and the updates to `components/konflux-kite/base/kustomization.yaml` seem correctly structured for deploying a Kubernetes operator. The resource definitions and RBAC roles appear standard for an operator interacting with `tekton.dev/pipelineruns` (a representative role shape is sketched below).
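
As a point of reference only (not copied from the PR's `operator/cluster-role.yaml`), a minimal ClusterRole of the kind an operator watching PipelineRuns typically needs could look like the following; the metadata name is hypothetical and the PR's actual rules may grant more:

```yaml
# Hypothetical minimal ClusterRole for watching Tekton PipelineRuns.
# The name is illustrative; the PR's actual cluster-role.yaml may differ.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kite-operator-pipelinerun-reader  # hypothetical name
rules:
  - apiGroups: ["tekton.dev"]
    resources: ["pipelineruns"]
    verbs: ["get", "list", "watch"]
```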

@gbenhaim
Member

gbenhaim commented Sep 5, 2025

Why do we need an additional service for storing pipelinerun status? We have KubeArchive or Tekton Results. Why can't we use them?

@CryptoRodeo
Contributor Author

CryptoRodeo commented Sep 5, 2025

Hi @gbenhaim, the main difference is that this POC issue tracking system is built for active issue management related to resources on the cluster rather than just storage. KubeArchive and Tekton Results are great for storing historical data, but they don't support updating existing records AFAIK.

How this POC issue tracking system should work:
This POC bridge operator will have controllers that watch for resource failures or successes in real time and communicate with a backend API to automatically manage issue records (create, update, resolve, etc.). Storing issue records in an external database prevents overloading the cluster's main data store (etcd).
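
To give a rough idea of what an issue record might contain (a purely illustrative shape, not the actual KITE schema):

```yaml
# Hypothetical issue record; field names are illustrative, not KITE's schema.
issue:
  resourceType: PipelineRun
  resourceName: component-build-abc123   # hypothetical resource
  namespace: team-tenant                 # hypothetical namespace
  state: ACTIVE                          # a controller would flip this to RESOLVED on a later success
  reason: "PipelineRun failed"
```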

Why use a bridge operator?
Using a bridge operator that can be extended by adding controllers gives teams a standard way to monitor their resources on the cluster and auto-create/update/resolve issue records, while also giving them the flexibility to alter the logic as needed.

What I'm currently testing
I want to test watching a resource on the cluster and auto-creating/updating/resolving issue records. I'm using pipelineruns as the resource for this test.

I'm also working with MintMaker and the Konflux Build team so they can create controllers to watch their resources as well.

What will this be used for?
The bigger picture is that this will power the issues dashboard in Konflux, with teams contributing controllers that watch various resources and automatically manage their issue records.

Would another solution make more sense?
If the issues dashboard only had to show failed pipelineruns, then yes, another solution would make more sense. But I was told we might be dealing with other resource types, so I kept the implementation flexible and generic for different resource issues.

If I'm wrong here, please let me know.

I have a short demo here if you want to see this in action: https://drive.google.com/file/d/1WLiU5LVvrBgGnUeZV8tW1sebugJHGPJK/view

If you need more information, here is the repository for this project: https://github.com/konflux-ci/kite

Signed-off-by: Bryan Ramos <[email protected]>
Contributor

github-actions bot commented Sep 5, 2025

Code Review by Gemini

```diff
--- a/components/konflux-kite/base/kustomization.yaml
+++ b/components/konflux-kite/base/kustomization.yaml
@@ -20,6 +20,7 @@
    newTag: b25314f9c17124eed9dd1a1f7f243e6f88071010

  - name: quay.io/konflux-ci/kite-operator
    newTag: b93ea4b

commonLabels:
  app.kubernetes.io/name: konflux-kite
```

Suggestion:
The newTag for quay.io/konflux-ci/kite-operator is defined here as b93ea4b. However, in components/konflux-kite/base/operator/manager.yaml, the image for the operator deployment uses :latest. It's best practice to use a specific, immutable tag for deployments to ensure consistency and prevent unexpected updates.

```diff
--- a/components/konflux-kite/base/operator/manager.yaml
+++ b/components/konflux-kite/base/operator/manager.yaml
@@ -20,7 +20,7 @@
        # Use service discovery via DNS to automatically resolve within the cluster
        - name: KITE_API_URL
          value: "http://konflux-kite"
-        image: quay.io/konflux-ci/kite-operator:latest
+        image: quay.io/konflux-ci/kite-operator:b93ea4b
        livenessProbe:
          httpGet:
            path: /healthz
```

Suggestion:
Replace :latest with the specific tag b93ea4b to match the newTag defined in components/konflux-kite/base/kustomization.yaml. This ensures the deployed image version is explicit and stable.

@gbenhaim
Member

gbenhaim commented Sep 6, 2025

> I'm also working with MintMaker and the Konflux Build team so they can create controllers to watch their resources as well

Which resources?

> KubeArchive and Tekton Results are great for storing historical data, but they don't support updating existing records AFAIK.

What is the flow? When are you going to update records? Let's say that a pipeline run failed and your operator created an issue for it. What's next?

> The bigger picture is that this will power the issues dashboard in Konflux, with teams contributing controllers that watch various resources and automatically managing their issue records

Is there an ADR for this plan? If not, I suggest creating one before proceeding with the deployment.

@CryptoRodeo
Contributor Author

@gbenhaim I have an ADR you can view here: konflux-ci/architecture#255
