
Conversation

@starbops
Member

@starbops starbops commented Dec 31, 2025

Problem:

A race condition was found between webhook readiness and Configuration creation during chart installation.

Solution:

Move the default Configuration CR into a ConfigMap and use a post-hook Job to apply the default Configuration from the ConfigMap volume.

Since the default Configuration CR does not need to be processed by the mutating webhook (we've already crafted it with sane defaults suitable for Harvester use), we can make the MutatingWebhookConfiguration resource a post-install hook. That way, the default Configuration CR can bypass the webhook upon creation.
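
As an illustration, here is a minimal sketch of that hook change. Only the webhook name mconfiguration-v1.kb.io is taken from the error output below; the resource name and everything else are assumptions, and the hook annotations are the substantive part:

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: kubeovn-operator-mutating-webhook-configuration  # assumed name
  annotations:
    # Registering the webhook only after the release is installed means the
    # default Configuration CR created during installation never hits it.
    "helm.sh/hook": post-install,post-upgrade  # post-upgrade is an assumption
webhooks:
  - name: mconfiguration-v1.kb.io
    # clientConfig, rules, failurePolicy, etc. unchanged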

Related Issue(s):

harvester/harvester#9786

Test plan:

  1. Prepare a v1.7.0 Harvester cluster
  2. Clone the harvester/charts repository, checkout to the branch containing the fix, and install the kubeovn-operator chart directly from the local directory:
    helm upgrade --install kubeovn-operator ./charts/kubeovn-operator --namespace=kube-system --create-namespace --set=enableConfiguration=true
  3. The Helm chart installation should succeed

Additional documentation or context

Previously, installing the chart directly from the master branch failed with the following error because of the race condition described above:

$ helm upgrade --install kubeovn-operator ./charts/kubeovn-operator --namespace=kube-system --create-namespace --set=enableConfiguration=true
Release "kubeovn-operator" does not exist. Installing it now.
Error: Internal error occurred: failed calling webhook "mconfiguration-v1.kb.io": failed to call webhook: Post "https://kubeovn-operator-webhook-service.kube-system.svc:443/mutate-kubeovn-io-v1-configuration?timeout=10s": no endpoints available for service "kubeovn-operator-webhook-service"

@starbops starbops marked this pull request as ready for review December 31, 2025 09:15

Copilot AI left a comment


Pull request overview

This PR fixes a race condition between webhook readiness and Configuration resource creation during Helm chart installation by converting the Configuration custom resource to a ConfigMap and using a Helm post-hook Job to apply it after installation completes.

Key changes:

  • Introduces a post-install/post-upgrade Job that applies the Configuration from a ConfigMap after the webhook is ready
  • Converts the Configuration resource template to a ConfigMap containing the Configuration YAML
  • Adds kubectl hook image configuration to values.yaml for running the post-hook Job

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

  • charts/kubeovn-operator/values.yaml: Adds hookImage configuration for the kubectl container used in the post-hook Job
  • charts/kubeovn-operator/templates/post-job.yaml: New Helm post-hook Job that applies the Configuration resource from the ConfigMap after installation/upgrade
  • charts/kubeovn-operator/templates/configuration.yaml: Transforms the Configuration custom resource into a ConfigMap that stores the Configuration YAML as data
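
For orientation, here is a hedged sketch of the two templates, assembled from the fragments quoted in the review comments below. The hook annotations, kubectl arguments, mount path, and securityContext appear verbatim in the diff; resource names, the restart policy, and the hookImage value shape are assumptions:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kubeovn-operator-configuration      # assumed name
  namespace: {{ .Release.Namespace }}
data:
  config.yaml: |
    apiVersion: kubeovn.io/v1
    kind: Configuration
    metadata:
      name: kubeovn                          # name taken from the Job log below
      namespace: {{ .Release.Namespace }}
    spec:
      # rendered from .Values.configurationSpec; see the ipv4 fragment below
---
apiVersion: batch/v1
kind: Job
metadata:
  name: kubeovn-operator-post-job            # assumed name
  namespace: {{ .Release.Namespace }}
  annotations:
    "helm.sh/hook": post-install,post-upgrade
    "helm.sh/hook-weight": "5"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: OnFailure               # assumed; retries until the webhook is ready
      containers:
        - name: kubectl
          image: {{ .Values.hookImage }}     # assumed value shape
          args:
            - apply
            - -f
            - /opt/kubeovn-operator/config.yaml
          securityContext:
            allowPrivilegeEscalation: false
          volumeMounts:
            - name: config
              mountPath: /opt/kubeovn-operator
      volumes:
        - name: config
          configMap:
            name: kubeovn-operator-configuration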


pingerExternalDomain: {{ .Values.configurationSpec.ipv4.pingerExternalDomain | default "google.com" }}
podCIDR: {{ .Values.configurationSpec.ipv4.podCIDR | default "10.52.0.0/16" }}
podGateway: {{ .Values.configurationSpec.ipv4.podGateway | default "10.52.0.1" }}
serviceCIDR: {{ .Values.configurationSpec.ipv4.podGateway | default "10.53.0.1" }}

Copilot AI Dec 31, 2025


The serviceCIDR field is incorrectly referencing '.Values.configurationSpec.ipv4.podGateway' instead of '.Values.configurationSpec.ipv4.serviceCIDR'. This will cause the serviceCIDR to be set to the pod gateway value (10.52.0.1) instead of the intended service CIDR value (10.53.0.1), which could lead to network routing issues.

Suggested change
serviceCIDR: {{ .Values.configurationSpec.ipv4.podGateway | default "10.53.0.1" }}
serviceCIDR: {{ .Values.configurationSpec.ipv4.serviceCIDR | default "10.53.0.1" }}

- -f
- /opt/kubeovn-operator/config.yaml
securityContext:
allowPrivilegeEscalation: false

Copilot AI Dec 31, 2025


The Job template is missing readOnlyRootFilesystem security setting. For enhanced security in a restricted environment, consider adding 'readOnlyRootFilesystem: true' to the securityContext to prevent writes to the container's filesystem.

Suggested change
allowPrivilegeEscalation: false
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true

"helm.sh/hook": post-install,post-upgrade
"helm.sh/hook-weight": "5"
"helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:

Copilot AI Dec 31, 2025


The Job is missing a backoffLimit specification. Without this, the Job will use the default backoffLimit of 6, which may lead to multiple retry attempts if the Configuration creation fails. Consider adding 'backoffLimit: 0' or a low value to the Job spec to fail fast if there's an issue with applying the configuration, making debugging easier.

Suggested change
spec:
spec:
backoffLimit: 0

spec:
template:
metadata:
namespace: {{ .Release.Namespace }}

Copilot AI Dec 31, 2025


The pod template metadata includes a 'namespace' field, which is not a valid field for pod templates within Job specs. The namespace is inherited from the Job itself and shouldn't be specified in spec.template.metadata. This field should be removed as it may cause validation errors or be ignored by Kubernetes.

Suggested change
namespace: {{ .Release.Namespace }}

@Yu-Jack
Contributor

Yu-Jack commented Jan 8, 2026

I got this log message in kube-system/kubeovn-operator-xxxxx:post-job. Is that expected?

Error from server (InternalError): error when applying patch:
{"spec":{"ovnCentral":{"limits":{"cpu":3}},"ovsOVN":{"limits":{"cpu":2}}}}
to:
Resource: "kubeovn.io/v1, Resource=configurations", GroupVersionKind: "kubeovn.io/v1, Kind=Configuration"
Name: "kubeovn", Namespace: "kube-system"
for: "/opt/kubeovn-operator/config.yaml": error when patching "/opt/kubeovn-operator/config.yaml": Internal error occurred: failed calling webhook "mconfiguration-v1.kb.io": failed to call webhook: Post "https://kubeovn-operator-webhook-service.kube-system.svc:443/mutate-kubeovn-io-v1-configuration?timeout=10s": no endpoints available for service "kubeovn-operator-webhook-service"
stream closed EOF for kube-system/kubeovn-operator-7k5m7 (post-job)

@starbops
Member Author

starbops commented Jan 8, 2026

I got this log message in kube-system/kubeovn-operator-xxxxx:post-job. Is that expected?

Error from server (InternalError): error when applying patch:
{"spec":{"ovnCentral":{"limits":{"cpu":3}},"ovsOVN":{"limits":{"cpu":2}}}}
to:
Resource: "kubeovn.io/v1, Resource=configurations", GroupVersionKind: "kubeovn.io/v1, Kind=Configuration"
Name: "kubeovn", Namespace: "kube-system"
for: "/opt/kubeovn-operator/config.yaml": error when patching "/opt/kubeovn-operator/config.yaml": Internal error occurred: failed calling webhook "mconfiguration-v1.kb.io": failed to call webhook: Post "https://kubeovn-operator-webhook-service.kube-system.svc:443/mutate-kubeovn-io-v1-configuration?timeout=10s": no endpoints available for service "kubeovn-operator-webhook-service"
stream closed EOF for kube-system/kubeovn-operator-7k5m7 (post-job)

Did the job succeed eventually? It will undoubtedly encounter the error, but hopefully it can finish thanks to the job retry mechanism.

@Yu-Jack
Contributor

Yu-Jack commented Jan 9, 2026

Did the job succeed eventually? It will undoubtedly encounter the error, but hopefully it can finish thanks to the job retry mechanism.

Yes. It ended up with a successful installation. Okay, so it seems fine in this case. Thanks!

"helm.sh/hook": post-install,post-upgrade
"helm.sh/hook-weight": "5"
"helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
@ihcsim
Contributor


Instead of relying on a post-install job, can the operator mount the config map and then create the embedded configuration if it's missing? Then the config map can remain part of the normal install/upgrade path. If desired, it also gives the operator the ability to revert any drift found in the CR back to the original found in the config map.

See relevant src here.
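
A hypothetical sketch of the deployment-side half of that suggestion; names are illustrative, and the create-if-missing logic would live in the operator's reconcile loop, per the linked source:

containers:
  - name: kubeovn-operator
    # The operator would read the default Configuration from the mounted file
    # and create the CR if it is missing (reconcile logic not shown).
    volumeMounts:
      - name: default-configuration
        mountPath: /opt/kubeovn-operator
        readOnly: true
volumes:
  - name: default-configuration
    configMap:
      name: kubeovn-operator-configuration  # hypothetical name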

@starbops
Member Author


Thanks, @ihcsim. I like the idea. But given the tight schedule for v1.7.1, we may want a more straightforward fix here. After discussing with @ibrokethecloud, it turns out the default Configuration CR does not have to be processed by the mutating webhook. A simpler option than the one I proposed earlier (the post-install job) is to make MutatingWebhookConfiguration a post-install-hook resource instead of a pre-install-hook resource. WDYT?

Contributor

@Yu-Jack Yu-Jack left a comment


Thanks. It works.

@starbops starbops merged commit f1df6bf into harvester:master Jan 13, 2026
5 checks passed
starbops added a commit to starbops/harvester-charts that referenced this pull request Jan 14, 2026
starbops added a commit that referenced this pull request Jan 16, 2026