fix(kubeovn-operator): use helm post hook to ensure default config #455
Conversation
Pull request overview
This PR fixes a race condition between webhook readiness and Configuration resource creation during Helm chart installation by converting the Configuration custom resource to a ConfigMap and using a Helm post-hook Job to apply it after installation completes.
Key changes:
- Introduces a post-install/post-upgrade Job that applies the Configuration from a ConfigMap after the webhook is ready
- Converts the Configuration resource template to a ConfigMap containing the Configuration YAML
- Adds kubectl hook image configuration to values.yaml for running the post-hook Job
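For orientation, here is a minimal sketch of what such a post-hook Job could look like. The hook annotations and config path match the snippets quoted later in this review; all resource names, the service account, and the hookImage value keys are illustrative assumptions, not the chart's actual template:

```yaml
# Illustrative sketch only; names, service account, and value keys are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: kubeovn-operator-apply-config        # hypothetical name
  namespace: {{ .Release.Namespace }}
  annotations:
    "helm.sh/hook": post-install,post-upgrade
    "helm.sh/hook-weight": "5"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      serviceAccountName: kubeovn-operator   # assumed; needs RBAC to apply Configuration resources
      restartPolicy: OnFailure
      containers:
        - name: kubectl
          image: "{{ .Values.hookImage.repository }}:{{ .Values.hookImage.tag }}"   # assumed value keys
          command:
            - kubectl
            - apply
            - -f
            - /opt/kubeovn-operator/config.yaml
          volumeMounts:
            - name: config
              mountPath: /opt/kubeovn-operator
      volumes:
        - name: config
          configMap:
            name: kubeovn-operator-configuration   # hypothetical ConfigMap name
```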
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| charts/kubeovn-operator/values.yaml | Adds hookImage configuration for the kubectl container used in the post-hook Job |
| charts/kubeovn-operator/templates/post-job.yaml | New Helm post-hook Job that applies the Configuration resource from the ConfigMap after installation/upgrade |
| charts/kubeovn-operator/templates/configuration.yaml | Transforms the Configuration custom resource into a ConfigMap that stores the Configuration YAML as data |
pingerExternalDomain: {{ .Values.configurationSpec.ipv4.pingerExternalDomain | default "google.com" }}
podCIDR: {{ .Values.configurationSpec.ipv4.podCIDR | default "10.52.0.0/16" }}
podGateway: {{ .Values.configurationSpec.ipv4.podGateway | default "10.52.0.1" }}
serviceCIDR: {{ .Values.configurationSpec.ipv4.podGateway | default "10.53.0.1" }}
Copilot AI commented on Dec 31, 2025
The serviceCIDR field is incorrectly referencing '.Values.configurationSpec.ipv4.podGateway' instead of '.Values.configurationSpec.ipv4.serviceCIDR'. This will cause the serviceCIDR to be set to the pod gateway value (10.52.0.1) instead of the intended service CIDR value (10.53.0.1), which could lead to network routing issues.
Suggested change:
- serviceCIDR: {{ .Values.configurationSpec.ipv4.podGateway | default "10.53.0.1" }}
+ serviceCIDR: {{ .Values.configurationSpec.ipv4.serviceCIDR | default "10.53.0.1" }}
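For context, the template keys above are read from values shaped roughly like the following. This is an assumed sketch of the values.yaml layout implied by the template references, not its verbatim contents:

```yaml
# Assumed values.yaml layout implied by the template references (illustrative only).
configurationSpec:
  ipv4:
    pingerExternalDomain: google.com   # template default: "google.com"
    podCIDR: 10.52.0.0/16              # template default: "10.52.0.0/16"
    podGateway: 10.52.0.1              # template default: "10.52.0.1"
    serviceCIDR: 10.53.0.0/16          # the key the corrected template should read; its fallback default is "10.53.0.1"
```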
- -f
- /opt/kubeovn-operator/config.yaml
securityContext:
  allowPrivilegeEscalation: false
Copilot AI commented on Dec 31, 2025
The Job template is missing readOnlyRootFilesystem security setting. For enhanced security in a restricted environment, consider adding 'readOnlyRootFilesystem: true' to the securityContext to prevent writes to the container's filesystem.
Suggested change:
  allowPrivilegeEscalation: false
+ readOnlyRootFilesystem: true
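As a broader illustration of the restricted posture this comment alludes to, the hook container's securityContext could be tightened roughly as follows. This is a hedged sketch, not the chart's current settings:

```yaml
# Illustrative restricted securityContext for the kubectl hook container (assumed, not the chart's actual spec).
securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  capabilities:
    drop:
      - ALL
  seccompProfile:
    type: RuntimeDefault
```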
| "helm.sh/hook": post-install,post-upgrade | ||
| "helm.sh/hook-weight": "5" | ||
| "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded | ||
| spec: |
Copilot AI commented on Dec 31, 2025
The Job is missing a backoffLimit specification. Without this, the Job will use the default backoffLimit of 6, which may lead to multiple retry attempts if the Configuration creation fails. Consider adding 'backoffLimit: 0' or a low value to the Job spec to fail fast if there's an issue with applying the configuration, making debugging easier.
Suggested change:
  spec:
+   backoffLimit: 0
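One nuance on the fail-fast suggestion: as noted later in this conversation, the hook Job can fail on its first attempts (for example, while the webhook is still coming up) and only succeed thanks to retries, so backoffLimit: 0 trades that resilience for a faster failure signal. A middle ground, sketched here with assumed values rather than the chart's actual spec, is a small retry budget bounded by a deadline:

```yaml
# Illustrative compromise: allow a few retries but cap the total runtime (values are assumptions).
spec:
  backoffLimit: 3
  activeDeadlineSeconds: 300
```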
spec:
  template:
    metadata:
      namespace: {{ .Release.Namespace }}
Copilot AI commented on Dec 31, 2025
The pod template metadata includes a 'namespace' field, which is not a valid field for pod templates within Job specs. The namespace is inherited from the Job itself and shouldn't be specified in spec.template.metadata. This field should be removed as it may cause validation errors or be ignored by Kubernetes.
Suggested change (remove this line):
- namespace: {{ .Release.Namespace }}
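To make the fix concrete, the namespace belongs on the Job's own metadata and is inherited by its pods. A minimal skeleton under that assumption (names hypothetical, most fields elided):

```yaml
# Namespace set on the Job itself; the pod template inherits it (illustrative skeleton).
apiVersion: batch/v1
kind: Job
metadata:
  name: kubeovn-operator-post-hook        # hypothetical name
  namespace: {{ .Release.Namespace }}
spec:
  template:
    metadata:
      labels:
        app: kubeovn-operator-post-hook   # labels are fine here; a namespace field is not
    spec:
      restartPolicy: OnFailure
      containers: []                      # elided
```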
I got this log message in …
Did the job succeed eventually? It will undoubtedly encounter the error, but hopefully it can finish thanks to the Job's retry mechanism.
Yes, it ended up as a successful installation. Okay, so it seems fine in this case. Thanks!
| "helm.sh/hook": post-install,post-upgrade | ||
| "helm.sh/hook-weight": "5" | ||
| "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded | ||
| spec: |
Instead of relying on a post-install job, can the operator mount the config map and then create the embedded configuration if it's missing? Then the config map can remain part of the normal install/upgrade path. If desired, it also gives the operator the ability to revert any drift found in the CR back to the original found in the config map.
See relevant src here.
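For illustration, the chart-side half of that idea is just a ConfigMap volume on the operator Deployment, with the create-if-missing (and drift-revert) logic living in the operator itself. A hedged sketch with assumed names and paths:

```yaml
# Assumed Deployment snippet mounting the default-configuration ConfigMap (illustrative only).
spec:
  template:
    spec:
      containers:
        - name: kubeovn-operator
          volumeMounts:
            - name: default-configuration
              mountPath: /etc/kubeovn-operator       # hypothetical mount path
              readOnly: true
      volumes:
        - name: default-configuration
          configMap:
            name: kubeovn-operator-configuration     # hypothetical name
```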
Thanks, @ihcsim. I like the idea. But given the tight schedule for v1.7.1, we may want a more straightforward fix here. After discussing with @ibrokethecloud, it turns out the default Configuration CR does not have to be processed by the mutating webhook. A simpler option than the one I proposed earlier (the post-install job) is to make MutatingWebhookConfiguration a post-install-hook resource instead of a pre-install-hook resource. WDYT?
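Concretely, that simpler option amounts to moving the Helm hook annotation onto the webhook configuration itself, so it is only registered after the release (including the default Configuration CR) is in place. A sketch of the idea, with the resource name assumed:

```yaml
# Illustrative: register the mutating webhook only after install/upgrade so the default
# Configuration CR created during the release does not depend on webhook readiness.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: kubeovn-operator-mutating-webhook   # hypothetical name
  annotations:
    "helm.sh/hook": post-install,post-upgrade
webhooks: []   # webhook definitions elided
```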
Signed-off-by: Zespre Chang <[email protected]>
Yu-Jack left a comment
Thanks. It works.
…arvester#455) Signed-off-by: Zespre Chang <[email protected]>
…455) Signed-off-by: Zespre Chang <[email protected]>
Problem:
A race condition was found between webhook readiness and Configuration creation during chart installation.
Solution:
Move the default Configuration CR into a ConfigMap and use a post-hook Job to apply the default Configuration from the ConfigMap volume. Since the default Configuration CR does not need to be processed by the mutating webhook (it is already crafted with sane defaults suitable for Harvester use), we can also make the MutatingWebhookConfiguration resource a post-install hook. That way, the default Configuration CR bypasses the webhook upon creation.
Related Issue(s):
harvester/harvester#9786
Test plan:
Additional documentation or context
Previously, installing the chart directly from the master branch ended up with the following error due to the race condition described above: