-
Notifications
You must be signed in to change notification settings - Fork 58
Daemonset deployment mode #825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Conversation
Hi @Pacho20. Thanks for your PR. I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/ok-to-test |
/ok-to-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Patrik for working on this new feature.
I added a couple of inline comments about the logic.
Please move the shell scripts to a separate file and include them, or mount them as an immutable ConfigMap.
Nice to have: deduplicate code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @Pacho20, thanks for this. Appreciated.
I left a few comments for you, but my first suggestion here, is to squash the commits that are refactoring and changing the same code base, so the commit-by-commit review would be easier.
c5bacd6
to
a1108ca
Compare
451ec60
to
f0973fa
Compare
f0973fa
to
83c834a
Compare
83c834a
to
121a2db
Compare
/ok-to-test |
528a8e9
to
e48f315
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Pacho20, thanks for keeping pushing this PR. Appreciated.
I left a few more comments for you, but also lets see if others join the party.
Privileged: &runPrivileged, | ||
RunAsUser: &runAsUser, | ||
}, | ||
Command: []string{"/bin/bash", "/scripts/osc-configs-script.sh"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
controllers/daemonset_reconcile.go
Outdated
// | ||
// It returns false if no ConfigMap was created due to an error or | ||
// if the existing ConfigMap is identical to the provided YAML data. | ||
func (r *KataConfigOpenShiftReconciler) addImmutableConfigMap(path, yaml string) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a benefit of using the script from the configmap vs adding the script in the container image and run it through the usual static checks etc ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had suggested this from experience that yaml inside yaml can often have issues in ocp. The line wraps start to be interpreted by ocp when they should be running it verbatim inside the container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Closing this as it was removed from this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong, thread, reopening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Pacho20 !
Thanks for submitting this. I could only partially review the first commit so far, I'll try to devote some time everyday to make progress. Anyway, I've already some comments to share. PTAL.
return false, err | ||
} | ||
|
||
return true, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MachineConfig CRD is also present in this case but it won't print the message... :-\
What about simplifying the return branches ? Something like :
err := r.Client.Get(context.Background(), client.ObjectKey{Name: "foo"}, mc)
if err == nil || k8serrors.IsNotFound(err) {
r.Log.Info("MachineConfig CRD is present")
return true, nil
}
if meta.IsNoMatchError(err) {
r.Log.Info("MachineConfig CRD not found")
return false, nil
}
return false, err
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gkurz we think we resolved this. Can you confirm?
} | ||
|
||
if mode == DaemonSetFallbackOption && !mcAvailable { | ||
r.Log.Info("MC is not available, deployment mode will be set to DaemonSet") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r.Log.Info("MC is not available, deployment mode will be set to DaemonSet") | |
r.Log.Info("MachineConfig is not available, deployment mode will be set to DaemonSet") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gkurz this should now be resolved. Can you confirm?
controllers/daemonset_reconcile.go
Outdated
return ctrl.Result{}, updateErr | ||
} | ||
return res, err | ||
res, err := r.processKataConfigDeleteRequestDaemonSet() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space damage. Be sure to run make fmt
for each individual commit. It is also a good practice to run make vet
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think was fixed when it moved to here #950 the commit doesn't exist that has this error now.
} | ||
|
||
cfgMap := &corev1.ConfigMap{} | ||
err := r.Client.Get(context.TODO(), types.NamespacedName{Name: FgConfigMapName, | ||
Namespace: OperatorNamespace}, cfgMap) | ||
if err == nil { | ||
for feature, value := range cfgMap.Data { | ||
fgStatus.FeatureGates[feature] = value == "true" | ||
if value, ok := cfgMap.Data[ConfidentialFeatureGate]; ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the FeatureGateStatus
from containing a map to containing explicit values is what forces you to copy-paste the same code again and again here. This scales OK with three fields, but if we add more, that function will need to be revisited every time. Do we want that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@c3d, right, we asked him to add DeploymentModeOption over Slack to cover what the code was already doing and to support the "force mode". Since we don’t have just bools anymore, the extra code here is expected.
We’re also moving from feature gates to config options. I don’t love the duplication either, but for now I’d stick with this. If we add more gates later, we can always revisit with a table-driven/helper pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will close this thread, feel free to reopen if we came up with a better idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such a change should really happen in a preceding commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created a new commit. Could you check if it's acceptable for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Pacho20, thanks again for this work.
The comments shows the importance of your work, This is a big feature and you’re making solid progress. Keep going.
// The DaemonSet's Pod contains three containers: one installs the binaries, one modifies the node labels, and one sets the runtime log level. | ||
func (r *KataConfigOpenShiftReconciler) daemonSetForKataInstall(action KataDsAction) (*appsv1.DaemonSet, error) { | ||
// This image contains the binaries for kata-containers | ||
extensionImageString, err := r.GetExtensionImage() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should stay aligned with how we deploy across the project to avoid mixing methods. Since there’s already an effort to move the crio/kata config into the kata RPM, this DS could follow the same path, otherwise it risks becoming a separate “thing” that’s harder to maintain and ship in future versions. I’d also avoid introducing extra config maps, even if used only internally, since they tend to add confusion.
Now, about the scripts, we already have scripts/install-helpers/
, so the needed files could go there, and the image could consume them at build time as other images do.
If we have time to deploy this image in our pipeline, that’s the option I’d vote for (unless we have a better alternative).
@littlejawa, how hard do you think it would be to add this new component/image in Konflux? From what I see, hermetic builds shouldn’t be a problem, but do you see any other enterprise contract breaking here?
7d0f492
to
85c2055
Compare
return nil | ||
} | ||
|
||
func (r *KataConfigOpenShiftReconciler) GetExtensionImage() (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just missed the reason we have GetExentionImage()
and GetCliImage()
methods. Can't we avoid both methods and calling directly the GetImageForComponent
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably don't need the GetCliImage()
function. I initially created both functions to allow for additional logic if needed. As you can see, GetExtensionImage()
has a TODO
comment because I wasn't sure whether this is the only extension image we need. Other extension images exist for different distros. If RHEL and CoreOS are sufficient, I can remove both functions and call GetImageForComponent()
directly.
case MachineConfigMode: | ||
res, err = r.processKataConfigDeleteRequest() | ||
case DaemonSetMode: | ||
res, err = r.processKataConfigDeleteRequestDaemonSet() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the DaemonSetFallback
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the FG is set to DaemonSetFallback, this field is automatically set (L87 deployment_mode_handler.go) to either MachineConfigMode or DaemonSetMode depending on what is there. What's the suggestion?
case MachineConfigMode: | ||
res, err = r.processKataConfigInstallRequest() | ||
case DaemonSetMode: | ||
res, err = r.processKataConfigInstallRequestDaemonSet() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, what about the DaemonSetFallback
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the FG is set to DaemonSetFallback, this field is automatically set (L87 deployment_mode_handler.go) to either MachineConfigMode or DaemonSetMode depending on what is there. What's the suggestion?
r.Log.Info("Kata PODs are present. Requeue for reconciliation ") | ||
return ctrl.Result{Requeue: true, RequeueAfter: 15 * time.Second}, err | ||
} | ||
res, err := r.checkDeletionEligibility() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice move!
err = r.deletePeedPodsConfigDaemonSet() | ||
if err != nil { | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the fallback mode here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above. This is handled in deployment_mode_handler.go
85c2055
to
cb2cb22
Compare
|
||
// Attempt to GET any MachineConfig to verify if the GVK is known. | ||
// If the CRD isn’t installed, it will return a NoMatchError. | ||
err := r.Client.Get(context.Background(), client.ObjectKey{Name: machineConfigName}, mc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only sensible user for machineConfigName
. No deep to define a constant for that. Let's use the literal string "dummy-machine-config"
directly here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Prepare feature gate for multi-value support by replacing map[string]bool with struct Signed-off-by: Patrik Fodor <[email protected]>
Introduce a new feature gate to support running without the Machine Config Operator (MCO), using a DaemonSet-based deployment instead. This commit adds only the feature gate definition and detection logic. The feature gate supports three options, which determine whether DaemonSet mode is activated. Signed-off-by: Patrik Fodor <[email protected]>
Introduced a helper function to fetch component images based on the cluster version. Also added a function to label nodes. Signed-off-by: Patrik Fodor <[email protected]>
This commit builds on the previously introduced feature gate for DaemonSet-based deployment. Adds support for deploying and managing the Kata and CAA installation using DaemonSets without relying on the Machine Config Operator (MCO). Signed-off-by: Patrik Fodor <[email protected]>
cb2cb22
to
c8dac15
Compare
@Pacho20: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Introduce support for DaemonSet-based deployment mode for clusters running without the Machine Config Operator (MCO).
This feature enables an alternative deployment path by detecting the absence of the MachineConfig CRD and activating DaemonSet mode via a dedicated feature gate. It includes logic for deploying and managing DaemonSets on targeted nodes, using labels to control execution and track installation status.
There are still many TODOs in the implementation, and the list will be updated as development progresses.