Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Fix HFC to execute updates #1793

Merged
merged 1 commit into from
Jul 5, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion apis/metal3.io/v1alpha1/hostfirmwarecomponents_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,7 @@ type HostFirmwareComponentsStatus struct {
// Updates is the list of all firmware components that should be updated
// they are specified via name and url fields.
// +optional
Updates []FirmwareUpdate `json:"updates"`
Updates []FirmwareUpdate `json:"updates,omitempty"`

// Components is the list of all available firmware components and their information.
Components []FirmwareComponentStatus `json:"components,omitempty"`
Expand Down
70 changes: 57 additions & 13 deletions controllers/metal3.io/baremetalhost_controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -1164,6 +1164,20 @@ func (r *BareMetalHostReconciler) actionPreparing(prov provisioner.Provisioner,
return recordActionFailure(info, metal3api.PreparationError, provResult.ErrorMessage)
}

if hfcDirty && started {
hfcStillDirty, err := r.saveHostFirmwareComponents(prov, info, hfc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This saves the new status when we begin the manual cleaning, but I think I'm right in saying clearHostProvisioningSettings() does not clear them? So if there's a failure in actually applying this change, we won't retry.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there is a failure we shouldn't retry I would say, it can be a bad firmware (Dell for example has firmware separate for each model, if I use the firmware of an R750 in R640 it complains and fails)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or it could be a dropped connection or a failed write.
We shouldn't report that we've done something if we haven't done it.
If the user provides the wrong firmware we should keep trying until they realise and stop doing that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, unless we invent a way for Ironic to say "this will never work, don't try again", the current trend is to retry.

Copy link
Member Author

@iurygregory iurygregory Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

	if provResult.ErrorMessage != "" {
		if bmhDirty {
			info.log.Info("handling cleaning error in controller")
			clearHostProvisioningSettings(info.host)
		}
		if hfcDitry {
			clearHostFirmwareComponentsUpdates(hfc)
		}
		return recordActionFailure(info, metal3api.PreparationError, provResult.ErrorMessage)
	}

Something like this? a new function clearHostFirmwareComponentsUpdates to be used, since I don't think we should change clearHostProvisioningSettings

if err != nil {
return actionError{errors.Wrap(err, "could not save the host firmware components")}
}

if hfcStillDirty {
info.log.Info("going to update the host firmware components")
if err := r.Status().Update(info.ctx, hfc); err != nil {
return actionError{errors.Wrap(err, "failed to update hostfirmwarecomponents status")}
}
}
}

if bmhDirty && started {
info.log.Info("saving host provisioning settings")
_, err := saveHostProvisioningSettings(info.host, info)
Expand Down Expand Up @@ -1740,6 +1754,32 @@ func saveHostProvisioningSettings(host *metal3api.BareMetalHost, info *reconcile
return dirty, nil
}

func (r *BareMetalHostReconciler) saveHostFirmwareComponents(prov provisioner.Provisioner, info *reconcileInfo, hfc *metal3api.HostFirmwareComponents) (dirty bool, err error) {
dirty = false
if reflect.DeepEqual(hfc.Status.Updates, hfc.Spec.Updates) {
info.log.Info("not saving hostFirmwareComponents information since is not necessary")
return dirty, nil
}

info.log.Info("saving hostFirmwareComponents information", "spec updates", hfc.Spec.Updates, "status updates", hfc.Status.Updates)

hfc.Status.Updates = make([]metal3api.FirmwareUpdate, len(hfc.Spec.Updates))
for i := range hfc.Spec.Updates {
hfc.Spec.Updates[i].DeepCopyInto(&hfc.Status.Updates[i])
}
iurygregory marked this conversation as resolved.
Show resolved Hide resolved

// Retrieve new information about the firmware components stored in ironic
components, err := prov.GetFirmwareComponents()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is only going to get called once, at the beginning of manual cleaning.
Aren't the Components expected to change in ironic only after manual cleaning is complete?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this is the problem I'm planning to fix in a separate PR, still trying to figure out how

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zane is right. But I'm also wondering why the HFC controller does not update the components afterwards.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good point. I thought at one point we deleted that from the HFC controller, but indeed it's still there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I've waited some hours to see if the HFC controller would attemp to update, but it didn't..,my best guess is that I need to change the conditions when calling updateHostFirmware in the Reconcile for HostFirmwareComponentsReconciler

if err != nil {
info.log.Error(err, "failed to get new information for firmware components in ironic")
return dirty, err
}
hfc.Status.Components = components
dirty = true

return dirty, nil
}

func (r *BareMetalHostReconciler) createHostFirmwareComponents(info *reconcileInfo) error {
// Check if HostFirmwareComponents already exists
hfc := &metal3api.HostFirmwareComponents{}
Expand All @@ -1755,18 +1795,31 @@ func (r *BareMetalHostReconciler) createHostFirmwareComponents(info *reconcileIn

// Set bmh as owner, this makes sure the resource is deleted when bmh is deleted
if err = controllerutil.SetControllerReference(info.host, hfc, r.Scheme()); err != nil {
iurygregory marked this conversation as resolved.
Show resolved Hide resolved
return errors.Wrap(err, "could not set bmh as controller")
return errors.Wrap(err, "could not set bmh as controller for hostFirmwareComponents")
}
if err = r.Create(info.ctx, hfc); err != nil {
return errors.Wrap(err, "failure creating hostFirmwareComponents resource")
}

info.log.Info("created new hostFirmwareComponents resource")
} else {
// Error reading the object
return errors.Wrap(err, "could not load hostFirmwareComponents resource")
return nil
}
// Error reading the object
return errors.Wrap(err, "could not load hostFirmwareComponents resource")
}
// Necessary in case the CRD is created manually.

if !ownerReferenceExists(info.host, hfc) {
if err := controllerutil.SetControllerReference(info.host, hfc, r.Scheme()); err != nil {
iurygregory marked this conversation as resolved.
Show resolved Hide resolved
return errors.Wrap(err, "could not set bmh as controller for hostFirmwareComponents")
}
if err := r.Update(info.ctx, hfc); err != nil {
iurygregory marked this conversation as resolved.
Show resolved Hide resolved
return errors.Wrap(err, "failure updating hostFirmwareComponents resource")
}

return nil
}

return nil
}

Expand Down Expand Up @@ -1851,15 +1904,6 @@ func (r *BareMetalHostReconciler) getHostFirmwareComponents(info *reconcileInfo)

// Check if there are Updates in the Spec that are different than the Status
if meta.IsStatusConditionTrue(hfc.Status.Conditions, string(metal3api.HostFirmwareComponentsChangeDetected)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory we should check that the condition matches the current Generation (example) so we know the data is not out of date.
Not a huge deal though. (Bigger worry is actually that we miss that the Spec has been copied to the Status - which won't bump the generation - which would result in us doing the update again.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do some tests in the follow-up I'm working on.

// Check if the status have been populated
if len(hfc.Status.Updates) == 0 {
return false, nil, errors.New("host firmware status updates not available")
}

if len(hfc.Status.Components) == 0 {
return false, nil, errors.New("host firmware status components not available")
}

if meta.IsStatusConditionTrue(hfc.Status.Conditions, string(metal3api.HostFirmwareComponentsValid)) {
info.log.Info("hostFirmwareComponents indicating ChangeDetected", "namespacename", info.request.NamespacedName)
return true, hfc, nil
Expand Down
8 changes: 8 additions & 0 deletions controllers/metal3.io/host_state_machine.go
Original file line number Diff line number Diff line change
Expand Up @@ -465,6 +465,14 @@ func (hsm *hostStateMachine) handleAvailable(info *reconcileInfo) actionResult {
return actionComplete{}
}

// Check if hostFirmwareComponents have changed
if dirty, _, err := hsm.Reconciler.getHostFirmwareComponents(info); err != nil {
return actionError{err}
} else if dirty {
hsm.NextState = metal3api.StatePreparing
return actionComplete{}
}

// ErrorCount is cleared when appropriate inside actionManageAvailable
actResult := hsm.Reconciler.actionManageAvailable(hsm.Provisioner, info)
if _, complete := actResult.(actionComplete); complete {
Expand Down
35 changes: 11 additions & 24 deletions controllers/metal3.io/hostfirmwarecomponents_controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -122,7 +122,6 @@ func (r *HostFirmwareComponentsReconciler) Reconcile(ctx context.Context, req ct

// Fetch the HostFirmwareComponents
hfc := &metal3api.HostFirmwareComponents{}
info := &rhfcInfo{ctx: ctx, log: reqLogger, hfc: hfc, bmh: bmh}
if err = r.Get(ctx, req.NamespacedName, hfc); err != nil {
// The HFC resource may have been deleted
if k8serrors.IsNotFound(err) {
Expand All @@ -134,6 +133,7 @@ func (r *HostFirmwareComponentsReconciler) Reconcile(ctx context.Context, req ct
}

// Create a provisioner to access Ironic API
info := &rhfcInfo{ctx: ctx, log: reqLogger, hfc: hfc, bmh: bmh}
prov, err := r.ProvisionerFactory.NewProvisioner(ctx, provisioner.BuildHostDataNoBMC(*bmh), info.publishEvent)
if err != nil {
return ctrl.Result{}, fmt.Errorf("failed to create provisioner: %w", err)
Expand Down Expand Up @@ -181,11 +181,6 @@ func (r *HostFirmwareComponentsReconciler) Reconcile(ctx context.Context, req ct
// Update the HostFirmwareComponents resource using the components from provisioner.
func (r *HostFirmwareComponentsReconciler) updateHostFirmware(info *rhfcInfo, components []metal3api.FirmwareComponentStatus) (err error) {
dirty := false
var newStatus metal3api.HostFirmwareComponentsStatus
// change the Updates in Status
newStatus.Updates = info.hfc.Spec.Updates
// change the Components in Status
newStatus.Components = components

// Check if the updates in the Spec are different than Status
updatesMismatch := !reflect.DeepEqual(info.hfc.Status.Updates, info.hfc.Spec.Updates)
Expand All @@ -196,26 +191,29 @@ func (r *HostFirmwareComponentsReconciler) updateHostFirmware(info *rhfcInfo, co
reason := reasonValidComponent
generation := info.hfc.GetGeneration()

newStatus := info.hfc.Status.DeepCopy()
newStatus.Components = components

if updatesMismatch {
if setUpdatesCondition(generation, &newStatus, info, metal3api.HostFirmwareComponentsChangeDetected, metav1.ConditionTrue, reason, "") {
if setUpdatesCondition(generation, newStatus, info, metal3api.HostFirmwareComponentsChangeDetected, metav1.ConditionTrue, reason, "") {
dirty = true
}

err := r.validateHostFirmwareComponents(info)
if err != nil {
info.publishEvent("ValidationFailed", fmt.Sprintf("Invalid Firmware Components: %s", err))
reason = reasonInvalidComponent
if setUpdatesCondition(generation, &newStatus, info, metal3api.HostFirmwareComponentsValid, metav1.ConditionFalse, reason, fmt.Sprintf("Invalid Firmware Components: %s", err)) {
if setUpdatesCondition(generation, newStatus, info, metal3api.HostFirmwareComponentsValid, metav1.ConditionFalse, reason, fmt.Sprintf("Invalid Firmware Components: %s", err)) {
dirty = true
}
} else if setUpdatesCondition(generation, &newStatus, info, metal3api.HostFirmwareComponentsValid, metav1.ConditionTrue, reason, "") {
} else if setUpdatesCondition(generation, newStatus, info, metal3api.HostFirmwareComponentsValid, metav1.ConditionTrue, reason, "") {
dirty = true
}
} else {
if setUpdatesCondition(generation, &newStatus, info, metal3api.HostFirmwareComponentsValid, metav1.ConditionTrue, reason, "") {
if setUpdatesCondition(generation, newStatus, info, metal3api.HostFirmwareComponentsValid, metav1.ConditionTrue, reason, "") {
dirty = true
}
if setUpdatesCondition(generation, &newStatus, info, metal3api.HostFirmwareComponentsChangeDetected, metav1.ConditionFalse, reason, "") {
if setUpdatesCondition(generation, newStatus, info, metal3api.HostFirmwareComponentsChangeDetected, metav1.ConditionFalse, reason, "") {
dirty = true
}
}
Expand Down Expand Up @@ -258,18 +256,6 @@ func (r *HostFirmwareComponentsReconciler) validateHostFirmwareComponents(info *
if _, ok := allowedNames[componentName]; !ok {
errors = append(errors, fmt.Errorf("component %s is invalid, only 'bmc' or 'bios' are allowed as update names", componentName))
}
if len(errors) == 0 {
componentInStatus := false
for _, componentStatus := range info.hfc.Status.Components {
if componentName == componentStatus.Component {
componentInStatus = true
break
}
}
if !componentInStatus {
errors = append(errors, fmt.Errorf("component %s is invalid because is not present in status", componentName))
}
}
}

return errors
Expand All @@ -295,8 +281,9 @@ func setUpdatesCondition(generation int64, status *metal3api.HostFirmwareCompone
Reason: string(reason),
Message: message,
}
currCond := meta.FindStatusCondition(info.hfc.Status.Conditions, string(cond))

meta.SetStatusCondition(&status.Conditions, newCondition)
currCond := meta.FindStatusCondition(info.hfc.Status.Conditions, string(cond))

if currCond == nil || currCond.Status != newStatus {
return true
Expand Down
65 changes: 19 additions & 46 deletions controllers/metal3.io/hostfirmwarecomponents_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,9 @@ import (
"testing"

metal3api "github.com/metal3-io/baremetal-operator/apis/metal3.io/v1alpha1"
"github.com/metal3-io/baremetal-operator/pkg/hardwareutils/bmc"
"github.com/metal3-io/baremetal-operator/pkg/provisioner"
"github.com/metal3-io/baremetal-operator/pkg/provisioner/fixture"
"github.com/stretchr/testify/assert"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
ctrl "sigs.k8s.io/controller-runtime"
Expand All @@ -25,20 +28,15 @@ func getTestHFCReconciler(host *metal3api.HostFirmwareComponents) *HostFirmwareC
return reconciler
}

func getMockHFCProvisioner(components []metal3api.FirmwareComponentStatus) *hfcMockProvisioner {
return &hfcMockProvisioner{
Components: components,
Error: nil,
func getMockHFCProvisioner(host *metal3api.BareMetalHost, components []metal3api.FirmwareComponentStatus) provisioner.Provisioner {
state := fixture.Fixture{
HostFirmwareComponents: fixture.HostFirmwareComponentsMock{
Components: components,
},
}
}

type hfcMockProvisioner struct {
Components []metal3api.FirmwareComponentStatus
Error error
}

func (m *hfcMockProvisioner) GetFirmwareComponents() (components []metal3api.FirmwareComponentStatus, err error) {
return m.Components, m.Error
p, _ := state.NewProvisioner(context.TODO(), provisioner.BuildHostData(*host, bmc.Credentials{}),
func(reason, message string) {})
return p
}

// Mock components to return from provisioner.
Expand Down Expand Up @@ -96,18 +94,15 @@ func getCurrentComponents(updatedComponents string) []metal3api.FirmwareComponen
// Create the baremetalhost reconciler and use that to create bmh in same namespace.
func createBaremetalHostHFC() *metal3api.BareMetalHost {
bmh := &metal3api.BareMetalHost{}
bmh.ObjectMeta = metav1.ObjectMeta{Name: "hostName", Namespace: "hostNamespace"}
bmh.ObjectMeta = metav1.ObjectMeta{Name: hostName, Namespace: hostNamespace}
c := fakeclient.NewFakeClient(bmh)

reconciler := &BareMetalHostReconciler{
Client: c,
ProvisionerFactory: nil,
Log: ctrl.Log.WithName("bmh_reconciler").WithName("BareMetalHost"),
}
err := reconciler.Create(context.TODO(), bmh)
if err != nil {
return nil
}
_ = reconciler.Create(context.TODO(), bmh)

return bmh
}
Expand Down Expand Up @@ -170,12 +165,6 @@ func TestStoreHostFirmwareComponents(t *testing.T) {
},
},
Status: metal3api.HostFirmwareComponentsStatus{
Updates: []metal3api.FirmwareUpdate{
{
Component: "bmc",
URL: "https://myurls/newbmcfirmware",
},
},
Components: []metal3api.FirmwareComponentStatus{
{
Component: "bmc",
Expand Down Expand Up @@ -247,12 +236,6 @@ func TestStoreHostFirmwareComponents(t *testing.T) {
},
},
Status: metal3api.HostFirmwareComponentsStatus{
Updates: []metal3api.FirmwareUpdate{
{
Component: "bios",
URL: "https://myurls/newbiosfirmware",
},
},
Components: []metal3api.FirmwareComponentStatus{
{
Component: "bmc",
Expand Down Expand Up @@ -328,16 +311,6 @@ func TestStoreHostFirmwareComponents(t *testing.T) {
},
},
Status: metal3api.HostFirmwareComponentsStatus{
Updates: []metal3api.FirmwareUpdate{
{
Component: "bmc",
URL: "https://myurl/newbmcfirmware",
},
{
Component: "bios",
URL: "https://myurls/newbiosfirmware",
},
},
Components: []metal3api.FirmwareComponentStatus{
{
Component: "bmc",
Expand Down Expand Up @@ -404,30 +377,31 @@ func TestStoreHostFirmwareComponents(t *testing.T) {
for _, tc := range testCases {
t.Run(tc.Scenario, func(t *testing.T) {
ctx := context.TODO()
prov := getMockHFCProvisioner(getCurrentComponents(tc.UpdatedComponents))

tc.ExpectedComponents.TypeMeta = metav1.TypeMeta{
Kind: "HostFirmwareComponents",
APIVersion: "metal3.io/v1alpha1"}
tc.ExpectedComponents.ObjectMeta = metav1.ObjectMeta{
Name: "hostName",
Namespace: "hostNamespace",
Name: hostName,
Namespace: hostNamespace,
ResourceVersion: "2"}

hfc := tc.CurrentHFCResource
r := getTestHFCReconciler(hfc)
// Create a bmh resource needed by hfc reconciler
bmh := createBaremetalHostHFC()

prov := getMockHFCProvisioner(bmh, getCurrentComponents(tc.UpdatedComponents))

info := &rhfcInfo{
ctx: ctx,
log: logf.Log.WithName("controllers").WithName("HostFirmwareComponents"),
hfc: tc.CurrentHFCResource,
bmh: bmh,
}

components, err := prov.GetFirmwareComponents()
assert.NoError(t, err)

err = r.updateHostFirmware(info, components)
assert.NoError(t, err)

Expand All @@ -438,10 +412,9 @@ func TestStoreHostFirmwareComponents(t *testing.T) {
err = r.Client.Get(ctx, key, actual)
assert.Equal(t, nil, err)

// Ensure ExpectedComponents matches actual
assert.Equal(t, tc.ExpectedComponents.Spec.Updates, actual.Spec.Updates)
assert.Equal(t, tc.ExpectedComponents.Status.Components, actual.Status.Components)
assert.Equal(t, tc.ExpectedComponents.Status.Updates, actual.Status.Updates)

currentTime := metav1.Now()
tc.ExpectedComponents.Status.LastUpdated = &currentTime
actual.Status.LastUpdated = &currentTime
Expand Down
1 change: 1 addition & 0 deletions main.go
Original file line number Diff line number Diff line change
Expand Up @@ -337,6 +337,7 @@ func main() {
ProvisionerFactory: provisionerFactory,
}).SetupWithManager(mgr, maxConcurrency); err != nil {
setupLog.Error(err, "unable to create controller", "controller", "HostFirmwareComponents")
os.Exit(1)
}

if err = (&metal3iocontroller.DataImageReconciler{
Expand Down
Loading