[sidecar] Failure Events are still present after status of Bucket Ready: true during Bucket creation #103

Open
asundriya opened this issue Oct 28, 2024 · 1 comment

@asundriya

What happened:
Events raised on a Bucket creation failure remain on the Bucket even after Bucket creation succeeds.
What you expected to happen:
The failure events should be cleared once the bucket creation issue is rectified.

How to reproduce this bug (as minimally and precisely as possible):

Induce an error during bucket creation.
Bucket creation fails with status Bucket Ready: False, and an event is generated:

kubectl describe bucket bc1a2a39683-f400-4e1d-9a7a-05e3ec85abc0
Name: bc1a2a39683-f400-4e1d-9a7a-05e3ec85abc0
Namespace:
Labels:
Annotations:
API Version: objectstorage.k8s.io/v1alpha1
Kind: Bucket
Metadata:
Creation Timestamp: 2024-10-25T08:48:31Z
Generation: 1
Resource Version: 509042
UID: 6ae2c16d-2b67-45b5-bcd9-bd9585d5f63b
Spec:
Bucket Claim:
Name: arvclaim
Namespace: default
UID: a2a39683-f400-4e1d-9a7a-05e3ec85abc0
Bucket Class Name: bc1
Deletion Policy: Delete
Driver Name: cosi.XXXX.com
Parameters:
Bucket Tags: key1=value1, key2=, key3=value3,
Cosi User Secret Name: cosi-user-secret-hfjyjf112o
Cosi User Secret Namespace: default
Protocols:
s3l
Events:
Type     Reason              Age                  From  Message
----     ------              ----                 ----  -------
Warning  FailedCreateBucket  1s (x13 over 2m26s)  cosi  failed to create bucket: rpc error: code = Internal desc = failed to create bucket due to an internal error

Resolve the issue. The status now shows Bucket Ready: true, but the failure event is still listed:

kubectl describe bucket bc1a2a39683-f400-4e1d-9a7a-05e3ec85abc0
Name: bc1a2a39683-f400-4e1d-9a7a-05e3ec85abc0
Namespace:
Labels:
Annotations:
API Version: objectstorage.k8s.io/v1alpha1
Kind: Bucket
Metadata:
Creation Timestamp: 2024-10-25T08:48:31Z
Finalizers:
cosi.objectstorage.k8s.io/bucket-protection
Generation: 1
Resource Version: 510601
UID: 6ae2c16d-2b67-45b5-bcd9-bd9585d5f63b
Spec:
Bucket Claim:
Name: arvclaim
Namespace: default
UID: a2a39683-f400-4e1d-9a7a-05e3ec85abc0
Bucket Class Name: bc1
Deletion Policy: Delete
Driver Name: cosi.XXXX.com
Parameters:
Bucket Tags: key1=value1, key2=, key3=value3,
Cosi User Secret Name: cosi-user-secret-hfjyjf112o
Cosi User Secret Namespace: default
Protocols:
s3
Status:
Bucket ID: bc1a2a39683-f400-4e1d-9a7a-05e3ec85abc0
Bucket Ready: true
Events:
Type     Reason              Age                  From  Message
----     ------              ----                 ----  -------
Warning  FailedCreateBucket  108s (x38 over 16m)  cosi  failed to create bucket: rpc error: code = Internal desc = failed to create bucket due to an internal error

The issues are:
a. The failure events remain visible for one hour, which is misleading.
b. If a failure event is shown, a success event should also be shown, so that the user is assured that the workflow passed.
The same issue is seen with BucketAccess.

Cause:
Events for the COSI APIs are handled by the sidecar (https://github.com/kubernetes-sigs/container-object-storage-interface-provisioner-sidecar): if the driver returns an error to it, the sidecar creates an event on the related custom resource. However, the sidecar does not currently delete that event when a subsequent retry of the same operation reconciles successfully.

Environment:
• Kubernetes version (use kubectl version), please list client and server:
Client Version: v1.30.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.0
• Controller version (provide the release tag or commit hash):
gcr.io/k8s-staging-sig-storage/objectstorage-controller:v20221027-v0.1.1-8-g300019f
• Provisioner name and version (provide the release tag or commit hash):
gcr.io/k8s-staging-sig-storage/objectstorage-sidecar:latest
• Cloud provider or hardware configuration:
• OS (e.g: cat /etc/os-release):
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
• Kernel (e.g. uname -a):
Linux tnh-cosi-3 5.15.0-46-generic #49-Ubuntu SMP Thu Aug 4 18:03:25 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
• Install tools:
• Network plugin and version (if this is a network-related bug):
• Others:

@gauriKrishnan

Hi @BlaineEXE

This is a duplicate of kubernetes-sigs/container-object-storage-interface-provisioner-sidecar#156

Adding to the observations of @asundriya: the COSI sidecar records an event after receiving an error from the driver's DriverCreateBucket and DriverGrantBucketAccess functions. The event persists even if the failure is resolved in a subsequent retry, as indicated by the Status showing Bucket Ready: true.

https://github.com/kubernetes-sigs/container-object-storage-interface-provisioner-sidecar/blob/80979e8992a6a2b2166f3ff1e7d39b4ab03f045c/pkg/bucket/bucket_controller.go#L131

https://github.com/kubernetes-sigs/container-object-storage-interface-provisioner-sidecar/blob/80979e8992a6a2b2166f3ff1e7d39b4ab03f045c/pkg/bucketaccess/bucketaccess_controller.go#L177

Displaying a warning event with an error while the Status shows true makes the Bucket or BucketAccess description ambiguous for the one hour that the Kubernetes event persists.

I suggest either deleting the warning event or creating a normal event when the operation has succeeded. I have highlighted the location in code where these changes could be made (after the Status has been updated to True):

https://github.com/kubernetes-sigs/container-object-storage-interface-provisioner-sidecar/blob/80979e8992a6a2b2166f3ff1e7d39b4ab03f045c/pkg/bucket/bucket_controller.go#L169

https://github.com/kubernetes-sigs/container-object-storage-interface-provisioner-sidecar/blob/80979e8992a6a2b2166f3ff1e7d39b4ab03f045c/pkg/bucketaccess/bucketaccess_controller.go#L300

Please let us know if you have any questions. Thanks!
CC: @narayviv @asundriya

Projects
Status: To do for v1alpha2