
(Bug) recover when token expires #273

Open
wants to merge 1 commit into main
Conversation

gianlucam76
Member

When the drift-detection-manager runs in the management cluster, it receives the managed cluster's kubeconfig from a Secret. These kubeconfigs can expire (e.g., GKE tokens have a maximum lifespan of 48 hours).

Sveltos includes a mechanism to proactively renew these tokens. The SveltosCluster controller can be configured to periodically refresh tokens before they expire, preventing disruptions.

However, prior to this change, the drift-detection controller, when deployed in the management cluster, lacked the ability to retrieve an updated kubeconfig. Consequently, upon kubeconfig expiration, the controller encountered numerous authorization errors, effectively ceasing operation.

This pull request addresses this issue by implementing a mechanism to detect kubeconfig expiration. Upon detection, the drift-detection controller retrieves a fresh, valid kubeconfig. This triggers a restart of the controller-runtime manager (and all associated controllers) as well as the evaluation manager, ensuring continued operation.

Fixes #272

Successfully merging this pull request may close these issues.

(bug) when agent is running within the management cluster, after a while agent stops working