Use Scylla Manager cluster labels for cluster reconciliation #2156

rzetelskik · 2024-10-15T08:38:19Z

Description of your changes: Currently, when the manager controller fails to save the manager's cluster ID in the ScyllaCluster's status after registering the cluster with the manager, the cluster is deleted and recreated. Because update conflicts are not rare, this often causes many unnecessary recreation attempts.
To make the reconciliation more robust, this PR changes this behaviour. Instead of relying on the ID from the status, it uses labels from the manager state. A cluster is created with a label holding the owner's UID, which lets us maintain and recognize the cluster's identity without relying on the status of our API resources. In turn, a cluster is only deleted when its owner UID label does not match the UID of the current owner, in order to avoid name collisions.
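
For illustration, the owner UID check described above could look roughly like the sketch below. It is not the code from this PR: the label key `example.com/owner-uid`, the `managerCluster` type and `decideClusterAction` are hypothetical names, and treating a missing label as a mismatch is an assumption.

```go
package main

import "fmt"

// ownerUIDLabel is a hypothetical label key used only in this sketch.
const ownerUIDLabel = "example.com/owner-uid"

// managerCluster is a simplified stand-in for a cluster entry in manager state.
type managerCluster struct {
	Name   string
	Labels map[string]string
}

type clusterAction int

const (
	actionCreate clusterAction = iota // no cluster with this name is registered
	actionDelete                      // the name is taken by a cluster with a different owner UID
	actionKeep                        // the registered cluster belongs to the current owner
)

func (a clusterAction) String() string {
	switch a {
	case actionCreate:
		return "create"
	case actionDelete:
		return "delete"
	case actionKeep:
		return "keep"
	default:
		return fmt.Sprintf("unknown(%d)", int(a))
	}
}

// decideClusterAction picks the cluster action from the owner UID label alone,
// without consulting any status field. A missing label counts as a mismatch
// (an assumption of this sketch).
func decideClusterAction(existing *managerCluster, ownerUID string) clusterAction {
	if existing == nil {
		return actionCreate
	}
	if existing.Labels[ownerUIDLabel] != ownerUID {
		return actionDelete
	}
	return actionKeep
}

func main() {
	stale := &managerCluster{
		Name:   "ns/cluster",
		Labels: map[string]string{ownerUIDLabel: "uid-old"},
	}
	fmt.Println(decideClusterAction(nil, "uid-new"))   // create
	fmt.Println(decideClusterAction(stale, "uid-new")) // delete
	fmt.Println(decideClusterAction(stale, "uid-old")) // keep
}
```

The point of the sketch is that a failed status update no longer changes the decision: the identity travels with the cluster in manager state.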

The labels are also extended with a managed hash label to align the cluster update logic with changes recently introduced in #2142.
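
As a rough illustration of how a managed hash label can drive updates: the hashing scheme from #2142 is not shown in this thread, so everything below is an assumption, including the label key `example.com/managed-hash`, the `desiredCluster` fields and the choice of SHA-256 over the JSON encoding.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// managedHashLabel is a hypothetical label key used only in this sketch.
const managedHashLabel = "example.com/managed-hash"

// desiredCluster is a simplified, hypothetical view of the desired cluster definition.
type desiredCluster struct {
	Name string `json:"name"`
	Host string `json:"host"`
}

// managedHash returns a hex-encoded SHA-256 digest of the JSON-encoded definition.
// Struct fields are marshalled in declaration order, so equal values hash equally.
func managedHash(c desiredCluster) (string, error) {
	data, err := json.Marshal(c)
	if err != nil {
		return "", fmt.Errorf("can't marshal desired cluster: %w", err)
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]), nil
}

// needsUpdate reports whether the hash stored in the existing labels differs
// from the hash of the desired definition.
func needsUpdate(existingLabels map[string]string, desiredHash string) bool {
	return existingLabels[managedHashLabel] != desiredHash
}

func main() {
	desired := desiredCluster{Name: "ns/cluster", Host: "cluster-client.ns.svc"}
	hash, err := managedHash(desired)
	if err != nil {
		panic(err)
	}
	existing := map[string]string{managedHashLabel: hash}
	fmt.Println(needsUpdate(existing, hash)) // false

	desired.Host = "other-host.ns.svc"
	changed, err := managedHash(desired)
	if err != nil {
		panic(err)
	}
	fmt.Println(needsUpdate(existing, changed)) // true
}
```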

The logic for creating "actions" is modified to produce at most one cluster-related action per iteration and then requeue, so that any further actions are only scheduled on the next iteration. The reasoning is to avoid errors in task actions when a cluster action is required first, e.g. when the auth token needs to be updated.
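
The "one cluster action, then requeue" ordering can be sketched as follows. This is a simplified model rather than the controller's actual code: the `action` type and `nextActions` are hypothetical names introduced for the example.

```go
package main

import "fmt"

// action is a hypothetical, simplified representation of a planned manager call.
type action struct {
	Kind string // "cluster" or "task"
	Name string
}

func (a action) String() string {
	return fmt.Sprintf("%s:%s", a.Kind, a.Name)
}

// nextActions returns the actions to run in the current iteration and whether
// the sync should be requeued. If any cluster action is pending, only the first
// one is returned and task actions are deferred to the next iteration.
func nextActions(clusterActions, taskActions []action) (toRun []action, requeue bool) {
	if len(clusterActions) > 0 {
		return clusterActions[:1], true
	}
	return taskActions, false
}

func main() {
	clusterActions := []action{{Kind: "cluster", Name: "update-auth-token"}}
	taskActions := []action{
		{Kind: "task", Name: "add-repair"},
		{Kind: "task", Name: "add-backup"},
	}

	// First iteration: only the cluster action runs, and the sync is requeued.
	toRun, requeue := nextActions(clusterActions, taskActions)
	fmt.Println(toRun, requeue)

	// Next iteration: no cluster action is pending, so task actions are scheduled.
	toRun, requeue = nextActions(nil, taskActions)
	fmt.Println(toRun, requeue)
}
```

Deferring task actions this way means they never run against a cluster whose registration or credentials are still stale.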

Additionally, the manager state computed in each reconciliation loop is reduced to a single cluster: cluster names in manager are unique, so propagating additional clusters to the state is redundant.
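
A minimal sketch of narrowing the state to one cluster by name, assuming a simplified `managerCluster` type. `findClusterByName` is a hypothetical helper, and returning an error on duplicate names is a defensive assumption of the sketch, not something stated in this PR.

```go
package main

import "fmt"

// managerCluster is a simplified stand-in for a cluster entry returned by the manager.
type managerCluster struct {
	ID   string
	Name string
}

// findClusterByName returns the only cluster with the given name, or nil if
// there is none. Names are expected to be unique, so a duplicate is reported
// as an error instead of being silently resolved.
func findClusterByName(clusters []managerCluster, name string) (*managerCluster, error) {
	var found *managerCluster
	for i := range clusters {
		if clusters[i].Name != name {
			continue
		}
		if found != nil {
			return nil, fmt.Errorf("found more than one cluster named %q", name)
		}
		found = &clusters[i]
	}
	return found, nil
}

func main() {
	clusters := []managerCluster{
		{ID: "id-1", Name: "ns-a/cluster"},
		{ID: "id-2", Name: "ns-b/cluster"},
	}
	c, err := findClusterByName(clusters, "ns-b/cluster")
	if err != nil {
		panic(err)
	}
	fmt.Println(c.ID) // id-2
}
```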

Unit tests are also extended to cover these scenarios and unified for consistency.

Which issue is resolved by this Pull Request:
Resolves #1902

/kind bug
/priority important-soon
/cc

scylla-operator-bot · 2024-10-15T08:38:22Z

@rzetelskik: GitHub didn't allow me to request PR reviews from the following users: rzetelskik.

Note that only scylladb members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Description of your changes: wip

Which issue is resolved by this Pull Request:
Resolves #1902

/kind bug
/priority important-soon
/cc

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

rzetelskik · 2024-10-15T12:20:53Z

/cc zimnx tnozicka

pkg/controller/manager/sync.go

pkg/controller/manager/sync_test.go

tnozicka

/approve

/assign @zimnx

pkg/controller/manager/status.go

rzetelskik · 2024-10-17T12:18:49Z

@rzetelskik: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gke-parallel-clusterip | cf03afe | link | true | /test e2e-gke-parallel-clusterip |
Full PR test history. Your PR dashboard.

cluster provisioning failed
/retest

rzetelskik · 2024-10-21T07:07:54Z

@rzetelskik: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gke-parallel-clusterip | cf03afe | link | true | /test e2e-gke-parallel-clusterip |
Full PR test history. Your PR dashboard.

tls test flake, can't possibly be related?
#2096 (comment)
/retest

zimnx

/lgtm

scylla-operator-bot · 2024-10-21T07:09:11Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rzetelskik, tnozicka, zimnx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [tnozicka,zimnx]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

rzetelskik · 2024-10-21T07:39:42Z

@rzetelskik: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gke-parallel-clusterip | cf03afe | link | true | /test e2e-gke-parallel-clusterip |
Full PR test history. Your PR dashboard.

/test images
/retest

rzetelskik force-pushed the manager-cluster-deletion-fix branch 3 times, most recently from f28d149 to 5c455fd on October 15, 2024 10:08

rzetelskik changed the title ~~[WIP] Use clusters' OwnerUID labels to reconcile clusters registered with Scylla Manager~~ Use Scylla Manager cluster labels for cluster reconciliation Oct 15, 2024

scylla-operator-bot bot removed the do-not-merge/work-in-progress label Oct 15, 2024

rzetelskik changed the title ~~Use Scylla Manager cluster labels for cluster reconciliation~~ [WIP] Use Scylla Manager cluster labels for cluster reconciliation Oct 15, 2024

scylla-operator-bot bot added the do-not-merge/work-in-progress label Oct 15, 2024

rzetelskik changed the title ~~[WIP] Use Scylla Manager cluster labels for cluster reconciliation~~ Use Scylla Manager cluster labels for cluster reconciliation Oct 15, 2024

scylla-operator-bot bot removed the do-not-merge/work-in-progress label Oct 15, 2024

scylla-operator-bot bot requested review from tnozicka and zimnx October 15, 2024 12:20

zimnx reviewed Oct 15, 2024

pkg/controller/manager/sync.go (outdated, resolved)

pkg/controller/manager/sync_test.go (resolved)

rzetelskik force-pushed the manager-cluster-deletion-fix branch from 5c455fd to 5ffcffb on October 16, 2024 16:51

rzetelskik requested a review from zimnx October 16, 2024 16:52

tnozicka approved these changes Oct 17, 2024

pkg/controller/manager/status.go (outdated, resolved)

scylla-operator-bot bot assigned zimnx Oct 17, 2024

scylla-operator-bot bot added the approved label Oct 17, 2024

rzetelskik force-pushed the manager-cluster-deletion-fix branch from 5ffcffb to 048786e on October 17, 2024 10:26

rzetelskik requested a review from tnozicka October 17, 2024 10:28

Use Scylla Manager cluster labels for cluster reconciliation

cf03afe

rzetelskik force-pushed the manager-cluster-deletion-fix branch from 048786e to cf03afe on October 17, 2024 11:19

zimnx approved these changes Oct 21, 2024

scylla-operator-bot bot added the lgtm label Oct 21, 2024

scylla-operator-bot bot merged commit e2c562b into scylladb:master Oct 21, 2024
12 checks passed

rzetelskik mentioned this pull request Oct 29, 2024

Flake - Scylla Manager integration should register cluster, sync backup tasks and support manual restore procedure [It] using default ScyllaDB version [RequiresObjectStorage] #2161

Open
