
DPE-4200 Scale-up from zero #414

Open · wants to merge 5 commits into main

Conversation

paulomach (Contributor)

Issue

On scaling to zero, the last unit ensures the cluster is dissolved.
Scaling back to units > 0 does not detect and handle the dissolved cluster.

Solution

  • detect and handle the special case of a dissolved cluster (see the sketch below)

Fixes #409
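
For orientation, the change amounts to two pieces, both quoted in the review comments below: detecting that the unit databag looks freshly reset, and having the leader recreate the cluster that was dissolved on scale-down. A rough sketch of that flow (handler and helper names here are illustrative, not the PR's literal code):

def _handle_scale_up_from_zero(self) -> None:
    # Illustrative only: when the unit databag contains nothing but Juju's
    # default keys (see the excerpt on lines +261 to +267 below), the leader
    # recreates the cluster dissolved on scale-down (lines +651 to +653).
    if not self._unit_data_is_default():
        return
    if self.unit.is_leader():
        self.create_cluster()

def _unit_data_is_default(self) -> bool:
    # Hypothetical name for the check quoted below.
    return self.unit_peer_data.keys() == {
        "egress-subnets",
        "ingress-address",
        "private-address",
        "unit-status",
    }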

@paulomach marked this pull request as ready for review June 14, 2024 20:03
@carlcsaposs-canonical (Contributor) left a comment

we might need to change some things on router to support scaling mysql-k8s to 0 and back up

Comment on lines +261 to +267
_default_unit_data_keys = {
    "egress-subnets",
    "ingress-address",
    "private-address",
    "unit-status",
}
return self.unit_peer_data.keys() == _default_unit_data_keys

nit: this test might be flaky if the default keys are ever changed by Juju; consider writing something to persistent disk or the databag and checking that instead.
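
A minimal sketch of that alternative, recording the dissolution explicitly instead of inferring it from Juju's default keys; "cluster-dissolved" is a hypothetical key, not one the charm defines today:

def _mark_cluster_dissolved(self) -> None:
    # Written by the last departing unit (the leader, per the
    # storage-detaching excerpt further down) when it dissolves the cluster.
    self.app_peer_data["cluster-dissolved"] = "true"

def _cluster_was_dissolved(self) -> bool:
    # Checked on scale-up: an explicit flag does not break if Juju ever
    # changes the set of default unit databag keys.
    return self.app_peer_data.get("cluster-dissolved") == "true"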


can we utilize units-added-to-cluster here instead of relying on these keys?
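
A sketch of what that could look like; the helper name is hypothetical, and whether the counter survives all paths is exactly the concern raised further down:

def _no_units_in_cluster(self) -> bool:
    # Hypothetical helper: a counter of 0 (or an absent key) would mean all
    # units were removed from the cluster, i.e. it was dissolved on
    # scale-down to zero and must be re-created rather than joined.
    return int(self.app_peer_data.get("units-added-to-cluster", 0)) == 0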

@taurus-forever: IMHO, we can rely only on the data on the storage (it can be foreign storage attached by user mistake, which we should not damage or erase).

Comment on lines +651 to +653
if self.unit.is_leader():
    # create the cluster due it being dissolved on scale-down
    self.create_cluster()

if there's a cluster that already exists on persistent disk, will that be re-used?

@taurus-forever: IMHO, the charm must be blocked if a foreign disk is attached by mistake (to avoid data damage).

Today this is not possible on K8s, but Pedro is working with Juju on it:

> juju scale-application mysql 0
mysql scaled to 0 units

> juju add-storage mysql/0 database=foreigndisk
ERROR Juju command "add-storage" not supported on container models    # will be supported on K8s like on VM

> juju scale-application mysql 3
...

Another reason to write to disk (see the sketch after this transcript):

> juju scale-application mysql 0

# juju controller crashed... restored ...

> juju scale-application mysql 1  # did we lose units-added-to-cluster?
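
A marker written to the MySQL storage itself would cover both of these points: it survives a controller restore (unlike relation data) and it also identifies a foreign disk. A minimal sketch; the marker path is hypothetical and depends on where the charm mounts its data:

from pathlib import Path

import ops

# Hypothetical marker file on the attached data storage.
_CLUSTER_MARKER = Path("/var/lib/mysql/.charm_cluster_marker")

def _write_cluster_marker(self) -> None:
    # Record which application initialized this disk, right after the
    # cluster is first created.
    _CLUSTER_MARKER.write_text(self.app.name)

def _storage_is_foreign(self) -> bool:
    # Marker present but written by another application => this disk was
    # attached by mistake and must not be touched.
    return _CLUSTER_MARKER.exists() and _CLUSTER_MARKER.read_text() != self.app.name

def _guard_against_foreign_storage(self) -> None:
    # Block instead of re-creating (and potentially damaging) the data.
    if self._storage_is_foreign():
        self.unit.status = ops.BlockedStatus(
            "storage contains data from another cluster; refusing to reuse it"
        )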

if self.unit.is_leader():
    # Update 'units-added-to-cluster' counter in the peer relation databag
    units = int(self.app_peer_data.get("units-added-to-cluster", 1))
    self.app_peer_data["units-added-to-cluster"] = str(units - 1)

should we also have this update to units-added-to-cluster on the peer-relation-departed event? Wouldn't we want to decrement this value when non-leader units are scaled down?

Or is this happening on update-status (in particular _set_app_status)? If so, could there be inconsistency issues? For example, with 3 units, if another unit departs and then the leader unit departs, units-added-to-cluster will be 2 instead of 1 (if storage-detaching runs on the leader before update-status).
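
For reference, a leader-only decrement on peer-relation-departed could look like the sketch below (clamped at zero); whether it double-counts with the storage-detaching hook above is exactly the open question here:

def _on_peer_relation_departed(self, event) -> None:
    # Sketch: keep the counter in sync whenever any unit leaves, instead of
    # relying on update-status to reconcile it later.
    if not self.unit.is_leader():
        return
    units = int(self.app_peer_data.get("units-added-to-cluster", 1))
    self.app_peer_data["units-added-to-cluster"] = str(max(units - 1, 0))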


@taurus-forever (Contributor) left a comment

LGTM, but IMHO it would be better to discuss the corner cases before merging.


@taurus-forever (Contributor)

BTW, we are adding a test to check scale-down to zero and restore with a foreign disk here.

Maybe it is worth copying it into MySQL VM/K8s as well?

@paulomach (Contributor, Author)

> we might need to change some things on router to support scaling mysql-k8s to 0 and back up

Testing it, it requires removing and re-relating. On scaling to zero, routers are already cleaned up from the metadata.
The messaging in the router is very clear; IMO that is adequate behavior.

@taurus-forever (Contributor)

re: it requires a remove and re-relate

The field/customers do not like this... can we avoid re-relation without really over-complicating the code?

@sombrafam

@paulomach hi Paulo, I see that the pull request was merged. How can I know the release version in which this will be published, if it has not been already?

@paulomach (Contributor, Author)

Hey @sombrafam, this is still under discussion. There are some edge cases we need to fix.


Successfully merging this pull request may close these issues: Mysql-k8s breaks after scale down 0 (#409)