DPE-4200 Scale-up from zero #414
base: main
Conversation
we might need to change some things on router to support scaling mysql-k8s to 0 and back up
_default_unit_data_keys = {
    "egress-subnets",
    "ingress-address",
    "private-address",
    "unit-status",
}
return self.unit_peer_data.keys() == _default_unit_data_keys
nit: this test might be flaky if the default keys are ever changed by Juju. Consider writing something to persistent disk or the databag and checking that instead.
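A minimal sketch of that suggestion, assuming the charm can write a small marker file onto its persistent MySQL storage; the path layout and helper names are hypothetical, not the charm's actual API:

```python
from pathlib import Path

# Hypothetical marker written once the unit has been added to the cluster;
# its absence after a restart means this is a genuinely fresh unit.
MARKER_NAME = ".unit-initialized"


def mark_unit_initialized(storage_mount: Path) -> None:
    """Record on persistent storage that this unit joined the cluster."""
    (storage_mount / MARKER_NAME).touch()


def unit_is_fresh(storage_mount: Path) -> bool:
    """True when no marker exists, i.e. the unit never joined a cluster."""
    return not (storage_mount / MARKER_NAME).exists()
```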
can we utilize `units-added-to-cluster` here instead of relying on these keys?
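For comparison, a sketch of the `units-added-to-cluster` alternative this comment suggests; `app_peer_data` stands in for the charm's `self.app_peer_data` mapping, and the key name is taken from the hunks quoted later in this thread:

```python
from typing import Mapping


def cluster_was_dissolved(app_peer_data: Mapping[str, str]) -> bool:
    """True when the app databag counter says no unit is in the cluster.

    A zero (or missing) counter would indicate a scale-up from zero rather
    than a unit joining an existing cluster.
    """
    return int(app_peer_data.get("units-added-to-cluster", "0")) == 0
```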
IMHO, we can rely only on data on the storage (it can be foreign storage attached by user mistake, which we should not damage/erase).
if self.unit.is_leader():
    # create the cluster due to it being dissolved on scale-down
    self.create_cluster()
if there's a cluster that already exists on persistent disk, will that be re-used?
IMHO, the charm must be blocked if a foreign disk is attached by mistake (to avoid data damage).
Today it is not possible for K8s, but Pedro is working with Juju on it:
> juju scale-application mysql 0
mysql scaled to 0 units
> juju add-storage mysql/0 database=foreigndisk
ERROR Juju command "add-storage" not supported on container models # will be supported on K8s like on VM
> juju scale-application mysql 3
...
Another reason to write to disk:
> juju scale-application mysql 0
# juju controller crashed... restored ...
> juju scale-application mysql 1 # did we lose units-added-to-cluster?
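A rough sketch of the "block on foreign disk" idea, assuming the charm keeps some identifying metadata on its storage (an illustrative `cluster-name` file here); none of these names come from the actual charm:

```python
from pathlib import Path

from ops.model import ActiveStatus, BlockedStatus, StatusBase


def storage_attach_status(storage_mount: Path, expected_cluster: str) -> StatusBase:
    """Refuse to reuse (or erase) a disk that belongs to another cluster."""
    cluster_file = storage_mount / "cluster-name"  # illustrative metadata file
    if cluster_file.exists() and cluster_file.read_text().strip() != expected_cluster:
        return BlockedStatus("foreign storage attached; refusing to reuse or erase it")
    return ActiveStatus()
```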
if self.unit.is_leader():
    # Update 'units-added-to-cluster' counter in the peer relation databag
    units = int(self.app_peer_data.get("units-added-to-cluster", 1))
    self.app_peer_data["units-added-to-cluster"] = str(units - 1)
should we also update `units-added-to-cluster` on the `peer-relation-departed` event? wouldn't we want to decrement this value when non-leader units are scaled down? or is this happening on `update-status` (in particular `_set_app_status`)? if so, would there be inconsistency issues -> with 3 units, if another unit departs and then the leader unit departs, `units-added-to-cluster` will be 2 instead of 1 (if `storage-detaching` runs on the leader before `update-status`)
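One way the decrement asked about here could look, sketched against the ops framework; the peer relation name and handler wiring are illustrative and this is not the PR's actual code:

```python
import ops


class MySQLCharmSketch(ops.CharmBase):
    PEER = "database-peers"  # illustrative peer relation name

    def __init__(self, framework: ops.Framework):
        super().__init__(framework)
        framework.observe(
            self.on[self.PEER].relation_departed, self._on_peer_relation_departed
        )

    def _on_peer_relation_departed(self, event: ops.RelationDepartedEvent) -> None:
        # Only the leader mutates the application databag, and it skips its own
        # departure, which the quoted storage-detaching hook already handles.
        if not self.unit.is_leader() or event.departing_unit == self.unit:
            return
        data = event.relation.data[self.app]
        units = int(data.get("units-added-to-cluster", "1"))
        data["units-added-to-cluster"] = str(max(units - 1, 0))
```

Whether this would double-count against the leader's own `storage-detaching` decrement (the inconsistency raised above) still needs to be reasoned about; the sketch only shows where such a hook could live.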
LGTM, but IMHO, better to discuss corner cases before merging.
BTW, we are adding the test to check scale-down to zero and restore with a foreign disk here. Maybe it is worth copying it into MySQL VM/K8s as well?
Testing it: it requires a remove and re-relate. On scaling in to zero, routers are already cleaned up from the metadata.
Field teams/customers do not like this... can we avoid re-relations without real over-complication in the code?
@paulomach hi Paulo, I see that the pull request was merged. How can I find out which release version this will be published in, if it hasn't been already?
Hey @sombrafam , this is still under discussion. There are some edge cases we need to fix.
Issue
On scaling to zero, the last unit ensures the cluster is dissolved.
Scaling back to units > 0 doesn't detect and handle the dissolved cluster.
Solution
Fixes #409
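For context, a minimal sketch tying the Issue description to the hunks quoted in the review; this is an illustration of the described behaviour, not the merged implementation:

```python
def ensure_cluster_on_scale_up(charm, cluster_dissolved: bool) -> None:
    """Illustration only: `charm` stands in for the MySQL charm instance and
    `cluster_dissolved` for whatever detection the review settles on (default
    databag keys, a storage marker, or the units-added-to-cluster counter).
    """
    if charm.unit.is_leader() and cluster_dissolved:
        # Recreate the cluster that the last scale-down dissolved, instead of
        # waiting to rejoin a cluster that no longer exists.
        charm.create_cluster()
```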