Mysql-k8s breaks after scale down 0 #409

Closed
pedrofragola opened this issue Apr 23, 2024 · 6 comments · Fixed by #414
Labels
bug Something isn't working

Comments

pedrofragola commented Apr 23, 2024

Issue description:

Using the juju and charms versions below:

juju status 
Model  Controller        Cloud/Region      Version  SLA          Timestamp
test   my-k8s-localhost  my-k8s/localhost  3.2.4    unsupported  18:07:02-03:00

App             Version  Status   Scale  Charm           Channel      Rev  Address        Exposed  Message
mysql-k8s                waiting      1  mysql-k8s       8.0/stable   127  10.152.183.28  no       installing agent
mysql-test-app  0.0.2    waiting      1  mysql-test-app  latest/edge   36  10.152.183.94  no       installing agent

Unit               Workload  Agent      Address       Ports  Message
mysql-k8s/0*       unknown   executing  10.1.192.108         
mysql-test-app/0*  waiting   idle       10.1.192.107         

After scaling mysql-k8s down to 0 and back up to 1, the application is broken on the Juju side, while the Kubernetes side shows the pod running:

NAME                             READY   STATUS    RESTARTS   AGE
modeloperator-847d5d9b4c-qxdxr   1/1     Running   0          9m26s
mysql-k8s-0                      2/2     Running   0          5m44s
mysql-test-app-0                 1/1     Running   0          8m37s

I was able to reproduce this issue when another app is related to the database, like mysql-test-app here.
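
The reproduction described above can be sketched as a sequence of juju CLI calls. This is a minimal sketch, not a verified script: the `database` endpoint name is inferred from the `database:4` relation in the log below, the mysql-test-app channel is taken from the status output, and the helper names `repro_commands` and `run` are hypothetical. Waiting for the model to settle between steps is left out.

```python
import shlex
import subprocess


def repro_commands(app="mysql-k8s", client="mysql-test-app"):
    """Return the juju commands for the scale-to-zero reproduction (hypothetical helper)."""
    return [
        f"juju deploy {app} --trust --channel 8.0/stable",
        f"juju deploy {client} --channel latest/edge",
        # Assumption: the relation uses the endpoint named "database" on the MySQL side.
        f"juju integrate {app}:database {client}",
        f"juju scale-application {app} 0",
        f"juju scale-application {app} 1",
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(command)
        if not dry_run:
            subprocess.run(shlex.split(command), check=True)


if __name__ == "__main__":
    run(repro_commands())
```

With `dry_run=True` (the default) the commands are only printed, which makes it easy to review them before running anything against a real model.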

DEBUG LOG:

juju deploy mysql-k8s --trust --channel 8.0/stable


unit-mysql-k8s-0: 18:07:15 ERROR unit.mysql-k8s/0.juju-log database:4: Timed out waiting for k8s service to be ready
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/src/relations/mysql_provider.py", line 124, in _on_database_requested
    self.charm.k8s_helpers.wait_service_ready((primary_endpoint, 3306))
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/tenacity/__init__.py", line 325, in iter
    raise retry_exc.reraise()
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/tenacity/__init__.py", line 158, in reraise
    raise self.last_attempt.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/src/k8s_helpers.py", line 201, in wait_service_ready
    raise TimeoutError
TimeoutError
unit-mysql-k8s-0: 18:07:15 ERROR unit.mysql-k8s/0.juju-log database:4: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/./src/charm.py", line 770, in <module>
    main(MySQLOperatorCharm)
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/ops/main.py", line 456, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/ops/main.py", line 144, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/ops/framework.py", line 351, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/ops/framework.py", line 853, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/ops/framework.py", line 943, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 1928, in _on_relation_changed_event
    getattr(self.on, "database_requested").emit(
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/ops/framework.py", line 351, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/ops/framework.py", line 853, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/ops/framework.py", line 943, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/src/relations/mysql_provider.py", line 124, in _on_database_requested
    self.charm.k8s_helpers.wait_service_ready((primary_endpoint, 3306))
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/tenacity/__init__.py", line 325, in iter
    raise retry_exc.reraise()
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/tenacity/__init__.py", line 158, in reraise
    raise self.last_attempt.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/src/k8s_helpers.py", line 201, in wait_service_ready
    raise TimeoutError
TimeoutError
unit-mysql-k8s-0: 18:07:15 ERROR juju.worker.uniter.operation hook "database-relation-changed" (via hook dispatching script: dispatch) failed: exit status 1
unit-mysql-k8s-0: 18:07:15 INFO juju.worker.uniter awaiting error resolution for "relation-changed" hook
unit-mysql-k8s-0: 18:07:25 INFO juju.worker.uniter awaiting error resolution for "relation-changed" hook
unit-mysql-k8s-0: 18:07:26 ERROR unit.mysql-k8s/0.juju-log database:4: Failed to get cluster status for cluster-7f7c9a3504e4f58a371acc55cca28b49
unit-mysql-k8s-0: 18:07:26 ERROR unit.mysql-k8s/0.juju-log database:4: Failed to get cluster endpoints
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/src/mysql_k8s_helpers.py", line 836, in update_endpoints
    rw_endpoints, ro_endpoints, offline = self.get_cluster_endpoints(get_ips=False)
  File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/lib/charms/mysql/v0/mysql.py", line 1469, in get_cluster_endpoints
    raise MySQLGetClusterEndpointsError("Failed to get endpoints from cluster status")
charms.mysql.v0.mysql.MySQLGetClusterEndpointsError: Failed to get endpoints from cluster status
unit-mysql-k8s-0: 18:07:26 WARNING unit.mysql-k8s/0.juju-log database:4: Kubernetes service already exists
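
For readers unfamiliar with the failure mode in the traceback: `wait_service_ready` is called with `(primary_endpoint, 3306)` and ends in a bare `TimeoutError` after retries are exhausted. The sketch below shows the general pattern of polling a TCP endpoint until a deadline. It is not the charm's implementation (which wraps the call with tenacity); the `timeout` and `interval` parameters and the function body are assumptions for illustration only.

```python
import socket
import time


def wait_service_ready(address, timeout=30.0, interval=1.0):
    """Poll a (host, port) pair until a TCP connection succeeds.

    Raises TimeoutError once the deadline has passed, similar to the
    behaviour visible in the traceback. Illustrative only.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection(address, timeout=interval):
                return
        except OSError:
            # Connection refused, unreachable host, DNS failure, connect timeout.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{address} not reachable after {timeout}s") from None
            time.sleep(interval)
```

Because the exception escapes the `database-relation-changed` handler, the hook exits with status 1 and the unit waits for error resolution, which matches the log lines that follow the traceback.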

So we need a way to fix the juju status.
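
Before the status can be fixed, it helps to know which hook left the unit in the error state. A minimal sketch for scanning saved debug-log output for failed hooks follows; the regular expression is written against the log lines quoted above, and the names `FAILED_HOOK` and `failed_hooks` are hypothetical.

```python
import re

# Matches lines such as:
# unit-mysql-k8s-0: 18:07:15 ERROR juju.worker.uniter.operation hook "..." (...) failed: exit status 1
FAILED_HOOK = re.compile(
    r'^(?P<unit>unit-[\w-]+): (?P<time>\d{2}:\d{2}:\d{2}) ERROR '
    r'juju\.worker\.uniter\.operation hook "(?P<hook>[^"]+)".* '
    r'failed: exit status (?P<status>\d+)'
)


def failed_hooks(lines):
    """Return (unit, time, hook, exit_status) for every failed-hook log line."""
    results = []
    for line in lines:
        match = FAILED_HOOK.match(line)
        if match:
            results.append(
                (match["unit"], match["time"], match["hook"], int(match["status"]))
            )
    return results
```

Applied to the log in this report, the only match is the `database-relation-changed` hook on `unit-mysql-k8s-0`.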

@pedrofragola pedrofragola added the bug Something isn't working label Apr 23, 2024

@swasenius

I assume there is no update on this? I just lost a database once again in a Kubernetes environment.

@paulomach (Contributor)

@swasenius Some details are still under discussion, but support for scaling up from zero will come with the linked PR.

@sombrafam

> I assume there is no update on this? I just lost a database once again in a Kubernetes environment.

Can you give details on why you needed to scale it down?

@swasenius

Coming back to this, since I faced this again. If you cannot scale down a database pod, then something is not quite right.

Say your Kubernetes cluster needs an upgrade: it will drain the worker node. Or say there is a problem in your cluster and a node goes down. The way I see it, there is no way to salvage your database unless you have taken a manual dump. (And for the manual dump you need to have saved the MySQL password beforehand, since you cannot fetch it while the database is down; it comes back as "null" if you try.)

I would argue that, regardless of the reason, you should be able to scale to 0 and back to 1. I once had a support ticket open where I was asked to stop all apps, and that broke all the MySQL databases.

The trigger still seems to be a relation getting broken, e.g. between kfp-mysql-db <-> kfp-api.

@eleblebici

Hi @paulomach,
I obtained the server-config-password by running juju show-secret <id> --reveal --verbose, but I could not bring the units back to "active" and "idle". Do you have a workaround until the permanent fix is released?
Thanks.
