Charm never recovers when one unit is offline with two units waiting to join cluster #530

Open
shayancanonical opened this issue Nov 18, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@shayancanonical
Contributor

Steps to reproduce

  1. juju deploy -n 3 mysql-k8s --channel 8.0/edge
  2. as soon as the first unit goes online, run microk8s.kubectl -n model-name delete pod mysql-k8s-0
  3. wait until the unit comes back online

Expected behavior

The cluster should be able to recover by performing a full-cluster crash recovery (even though there is only one member in the cluster). The two waiting units should not be taken into account, because they have not yet joined the cluster.
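
One way to read this expectation, as a rough sketch: only units that have already joined the cluster count when deciding whether every member is offline. The function name and the state labels below are hypothetical illustrations and are not taken from the charm's code.

```python
from typing import Dict

# Hypothetical state labels, loosely mirroring the statuses in `juju status`.
OFFLINE = "offline"
ONLINE = "online"
WAITING = "waiting"  # the unit has not joined the cluster yet


def needs_full_cluster_recovery(unit_states: Dict[str, str]) -> bool:
    """Return True when every unit that already joined the cluster is offline.

    Units still waiting to join are ignored, since they were never members.
    """
    members = {unit: state for unit, state in unit_states.items() if state != WAITING}
    return bool(members) and all(state == OFFLINE for state in members.values())


# The situation from this report: one offline member, two units still waiting.
states = {
    "mysql-k8s/0": OFFLINE,
    "mysql-k8s/1": WAITING,
    "mysql-k8s/2": WAITING,
}
print(needs_full_cluster_recovery(states))
```

Under this reading, the single offline member is enough to trigger recovery, and the waiting units join afterwards as usual.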

Actual behavior

The cluster is stuck with one unit in offline status and two units in waiting status:

nova-mysql/0*                maintenance  idle   10.1.28.217          offline
nova-mysql/1                 waiting      idle   10.1.180.21          waiting to get cluster primary from peers
nova-mysql/2                 waiting      idle   10.1.190.214         waiting to get cluster primary from peers
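
For anyone who wants to check for this state in a script, here is a minimal sketch that parses unit lines like the ones above. The column layout (unit, workload, agent, address, message) is assumed from this excerpt only, and `parse_units` and `looks_stuck` are hypothetical helper names, not part of Juju or the charm.

```python
import re
from typing import List, NamedTuple


class UnitStatus(NamedTuple):
    name: str
    workload: str
    agent: str
    address: str
    message: str


# Assumed columns: unit (optional trailing "*" marks the leader), workload,
# agent, address, then a free-text message.
LINE_RE = re.compile(r"^(\S+?)\*?\s+(\S+)\s+(\S+)\s+(\S+)\s+(.*)$")


def parse_units(text: str) -> List[UnitStatus]:
    """Parse whitespace-separated unit lines into UnitStatus records."""
    units = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            units.append(UnitStatus(*match.groups()))
    return units


def looks_stuck(units: List[UnitStatus]) -> bool:
    """True when at least one unit is offline and all the others are waiting."""
    offline = [u for u in units if u.message == "offline"]
    waiting = [u for u in units if u.workload == "waiting"]
    return len(offline) >= 1 and len(offline) + len(waiting) == len(units)


sample = """\
nova-mysql/0*                maintenance  idle   10.1.28.217          offline
nova-mysql/1                 waiting      idle   10.1.180.21          waiting to get cluster primary from peers
nova-mysql/2                 waiting      idle   10.1.190.214         waiting to get cluster primary from peers
"""
units = parse_units(sample)
print(looks_stuck(units))
```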

Versions

Operating system: Ubuntu 22.04 LTS

Juju CLI: 3.5.4

Juju agent: 3.5.4

Charm revision: 180

Log output

Juju debug log:

unit-nova-mysql-0: 10:39:54 INFO unit.nova-mysql/0.juju-log Persisting configuration changes to file
unit-nova-mysql-0: 10:39:54 INFO unit.nova-mysql/0.juju-log Configuration change requires restart
unit-nova-mysql-0: 11:39:55 ERROR unit.nova-mysql/0.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/./src/charm.py", line 888, in <module>
    main(MySQLOperatorCharm)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/main.py", line 551, in main
    manager.run()
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/main.py", line 530, in run
    self._emit()
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/main.py", line 519, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/main.py", line 147, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 348, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 860, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 950, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 724, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/./src/charm.py", line 536, in _on_config_changed
    self.on[f"{self.restart.name}"].acquire_lock.emit()
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 348, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 860, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 950, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 724, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/rolling_ops/v0/rollingops.py", line 399, in _on_acquire_lock
    self.charm.on[self.name].relation_changed.emit(relation, app=self.charm.app)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 348, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 860, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 950, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 724, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/rolling_ops/v0/rollingops.py", line 348, in _on_relation_changed
    self.charm.on[self.name].process_locks.emit()
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 348, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 860, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 950, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 724, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/rolling_ops/v0/rollingops.py", line 384, in _on_process_locks
    self.charm.on[self.name].run_with_lock.emit()
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 348, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 860, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 950, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 724, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/rolling_ops/v0/rollingops.py", line 415, in _on_run_with_lock
    callback(event)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 724, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/./src/charm.py", line 449, in _restart
    container.pebble.restart_services([MYSQLD_SAFE_SERVICE], timeout=3600)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/pebble.py", line 2201, in restart_services
    return self._services_action('restart', services, timeout, delay)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/pebble.py", line 2224, in _services_action
    change = self.wait_change(change_id, timeout=timeout, delay=delay)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/pebble.py", line 2254, in wait_change
    return self._wait_change_using_wait(change_id, timeout)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/pebble.py", line 2282, in _wait_change_using_wait
    raise TimeoutError(f'timed out waiting for change {change_id} ({timeout} seconds)')
ops.pebble.TimeoutError: timed out waiting for change 471 (3600 seconds)
unit-nova-mysql-0: 11:39:55 ERROR juju.worker.uniter.operation hook "config-changed" (via hook dispatching script: dispatch) failed: exit status 1
unit-nova-mysql-0: 11:39:57 INFO juju.worker.uniter awaiting error resolution for "config-changed" hook
unit-nova-mysql-0: 11:40:02 INFO juju.worker.uniter awaiting error resolution for "config-changed" hook
unit-nova-mysql-0: 11:40:03 INFO juju.worker.uniter.operation ran "config-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 11:40:04 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 11:40:05 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 11:40:06 INFO juju.worker.uniter.operation ran "database-peers-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 11:40:07 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 11:40:08 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 11:40:10 ERROR unit.nova-mysql/0.juju-log database-peers:29: Failed to get cluster status for cluster-b65b6fff3ec3a31de1d455381cc8497a
unit-nova-mysql-0: 11:40:10 ERROR unit.nova-mysql/0.juju-log database-peers:29: Failed to get cluster endpoints
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/src/mysql_k8s_helpers.py", line 786, in update_endpoints
    rw_endpoints, ro_endpoints, offline = self.get_cluster_endpoints(get_ips=False)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 724, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/mysql/v0/mysql.py", line 1872, in get_cluster_endpoints
    raise MySQLGetClusterEndpointsError("Failed to get endpoints from cluster status")
charms.mysql.v0.mysql.MySQLGetClusterEndpointsError: Failed to get endpoints from cluster status
unit-nova-mysql-0: 11:40:10 ERROR unit.nova-mysql/0.juju-log database-peers:29: Failed to get cluster status for cluster-b65b6fff3ec3a31de1d455381cc8497a
unit-nova-mysql-0: 11:40:10 ERROR unit.nova-mysql/0.juju-log database-peers:29: Failed to get cluster endpoints
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/src/mysql_k8s_helpers.py", line 786, in update_endpoints
    rw_endpoints, ro_endpoints, offline = self.get_cluster_endpoints(get_ips=False)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 724, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/mysql/v0/mysql.py", line 1872, in get_cluster_endpoints
    raise MySQLGetClusterEndpointsError("Failed to get endpoints from cluster status")
charms.mysql.v0.mysql.MySQLGetClusterEndpointsError: Failed to get endpoints from cluster status
unit-nova-mysql-0: 11:40:10 ERROR unit.nova-mysql/0.juju-log database-peers:29: Failed to get cluster status for cluster-b65b6fff3ec3a31de1d455381cc8497a
unit-nova-mysql-0: 11:40:10 ERROR unit.nova-mysql/0.juju-log database-peers:29: Failed to get cluster endpoints
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/src/mysql_k8s_helpers.py", line 786, in update_endpoints
    rw_endpoints, ro_endpoints, offline = self.get_cluster_endpoints(get_ips=False)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 724, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/mysql/v0/mysql.py", line 1872, in get_cluster_endpoints
    raise MySQLGetClusterEndpointsError("Failed to get endpoints from cluster status")
charms.mysql.v0.mysql.MySQLGetClusterEndpointsError: Failed to get endpoints from cluster status
unit-nova-mysql-0: 11:40:11 INFO juju.worker.uniter.operation ran "database-peers-relation-changed" hook (via hook dispatching script: dispatch)
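
The timestamps above suggest that the config-changed hook stayed blocked in the restart for the whole Pebble timeout. As a quick check of that arithmetic, the sketch below compares the time between the "Configuration change requires restart" line and the uncaught exception with the timeout=3600 passed to restart_services. It assumes both timestamps fall on the same day, and `seconds_between` is a hypothetical helper.

```python
from datetime import datetime

PEBBLE_TIMEOUT_SECONDS = 3600  # value passed to restart_services in the traceback


def seconds_between(start: str, end: str) -> int:
    """Seconds between two HH:MM:SS timestamps, assumed to be on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# "Configuration change requires restart" -> "Uncaught exception while in charm code"
blocked = seconds_between("10:39:54", "11:39:55")
print(blocked)                            # 3601
print(blocked - PEBBLE_TIMEOUT_SECONDS)   # 1
```

The one-second difference is consistent with the unit spending the entire hour waiting for mysqld to restart before the hook failed.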

Additional context

@shayancanonical added the bug label Nov 18, 2024

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/DPE-6039.

This message was autogenerated

@jneo8

jneo8 commented Dec 17, 2024

I also encountered this issue when deploying multi-node Sunbeam (2024.1/edge), at the resize step, with mysql revision 180.
