Can not restore from backup #543

Closed
eleblebici opened this issue Dec 17, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@eleblebici

Steps to reproduce

  1. Deploy CKF in version 1.8
  2. Deploy s3-integrator
  3. Integrate kfp-db, katib-db with s3-integrator
  4. Backup the kfp-db, katib-db databases
  5. Upgrade CKF to 1.9 (without upgrading or refreshing the mysql-k8s apps)
  6. Scale mysql down to 0 and then back up to 1, then hit the issue mentioned here
  7. Redeploy katib-db and integrate it with s3-integrator
  8. Restore the backup. The unit is now in the "blocked" state.

Expected behavior

The restore should be successful and the unit should be in the "active" and "idle" state.

Actual behavior

$ juju run katib-db/0 list-backups
Running operation 73 with 1 task
  - task 74 on unit-katib-db-0

Waiting for task 74...
backups: |-
  backup-id             | backup-type  | backup-status
  ----------------------------------------------------
  2024-12-11T10:51:42Z  | physical     | finished
  2024-12-11T10:52:17Z  | physical     | finished
  2024-12-11T10:53:02Z  | physical     | finished
$ juju run katib-db/0 restore backup-id=2024-12-11T10:52:17Z
Running operation 79 with 1 task
  - task 80 on unit-katib-db-0

Waiting for task 80...
ERROR timed out waiting for results from: unit katib-db/0

Stuck in "blocked" state:

$ juju status katib-db
Model     Controller  Cloud/Region      Version  SLA          Timestamp
kubeflow  uk8sx       my-k8s/localhost  3.4.6    unsupported  11:32:33Z

App       Version                  Status   Scale  Charm      Channel     Rev  Address         Exposed  Message
katib-db  8.0.37-0ubuntu0.22.04.3  waiting      1  mysql-k8s  8.0/stable  180  10.152.183.173  no       installing agent

Unit         Workload  Agent  Address      Ports  Message
katib-db/0*  blocked   idle   10.1.33.214         Failed to start mysqld

debug-log:

$ juju debug-log --include katib-db
unit-katib-db-0: 09:58:10 ERROR unit.katib-db/0.juju-log Failed to connect to MySQL with mysqlsh
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-katib-db-0/charm/src/mysql_k8s_helpers.py", line 616, in _run_mysqlsh_script
    stdout, _ = process.wait_output()
  File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/pebble.py", line 1771, in wait_output
    raise ExecError[AnyStr](self._command, exit_code, out_value, err_value)
ops.pebble.ExecError: non-zero exit code 1 executing ['/usr/bin/mysqlsh', '--no-wizard', '--python', '--verbose=1', '-f', '/tmp/script.py', ';', 'rm', '/tmp/script.py'], stdout='', stderr='Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory\nverbose: 2024-12-17T09:58:09Z: Loading startup files...\nverbose: 2024-12-17T09:58:09Z: Loading plugins...\nverbose: 2024-12-17T09:58:09Z: Connecting to MySQL at: serverconfig@katib-db-0.katib-db-endpoints.kubeflow.svc.cluster.local\nTraceback (most recent call last):\n  File "<string>", line 1, in <module>\nmysqlsh.DBError: MySQL Error (1045): Shell.connect: Access denied for user \'serverconfig\'@\'katib-db-0.katib-db-endpoints.kubeflow.svc.cluster.local\' (using password: YES)\n'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-katib-db-0/charm/lib/charms/mysql/v0/mysql.py", line 2932, in check_mysqlsh_connection
    self._run_mysqlsh_script("\n".join(connect_commands))
  File "/var/lib/juju/agents/unit-katib-db-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 724, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-katib-db-0/charm/src/mysql_k8s_helpers.py", line 619, in _run_mysqlsh_script
    raise MySQLClientError
charms.mysql.v0.mysql.MySQLClientError
unit-katib-db-0: 09:58:14 ERROR unit.katib-db/0.juju-log Failed to connect to MySQL with mysqlsh
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-katib-db-0/charm/src/mysql_k8s_helpers.py", line 616, in _run_mysqlsh_script
    stdout, _ = process.wait_output()
  File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/pebble.py", line 1771, in wait_output
    raise ExecError[AnyStr](self._command, exit_code, out_value, err_value)
ops.pebble.ExecError: non-zero exit code 1 executing ['/usr/bin/mysqlsh', '--no-wizard', '--python', '--verbose=1', '-f', '/tmp/script.py', ';', 'rm', '/tmp/script.py'], stdout='', stderr='Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory\nverbose: 2024-12-17T09:58:13Z: Loading startup files...\nverbose: 2024-12-17T09:58:13Z: Loading plugins...\nverbose: 2024-12-17T09:58:13Z: Connecting to MySQL at: serverconfig@katib-db-0.katib-db-endpoints.kubeflow.svc.cluster.local\nTraceback (most recent call last):\n  File "<string>", line 1, in <module>\nmysqlsh.DBError: MySQL Error (1045): Shell.connect: Access denied for user \'serverconfig\'@\'katib-db-0.katib-db-endpoints.kubeflow.svc.cluster.local\' (using password: YES)\n'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-katib-db-0/charm/lib/charms/mysql/v0/mysql.py", line 2932, in check_mysqlsh_connection
    self._run_mysqlsh_script("\n".join(connect_commands))
  File "/var/lib/juju/agents/unit-katib-db-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 724, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-katib-db-0/charm/src/mysql_k8s_helpers.py", line 619, in _run_mysqlsh_script
    raise MySQLClientError
charms.mysql.v0.mysql.MySQLClientError
unit-katib-db-0: 09:58:17 ERROR unit.katib-db/0.juju-log Failed to connect to MySQL with mysqlsh
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-katib-db-0/charm/src/mysql_k8s_helpers.py", line 616, in _run_mysqlsh_script
    stdout, _ = process.wait_output()
  File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/pebble.py", line 1771, in wait_output
    raise ExecError[AnyStr](self._command, exit_code, out_value, err_value)
ops.pebble.ExecError: non-zero exit code 1 executing ['/usr/bin/mysqlsh', '--no-wizard', '--python', '--verbose=1', '-f', '/tmp/script.py', ';', 'rm', '/tmp/script.py'], stdout='', stderr='Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory\nverbose: 2024-12-17T09:58:17Z: Loading startup files...\nverbose: 2024-12-17T09:58:17Z: Loading plugins...\nverbose: 2024-12-17T09:58:17Z: Connecting to MySQL at: serverconfig@katib-db-0.katib-db-endpoints.kubeflow.svc.cluster.local\nTraceback (most recent call last):\n  File "<string>", line 1, in <module>\nmysqlsh.DBError: MySQL Error (1045): Shell.connect: Access denied for user \'serverconfig\'@\'katib-db-0.katib-db-endpoints.kubeflow.svc.cluster.local\' (using password: YES)\n'

Got the serverconfig password and tried to run mysqlsh manually:

$ juju show-secret --reveal ctgi1dfmp25c74stsi6g --verbose
ctgi1dfmp25c74stsi6g:
  revision: 1
  owner: katib-db
  label: database-peers.katib-db.app
  created: 2024-12-17T06:56:55Z
  updated: 2024-12-17T06:56:55Z
  content:
    backups-password: r2iDCqqzdvinYjv2znhWov8p
    cluster-admin-password: bk7vrL2Qk3cF3ZiEc77e0p13
    monitoring-password: rMOq9U6AtsyS2gIl4mJA2HaI
    root-password: OH2zCSLpiQi1FxhVDIGzjjgw
    server-config-password: Iz42koEHwvS8FUNLoTMdYVZ2

It also gives access denied:

$ juju ssh --container mysql katib-db/0 mysqlsh --quiet-start=2 --py serverconfig:Iz42koEHwvS8FUNLoTMdYVZ2@localhost
Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
MySQL Error 1045: Access denied for user 'serverconfig'@'localhost' (using password: YES)
ERROR command terminated with exit code 1

Versions

Operating system: Ubuntu

Juju CLI: 3.4.6

Juju agent: 3.4.6

Charm revision: 8.0/stable revision 180

microk8s: 1.29.11

Log output

Juju debug log:
log.txt

Additional context

eleblebici added the bug label on Dec 17, 2024

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/DPE-6243.

This message was autogenerated


Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/DPE-6244.

This message was autogenerated

@shayancanonical
Contributor

The access denied errors occur because the passwords stored in the backup (taken on a different Juju application) differ from those of the new katib-db deployment (itself a different Juju application). Restoring a backup taken on a different Juju application requires retrieving the passwords from the old application, as documented here: https://charmhub.io/mysql-k8s/docs/h-migrate-cluster

Closing, as access denied errors are expected when a backup is restored on a different application than the one it was taken on and the passwords do not match.
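
For readers hitting the same error, here is a minimal sketch of the password alignment step described above. It only builds the command lines and does not run them. It assumes the charm exposes a `set-password` action taking `username` and `password` parameters, as the linked migration guide appears to describe. It also assumes the secret keys shown in this issue map to the users `root`, `clusteradmin` and `serverconfig`. The helper name `set_password_commands` is hypothetical. Check all of this against the guide before use.

```python
import shlex

# Assumed mapping from the secret keys shown in this issue to MySQL system users.
SECRET_KEY_TO_USER = {
    "root-password": "root",
    "cluster-admin-password": "clusteradmin",
    "server-config-password": "serverconfig",
}


def set_password_commands(app, old_secret):
    """Build one `juju run <app>/leader set-password` command per known user.

    `old_secret` holds the passwords of the application that took the backup.
    The `set-password` action and its parameters are an assumption.
    """
    commands = []
    for key, user in SECRET_KEY_TO_USER.items():
        password = old_secret.get(key)
        if password is None:
            continue  # the old password for this user was not captured
        argv = [
            "juju", "run", f"{app}/leader", "set-password",
            f"username={user}", f"password={password}",
        ]
        commands.append(shlex.join(argv))  # quote each argument for the shell
    return commands


if __name__ == "__main__":
    # Placeholder values; substitute the passwords from the old application.
    old = {
        "root-password": "<old-root-password>",
        "server-config-password": "<old-serverconfig-password>",
    }
    for line in set_password_commands("katib-db", old):
        print(line)
```

The commands would be run on the new deployment before `restore`, so that the charm's stored credentials match the users contained in the backup. If the old application's passwords are no longer retrievable, this approach does not apply.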
