Scylla-manager connection issue on operator master branch when following documentation instructions #2078

Open
grzywin opened this issue Aug 16, 2024 · 10 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done.

Comments

grzywin commented Aug 16, 2024

What happened?

Scylla Manager fails to start on the Operator master branch when following the documentation instructions. It exits with this error:

{"L":"ERROR","T":"2024-08-16T07:19:04.880Z","M":"Bye","error":"no connection to database, make sure Scylla server is running and that database section in c file(s) /mnt/etc/scylla-manager/scylla-manager.yaml is set correctly: giving up after 60 attempts: dial tcp 10.110.16.67:9042: i/o timeout","_trace_id":"cK993SlrTQGcC4_I-jsc1w","errorStack":"main.init.func2\n\tgithub.com/scylladb/scylla-manager/v3/pkg/cmd/scylla-manager/root.go:114\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/[email protected]/command.go:983\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/[email protected]/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/[email protected]/command.go:1039\nmain.main\n\tgithub.com/scylladb/scylla-manager/v3/pkg/cmd/scylla-manager/main.go:12\nruntime.main\n\truntime/proc.go:271\nruntime.goexit\n\truntime/asm_amd64.s:1695\n","S":"github.com/scylladb/go-log.Logger.log\n\tgithub.com/scylladb/[email protected]/logger.go:101\ngithub.com/scylladb/go-log.Logger.Error\n\tgithub.com/scylladb/[email protected]/logger.go:84\nmain.init.func2.1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/cmd/scylla-manager/root.go:70\nmain.init.func2\n\tgithub.com/scylladb/scylla-manager/v3/pkg/cmd/scylla-manager/root.go:114\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/[email protected]/command.go:983\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/[email protected]/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/[email protected]/command.go:1039\nmain.main\n\tgithub.com/scylladb/scylla-manager/v3/pkg/cmd/scylla-manager/main.go:12\nruntime.main\n\truntime/proc.go:271"}

STARTUP ERROR: no connection to database, make sure Scylla server is running and that database section in c file(s) /mnt/etc/scylla-manager/scylla-manager.yaml is set correctly: giving up after 60 attempts: dial tcp 10.110.16.67:9042: i/o timeout
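
The address in the error is the endpoint Manager timed out dialing on the CQL port (9042). As a minimal sketch, the helper below pulls that `host:port` out of a log line so it can be compared with the Services in the cluster. The function name `extract_dial_target` is hypothetical and only used for illustration.

```shell
# Hypothetical helper: read log lines on stdin and print the "host:port"
# that could not be dialed, for lines of the form
#   "... dial tcp <host:port>: i/o timeout ..."
# Lines that do not match are skipped.
extract_dial_target() {
  sed -n 's/.*dial tcp \([^ ]*\): i\/o timeout.*/\1/p'
}
```

For example, piping the Manager log through it (assuming the Deployment is named `scylla-manager` in the `scylla-manager` namespace, which is not stated in this issue) would look like `kubectl -n scylla-manager logs deployment/scylla-manager | extract_dial_target`, and the resulting IP can then be looked up in `kubectl -n scylla-manager get svc -o wide`.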

What did you expect to happen?

Manager should be up and running.

NOTE: It works fine when I switch from the master branch to v1.13.

How can we reproduce it (as minimally and precisely as possible)?

Just follow the documentation steps on Operator master branch:

  1. minikube start --cpus=6
  2. eval $(minikube docker-env)
  3. kubectl apply -f examples/common/cert-manager.yaml
  4. kubectl apply -f deploy/operator.yaml
  5. kubectl apply -f deploy/manager-prod.yaml

Scylla Operator version

docker.io/scylladb/scylla-operator:latest

Kubernetes platform name and version

Client Version: v1.30.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.0

Please attach the must-gather archive.

Must gather doesn't work for me with minikube

Anything else we need to know?

No response

@grzywin grzywin added the kind/bug Categorizes issue or PR as related to a bug. label Aug 16, 2024
@scylla-operator-bot scylla-operator-bot bot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Aug 16, 2024
@tnozicka
Member

Manager won't come up until its backing ScyllaDB is up, so you really need to look into why ScyllaDB isn't up.

> Must gather doesn't work for me with minikube

Please try to address it, it's the most important thing in the whole issue.

> giving up after 60 attempts

This sucks, but it's Manager, not us.

@tnozicka tnozicka added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Aug 16, 2024
@scylla-operator-bot scylla-operator-bot bot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Aug 16, 2024
@grzywin
Author

grzywin commented Aug 16, 2024

> Manager won't come up until its backing ScyllaDB is up, so you really need to look into why ScyllaDB isn't up.

For this particular case I am not even trying to set up ScyllaDB. It's just Operator and Manager. When I follow the exact same steps on the operator v1.13 branch, it works (Manager pods are up and running without ScyllaDB).

> Must gather doesn't work for me with minikube

Sure. I will take a closer look at must-gather with minikube.

@zimnx
Collaborator

zimnx commented Aug 16, 2024

> For this particular case I am not even trying to set up ScyllaDB. It's just Operator and Manager. When I follow the exact same steps on the operator v1.13 branch, it works (Manager pods are up and running without ScyllaDB).

Manager requires a ScyllaDB cluster to keep its internal state. A small cluster is deployed alongside Manager.

apiVersion: scylla.scylladb.com/v1

So look at what happened to this cluster, as it clearly didn't boot up.
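
One way to look at that cluster is sketched below. It assumes that `deploy/manager-prod.yaml` places Manager and its backing ScyllaDB cluster in a namespace called `scylla-manager`; that name does not appear in this thread, so pass a different namespace if yours differs. The function name `inspect_manager_backing_cluster` is hypothetical.

```shell
# Hypothetical helper: show the state of the ScyllaDB cluster that backs Manager.
# Assumption: the namespace defaults to "scylla-manager".
inspect_manager_backing_cluster() {
  local ns="${1:-scylla-manager}"
  # The ScyllaCluster resource (API group scylla.scylladb.com) deployed with Manager.
  kubectl -n "$ns" get scyllaclusters.scylla.scylladb.com
  # Pods stuck in Pending usually point at scheduling or storage problems.
  kubectl -n "$ns" get pods -o wide
  # An unbound PersistentVolumeClaim keeps the ScyllaDB pod from starting.
  kubectl -n "$ns" get pvc
  # Recent events often name the concrete reason, oldest first.
  kubectl -n "$ns" get events --sort-by=.lastTimestamp
}
```

Calling `inspect_manager_backing_cluster` with no arguments uses the assumed default namespace. A PVC that stays `Pending` is a hint to check the storage setup next.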

@rzetelskik
Member

> Sure. I will take a closer look at must-gather with minikube.

For the record we have an issue tracking this #1628

@rzetelskik
Member

rzetelskik commented Aug 16, 2024

> So look what happened to this cluster, as clearly it didn't boot up.

@Strasznik do you have local-csi-driver set up on the node you're trying to run Scylla Manager and its ScyllaDB cluster on?
Since 1.13 we introduced #2009 so that'd be the first thing I'd check - the generic deployment docs don't cover it.
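
A quick way to check this is sketched below. It assumes the Manager's ScyllaCluster now requests the XFS StorageClass from the local-csi-driver examples, named `scylladb-local-xfs`. That name is an assumption and is not stated in this thread, so adjust it to whatever your manifests reference. Both function names are hypothetical.

```shell
# Hypothetical helper: succeed if a StorageClass with the given name exists.
has_storage_class() {
  local name="$1"
  # `-o name` prints one "storageclass.storage.k8s.io/<name>" entry per line;
  # -x and -F make grep match the whole line literally.
  kubectl get storageclass -o name | grep -qxF "storageclass.storage.k8s.io/${name}"
}

# Hypothetical helper: report whether the StorageClass Manager's cluster needs is present.
# Assumption: the default name "scylladb-local-xfs".
check_manager_storage() {
  local sc="${1:-scylladb-local-xfs}"
  if has_storage_class "$sc"; then
    echo "StorageClass ${sc} found"
  else
    echo "StorageClass ${sc} missing: PVCs requesting it will stay Pending"
    return 1
  fi
}
```

If `check_manager_storage` reports the class as missing, the PVCs of Manager's ScyllaDB cluster cannot be provisioned. That would be consistent with Manager timing out while waiting for its database.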

@grzywin
Author

grzywin commented Aug 19, 2024

> Since 1.13 we introduced #2009 so that'd be the first thing I'd check - the generic deployment docs don't cover it.

@rzetelskik yes, this is it. I missed the #2009 change. Are there any plans to update the documentation with more details about the need for an xfs storage class?
Btw. you mentioned "Since 1.13" - On branch v1.13 it is working for me and from what I saw in the code 1.13 is still using default storage class so the 'problem' is only on master branch.

@rzetelskik
Member

> Btw. you mentioned "Since 1.13" - On branch v1.13 it is working for me and from what I saw in the code 1.13 is still using default storage class so the 'problem' is only on master branch.

I meant since the release, so not included there.

> Are there any plans to update the documentation with more details about the need for an xfs storage class?

I suppose as part of #1578. @tnozicka do we have any details on what the acceptance criteria would be for "Rewrite docs about install flow"?

@tnozicka
Member

> I suppose as part of #1578. @tnozicka do we have any details on what the acceptance criteria would be for "Rewrite docs about install flow"?

I'd say that's a coherent story that makes sense and feels like a whole. It may not be perfect the first time but we need a structure to build on top of. It's gonna mention the overall flow and architecture along with the storage setup.

Contributor

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 30d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out

/lifecycle stale

@scylla-operator-bot scylla-operator-bot bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 19, 2024
Contributor

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 30d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out

/lifecycle rotten

@scylla-operator-bot scylla-operator-bot bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 20, 2024
4 participants