Scylla-manager connection issue on operator master branch when following documentation instructions #2078

grzywin · 2024-08-16T07:35:01Z

What happened?

Scylla Manager does not start on the Operator master branch when following the documentation instructions. It fails with this error:

{"L":"ERROR","T":"2024-08-16T07:19:04.880Z","M":"Bye","error":"no connection to database, make sure Scylla server is running and that database section in c file(s) /mnt/etc/scylla-manager/scylla-manager.yaml is set correctly: giving up after 60 attempts: dial tcp 10.110.16.67:9042: i/o timeout","_trace_id":"cK993SlrTQGcC4_I-jsc1w","errorStack":"main.init.func2\n\tgithub.com/scylladb/scylla-manager/v3/pkg/cmd/scylla-manager/root.go:114\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/[email protected]/command.go:983\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/[email protected]/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/[email protected]/command.go:1039\nmain.main\n\tgithub.com/scylladb/scylla-manager/v3/pkg/cmd/scylla-manager/main.go:12\nruntime.main\n\truntime/proc.go:271\nruntime.goexit\n\truntime/asm_amd64.s:1695\n","S":"github.com/scylladb/go-log.Logger.log\n\tgithub.com/scylladb/[email protected]/logger.go:101\ngithub.com/scylladb/go-log.Logger.Error\n\tgithub.com/scylladb/[email protected]/logger.go:84\nmain.init.func2.1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/cmd/scylla-manager/root.go:70\nmain.init.func2\n\tgithub.com/scylladb/scylla-manager/v3/pkg/cmd/scylla-manager/root.go:114\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/[email protected]/command.go:983\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/[email protected]/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/[email protected]/command.go:1039\nmain.main\n\tgithub.com/scylladb/scylla-manager/v3/pkg/cmd/scylla-manager/main.go:12\nruntime.main\n\truntime/proc.go:271"}

STARTUP ERROR: no connection to database, make sure Scylla server is running and that database section in c file(s) /mnt/etc/scylla-manager/scylla-manager.yaml is set correctly: giving up after 60 attempts: dial tcp 10.110.16.67:9042: i/o timeout
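For readers triaging the same failure, the structured log line above can be reduced to the few fields that matter. The following is a minimal sketch, not part of Scylla Manager: the helper name `summarize_log_line` and the regular expression are illustrative assumptions, and the sample is a shortened copy of the log line from this report with the stack-trace fields left out. Port 9042 is the CQL native transport port, so the extracted target is the database endpoint Manager tried to reach.

```python
import json
import re

# Shortened copy of the log line from this report (stack-trace fields omitted).
SAMPLE = (
    '{"L":"ERROR","T":"2024-08-16T07:19:04.880Z","M":"Bye",'
    '"error":"no connection to database, make sure Scylla server is running '
    'and that database section in c file(s) '
    '/mnt/etc/scylla-manager/scylla-manager.yaml is set correctly: '
    'giving up after 60 attempts: dial tcp 10.110.16.67:9042: i/o timeout"}'
)

# Matches the "dial tcp HOST:PORT" fragment of the error text (IPv4 or hostname).
DIAL_RE = re.compile(r"dial tcp (?P<host>[^:\s]+):(?P<port>\d+)")


def summarize_log_line(line):
    """Return the level, timestamp, and dial target of one JSON log line."""
    record = json.loads(line)
    match = DIAL_RE.search(record.get("error", ""))
    return {
        "level": record.get("L"),
        "time": record.get("T"),
        "host": match.group("host") if match else None,
        "port": int(match.group("port")) if match else None,
    }


summary = summarize_log_line(SAMPLE)
print(summary)
```

The extracted host and port identify which endpoint to inspect next; they do not explain why nothing is listening there.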

What did you expect to happen?

Manager should be up and running.

NOTE: It works fine when I switch my branch from master to v1.13.

How can we reproduce it (as minimally and precisely as possible)?

Just follow the documentation steps on Operator master branch:

minikube start --cpus=6
eval $(minikube docker-env)
kubectl apply -f examples/common/cert-manager.yaml
kubectl apply -f deploy/operator.yaml
kubectl apply -f deploy/manager-prod.yaml

Scylla Operator version

docker.io/scylladb/scylla-operator:latest

Kubernetes platform name and version

Client Version: v1.30.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.0

Please attach the must-gather archive.

Must gather doesn't work for me with minikube

Anything else we need to know?

No response


tnozicka · 2024-08-16T08:14:04Z

Manager won't come up until its backing ScyllaDB is up, so you really need to look into why ScyllaDB isn't up.

> Must gather doesn't work for me with minikube

Please try to address it, it's the most important thing in the whole issue.

> giving up after 60 attempts

this sucks, but it's manager, not us
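The "giving up after 60 attempts" behaviour can be approximated outside Manager to check whether the database endpoint is reachable at all. This is a minimal sketch of a bounded-retry TCP probe, not Manager's actual implementation: the function name `wait_for_tcp`, its parameters, and the delays are assumptions chosen for illustration. The demo runs against a throwaway local listener so that it stays self-contained.

```python
import socket
import time


def wait_for_tcp(host, port, attempts=60, delay=1.0, timeout=1.0):
    """Try to open a TCP connection up to `attempts` times; True on success."""
    for attempt in range(1, attempts + 1):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Covers refused connections and timeouts; sleep between attempts only.
            if attempt < attempts:
                time.sleep(delay)
    return False


# Demo against a local listener on an ephemeral port.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

reachable = wait_for_tcp("127.0.0.1", port, attempts=3, delay=0.1)
listener.close()
# With the listener closed, every attempt fails and the probe gives up.
unreachable = wait_for_tcp("127.0.0.1", port, attempts=2, delay=0.1)
print(reachable, unreachable)
```

Run from a pod inside the cluster against the address and port in the error message, a probe like this only separates "nothing is listening" from other failures; it says nothing about why the backing ScyllaDB is down.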

grzywin · 2024-08-16T09:15:06Z

> Manager won't come up until its backing ScyllaDB is up, so you really need to look into why ScyllaDB isn't up.

In this particular case I am not even trying to set up ScyllaDB. It's just Operator and Manager. When I follow the exact same steps on Operator branch v1.13, it works (the Manager pods are up and running without ScyllaDB).

> Must gather doesn't work for me with minikube

Sure. I will take a closer look at must-gather with minikube.

zimnx · 2024-08-16T09:47:44Z

> In this particular case I am not even trying to set up ScyllaDB. It's just Operator and Manager. When I follow the exact same steps on Operator branch v1.13, it works (the Manager pods are up and running without ScyllaDB).

Manager requires a ScyllaDB cluster to keep its internal state. A small cluster is deployed alongside Manager.

scylla-operator/deploy/manager/prod/50_scyllacluster.yaml

Line 1 in 568554b

apiVersion: scylla.scylladb.com/v1

So look at what happened to this cluster, as it clearly didn't boot up.

rzetelskik · 2024-08-16T10:36:28Z

> Sure. I will take a closer look at must-gather with minikube.

For the record, we have an issue tracking this: #1628

rzetelskik · 2024-08-16T10:41:11Z

> So look at what happened to this cluster, as it clearly didn't boot up.

@Strasznik do you have local-csi-driver set up on the node you're trying to run Scylla Manager and its ScyllaDB cluster on?
Since 1.13 we introduced #2009 so that'd be the first thing I'd check - the generic deployment docs don't cover it.
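A minimal sketch of that first check, assuming the symptom is PersistentVolumeClaims stuck in Pending because they name a StorageClass that does not exist on the cluster. The function name `find_unprovisionable_pvcs` is hypothetical, and so is the sample data, including the PVC name and the `scylladb-local-xfs` class name. The field paths (`spec.storageClassName`, `status.phase`, `metadata.name`) are the standard Kubernetes API fields found in `kubectl get pvc -A -o json` and `kubectl get storageclass -o json` output.

```python
def find_unprovisionable_pvcs(pvc_list, storage_class_list):
    """Return Pending PVCs whose storageClassName matches no StorageClass."""
    known = {sc["metadata"]["name"] for sc in storage_class_list.get("items", [])}
    problems = []
    for pvc in pvc_list.get("items", []):
        phase = pvc.get("status", {}).get("phase")
        wanted = pvc.get("spec", {}).get("storageClassName")
        if phase == "Pending" and wanted and wanted not in known:
            meta = pvc["metadata"]
            problems.append((meta.get("namespace"), meta["name"], wanted))
    return problems


# Hypothetical sample shaped like `kubectl get pvc -A -o json` output.
pvcs = {
    "items": [
        {
            "metadata": {"namespace": "scylla-manager", "name": "data-example-0"},
            "spec": {"storageClassName": "scylladb-local-xfs"},
            "status": {"phase": "Pending"},
        }
    ]
}
# Hypothetical sample shaped like `kubectl get storageclass -o json` output.
classes = {"items": [{"metadata": {"name": "standard"}}]}

missing = find_unprovisionable_pvcs(pvcs, classes)
for namespace, name, wanted in missing:
    print(f"{namespace}/{name} is Pending: StorageClass {wanted!r} not found")
```

To use it against a real cluster, load the two JSON documents from the `kubectl` commands named above with `json.load` and pass them in. An empty result means the PVCs are not blocked on a missing StorageClass, and the cause lies elsewhere.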

grzywin · 2024-08-19T09:27:19Z

> Since 1.13 we introduced #2009 so that'd be the first thing I'd check - the generic deployment docs don't cover it.

@rzetelskik yes, this is it. I missed the #2009 change. Are there any plans to update the documentation with more details about the need for an xfs storage class?
By the way, you mentioned "Since 1.13". On branch v1.13 it works for me, and from what I saw in the code, 1.13 still uses the default storage class, so the 'problem' is only on the master branch.

rzetelskik · 2024-08-19T10:22:48Z

> By the way, you mentioned "Since 1.13". On branch v1.13 it works for me, and from what I saw in the code, 1.13 still uses the default storage class, so the 'problem' is only on the master branch.

I meant since the 1.13 release, so the change is not included there.

> Are there any plans to update the documentation with more details about the need for an xfs storage class?

I suppose as part of #1578. @tnozicka do we have any details on what the acceptance criteria would be for "Rewrite docs about install flow"?

tnozicka · 2024-08-19T12:41:41Z

> I suppose as part of #1578. @tnozicka do we have any details on what the acceptance criteria would be for "Rewrite docs about install flow"?

I'd say it's a coherent story that makes sense and feels like a whole. It may not be perfect the first time, but we need a structure to build on top of. It's gonna mention the overall flow and architecture along with the storage setup.

scylla-operator-bot · 2024-09-19T10:58:54Z

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 30d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out

/lifecycle stale

scylla-operator-bot · 2024-10-20T10:54:43Z

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 30d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out

/lifecycle rotten

grzywin added the kind/bug Categorizes issue or PR as related to a bug. label Aug 16, 2024

scylla-operator-bot bot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Aug 16, 2024

tnozicka added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Aug 16, 2024

scylla-operator-bot bot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Aug 16, 2024

scylla-operator-bot bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 19, 2024

scylla-operator-bot bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 20, 2024
