clusteradm init failed #360

Open
zhujian7 opened this issue Jul 20, 2023 · 3 comments

zhujian7 commented Jul 20, 2023

Using the latest clusteradm (installed with curl -L https://raw.githubusercontent.com/open-cluster-management-io/clusteradm/main/install.sh | bash) to init a hub cluster fails:

╰─# /usr/local/bin/clusteradm init --bundle-version='latest' --output-join-command-file join.sh --wait
Preflight check: HubApiServer check Passed with 0 warnings and 0 errors
Preflight check: cluster-info check Passed with 0 warnings and 0 errors
CRD successfully registered.
Registration operator is now available.
⠏ Waiting for cluster manager registration to become ready...

ClusterManager registration is now available.
Error: unexpected watch event received
apiVersion: operator.open-cluster-management.io/v1
kind: ClusterManager
metadata:
  creationTimestamp: "2023-07-20T09:45:00Z"
  generation: 1
  name: cluster-manager
  resourceVersion: "494"
  uid: 59d345b6-0d8d-45e3-ac62-c8e427ad3880
spec:
  addOnManagerImagePullSpec: quay.io/open-cluster-management/addon-manager:latest
  deployOption:
    mode: Default
  placementImagePullSpec: quay.io/open-cluster-management/placement:latest
  registrationConfiguration:
    featureGates:
    - feature: DefaultClusterSet
      mode: Enable
  registrationImagePullSpec: quay.io/open-cluster-management/registration:latest
  workImagePullSpec: quay.io/open-cluster-management/work:latest
status:
  conditions:
  - lastTransitionTime: "2023-07-20T09:45:00Z"
    message: Do not support StorageVersionMigration
    reason: StorageVersionMigrationFailed
    status: "False"
    type: MigrationSucceeded

Logs of the cluster-manager pod:

# kubectl logs -f -n open-cluster-management cluster-manager-5f49d9f787-r2rcq
...
E0720 09:45:42.272303       1 base_controller.go:270] "ClusterManagerController" controller failed to sync "cluster-manager", err: clustermanagers.operator.open-cluster-management.io "cluster-manager" is forbidden: User "system:serviceaccount:open-cluster-management:cluster-manager" cannot patch resource "clustermanagers" in API group "operator.open-cluster-management.io" at the cluster scope
I0720 09:46:23.029996       1 certrotation_controller.go:137] Reconciling ClusterManager "cluster-manager"
E0720 09:46:23.032189       1 base_controller.go:270] "CertRotationController" controller failed to sync "cluster-manager", err: namespace "open-cluster-management-hub" does not exist yet
E0720 09:46:23.235966       1 base_controller.go:270] "ClusterManagerController" controller failed to sync "cluster-manager", err: clustermanagers.operator.open-cluster-management.io "cluster-manager" is forbidden: User "system:serviceaccount:open-cluster-management:cluster-manager" cannot patch resource "clustermanagers" in API group "operator.open-cluster-management.io" at the cluster scope
I0720 09:47:44.952880       1 certrotation_controller.go:137] Reconciling ClusterManager "cluster-manager"
E0720 09:47:44.955275       1 base_controller.go:270] "CertRotationController" controller failed to sync "cluster-manager", err: namespace "open-cluster-management-hub" does not exist yet
E0720 09:47:45.159655       1 base_controller.go:270] "ClusterManagerController" controller failed to sync "cluster-manager", err: clustermanagers.operator.open-cluster-management.io "cluster-manager" is forbidden: User "system:serviceaccount:open-cluster-management:cluster-manager" cannot patch resource "clustermanagers" in API group "operator.open-cluster-management.io" at the cluster scope

/kind bug


zhujian7 commented Jul 20, 2023

Would it be possible to add an e2e test covering this scenario to the clusteradm repo?
@ycyaoxdu WDYT?


nirs commented Mar 31, 2024

Here is another instance of this error, this time occurring before the ClusterManager CR was created.

This also happens in join. We need to fix the waiting code so that it continues waiting after unexpected events.

      drenv.commands.Error: Command failed:
         command: ('clusteradm', 'init', '--feature-gates', 'ManagedClusterAutoApproval=true', '--bundle-version', 'default', '--wait', '--context', 'hub')
         exitcode: 1
         error:
            Preflight check: HubApiServer check Passed with 0 warnings and 0 errors
            Preflight check: cluster-info check Passed with 0 warnings and 0 errors
            Error: unexpected watch event received
$ kubectl get deploy cluster-manager -n open-cluster-management --context hub -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    kubectl.kubernetes.io/last-applied-configuration: ""
  creationTimestamp: "2024-03-31T22:56:37Z"
  generation: 1
  labels:
    app: cluster-manager
  name: cluster-manager
  namespace: open-cluster-management
  resourceVersion: "653"
  uid: 70a41fc0-18b3-451e-a4d9-4d806b155f5f
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: cluster-manager
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: cluster-manager
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - cluster-manager
              topologyKey: failure-domain.beta.kubernetes.io/zone
            weight: 70
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - cluster-manager
              topologyKey: kubernetes.io/hostname
            weight: 30
      containers:
      - args:
        - /registration-operator
        - hub
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        image: quay.io/open-cluster-management/registration-operator:v0.13.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 8443
            scheme: HTTPS
          initialDelaySeconds: 2
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: registration-operator
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 8443
            scheme: HTTPS
          initialDelaySeconds: 2
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          privileged: false
          runAsNonRoot: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /tmp
          name: tmpdir
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: cluster-manager
      serviceAccountName: cluster-manager
      terminationGracePeriodSeconds: 30
      volumes:
      - emptyDir: {}
        name: tmpdir
status:
  conditions:
  - lastTransitionTime: "2024-03-31T22:56:46Z"
    lastUpdateTime: "2024-03-31T22:56:46Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2024-03-31T23:06:47Z"
    lastUpdateTime: "2024-03-31T23:06:47Z"
    message: ReplicaSet "cluster-manager-9d976f8d4" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 1
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1
$ kubectl get ClusterManager -n open-cluster-management --context hub
No resources found
