Image builder failed due to MountVolume.SetUp failed for volume "yatai-regcred" and "kube-api-access" #438

Open
tamle511 opened this issue Jan 16, 2023 · 4 comments

Comments

@tamle511

Hello,
I'm trying to deploy a model to our K8S cluster. I followed the official installation guide and set up yatai, yatai-image-builder and yatai-deployment successfully.
So far I have created a model and pushed it to Yatai with bentoml. But when I try to create a deployment (using the Yatai UI), I get stuck at the image builder step because the builder pod fails during initialization.

Logs from Yatai:

[2023-01-16 16:41:08] [BentoDeployment] [test-onnx] [Reconciling] Starting to reconcile BentoDeployment
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [CheckingImage] Checking image exists: x.x.x.x:5000/yatai-bentos:yatai.test-onnx.0.0.1
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [CheckingImage] Image not exists: x.x.x.x:5000/yatai-bentos:yatai.test-onnx.0.0.1
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [GenerateImageBuilderPod] Making sure docker config secret yatai-regcred in namespace yatai
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [GenerateImageBuilderPod] Docker config secret yatai-regcred in namespace yatai is ready
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [GenerateImageBuilderPod] Generating image builder pod: yatai-bento-image-builder-test-onnx--0-0-1
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [GenerateImageBuilderPod] Getting bento test-onnx:0.0.1 from yatai service
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [GenerateImageBuilderPod] Got bento test-onnx:0.0.1 from yatai service
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [GenerateImageBuilderPod] Getting secret yatai-api-token in namespace yatai
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [GenerateImageBuilderPod] Secret yatai-api-token is found in namespace yatai, so updating it
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [GenerateImageBuilderPod] Secret yatai-api-token is updated in namespace yatai
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [GenerateImageBuilderPod] Getting model test-onnx:mj2hs6ut7w6udjex from yatai service
[2023-01-16 16:41:08] [BentoRequest] [test-onnx--0-0-1] [GenerateImageBuilderPod] (combined from similar events): Created image builder pod: yatai-bento-image-builder-test-onnx--0-0-1
[2023-01-16 16:41:15] [BentoRequest] [test-onnx--0-0-1] [ReconcileError] Failed to reconcile BentoRequest: image builder pod yatai-bento-image-builder-test-onnx--0-0-1 status is Failed

Pod status:

xxx@xxx:~/bentoml/yatai/helm$ kubectl get po -n yatai
NAME                                             READY   STATUS       RESTARTS   AGE
yatai-bento-image-builder-test-onnx--0-0-1   0/1     Init:Error   0          90s

Describe pod:

Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    21s                default-scheduler  Successfully assigned yatai/yatai-bento-image-builder-test-onnx--0-0-1 to node-01
  Normal   Pulled       20s                kubelet            Container image "quay.io/bentoml/bento-downloader:0.0.1" already present on machine
  Normal   Created      20s                kubelet            Created container bento-downloader
  Normal   Started      20s                kubelet            Started container bento-downloader
  Warning  FailedMount  19s (x2 over 20s)  kubelet            MountVolume.SetUp failed for volume "kube-api-access-xdddb" : object "yatai"/"kube-root-ca.crt" not registered
  Warning  FailedMount  19s (x2 over 20s)  kubelet            MountVolume.SetUp failed for volume "yatai-regcred" : object "yatai"/"yatai-regcred" not registered

I've confirmed that the objects these volumes refer to, the kube-root-ca.crt ConfigMap and the yatai-regcred Secret, do exist, so I am not sure why it says the objects were not registered.

xxx@xxx:~$ kubectl get secret -n yatai
NAME                  TYPE                                  DATA   AGE
default-token-tmczp   kubernetes.io/service-account-token   3      47h
yatai-api-token       Opaque                                1      102m
yatai-regcred         kubernetes.io/dockerconfigjson        1      142m
xxx@xxx:~$ kubectl get cm -n yatai
NAME               DATA   AGE
kube-root-ca.crt   1      47h

Kubernetes version: 1.22.8.

Could somebody please help? Thank you!

@yetone
Member

yetone commented Jan 16, 2023

Thanks for the report! I found a related issue in the official Kubernetes repo, and your Kubernetes version falls within the range of versions it affects:

kubernetes/kubernetes#105204

@tamle511
Author

Thanks @yetone. I can't yet confirm whether our k8s version is the cause, since I don't have permission to upgrade our k8s cluster. I may try that later as a last resort.

Anyway after further debugging I've found the following error in the bento-downloader container:

xxx@xxx:~/bentoml/yatai/helm$ kubectl logs -f -n yatai yatai-bento-image-builder-test-onnx--0-0-1 bento-downloader 
Downloading bento test-onnx:0.0.1 tar file from http://yatai.yatai-system.svc.cluster.local/api/v1/bento_repositories/test-onnx/bentos/0.0.1/download to /tmp/downloaded.tar...
curl: (22) The requested URL returned error: 500

However, the logs from the yatai pod didn't show anything related to the error. There were some warnings, but I assume they come from periodic checks. I also tried calling the API manually from another pod, but that didn't produce any log messages either.

xxx@xxx:~$ kubectl logs -f -n yatai-system yatai-6c564d66f5-q44pt yatai --since 5m
INFO[236524] listing unsynced deployments                  cron="sync env"
INFO[236524] updating unsynced deployments syncing_at      cron="sync env"
INFO[236524] updated unsynced deployments syncing_at       cron="sync env"
INFO[236524] syncing unsynced app deployment deployments...  cron="sync env"
INFO[236524] synced unsynced app deployment deployments...  cron="sync env"
W0117 04:23:35.090872       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:23:35.090920       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
W0117 04:23:43.873038       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:23:43.873078       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
ERRO[236558] ws read failed: "websocket: close 1005 (no status)" 
ERRO[236558] ws read failed: "websocket: close 1005 (no status)" 
ERRO[236558] ws read failed: "websocket: close 1005 (no status)" 
ERRO[236576] ws read failed: "websocket: close 1005 (no status)" 
W0117 04:24:22.328051       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:24:22.328089       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
W0117 04:24:22.833320       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:24:22.833370       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
ERRO[236589] ws read failed: "websocket: close 1005 (no status)" 
INFO[236614] listing unsynced deployments                  cron="sync env"
INFO[236614] updating unsynced deployments syncing_at      cron="sync env"
INFO[236614] updated unsynced deployments syncing_at       cron="sync env"
INFO[236614] syncing unsynced app deployment deployments...  cron="sync env"
INFO[236614] synced unsynced app deployment deployments...  cron="sync env"
W0117 04:25:02.089642       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:25:02.089689       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
W0117 04:25:14.048729       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:25:14.048766       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
W0117 04:25:49.155607       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:25:49.155654       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
W0117 04:26:01.329572       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:26:01.329626       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
INFO[236704] listing unsynced deployments                  cron="sync env"
INFO[236704] updating unsynced deployments syncing_at      cron="sync env"
INFO[236704] updated unsynced deployments syncing_at       cron="sync env"
INFO[236704] syncing unsynced app deployment deployments...  cron="sync env"
INFO[236704] synced unsynced app deployment deployments...  cron="sync env"
W0117 04:26:38.024209       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:26:38.024250       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
W0117 04:26:38.326888       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:26:38.326928       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
W0117 04:27:11.523173       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:27:11.523229       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
W0117 04:27:35.613764       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:27:35.613817       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
W0117 04:27:42.165695       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:27:42.165739       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
INFO[236794] listing unsynced deployments                  cron="sync env"
INFO[236794] updating unsynced deployments syncing_at      cron="sync env"
INFO[236794] updated unsynced deployments syncing_at       cron="sync env"
INFO[236794] syncing unsynced app deployment deployments...  cron="sync env"
INFO[236794] synced unsynced app deployment deployments...  cron="sync env"
W0117 04:28:07.787353       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
E0117 04:28:07.787393       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:yatai-system:yatai" cannot list resource "pods" in API group "" in the namespace "yatai-builders"
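
The periodic entries above (the `cron="sync env"` lines, the client-go reflector warnings, and the websocket read errors) can drown out anything tied to the failing request. A minimal sketch for filtering them out while reproducing the download, assuming those three patterns are unrelated to the download failure; the function name `filter_yatai_noise` is hypothetical and not part of Yatai:

```shell
# Hypothetical helper: drop the periodic log lines seen above so that
# request-related entries (if any) stand out. Reads stdin, writes stdout.
filter_yatai_noise() {
  grep -v \
    -e 'cron="sync env"' \
    -e 'reflector\.go' \
    -e 'ws read failed' \
    || true   # grep exits 1 when every line is filtered out; not an error here
}

# Usage (pod name taken from the log command above):
# kubectl logs -f -n yatai-system yatai-6c564d66f5-q44pt yatai --since 5m | filter_yatai_noise
```

If nothing remains after filtering, that supports the observation that the 500 is not being logged by this container at the current log level.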

@yetone
Member

yetone commented Jan 17, 2023

Can you check the output of this command?

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: test
  namespace: yatai
spec:
  containers:
  - command:
    - sh
    - -c
    - 'curl -H "X-YATAI-API-TOKEN: yatai-image-builder:default:\$(YATAI_API_TOKEN)" "http://yatai.yatai-system.svc.cluster.local/api/v1/bento_repositories/test-onnx/bentos/0.0.1/download"'
    envFrom:
    - secretRef:
        name: yatai-api-token
    image: curlimages/curl
    name: bento-downloader
EOF

sleep 5

kubectl -n yatai logs -f test

@tamle511
Author

Thank you. It turns out the MinIO endpoint was incorrect, so Yatai could not download the bento. I fixed the endpoint, re-pushed the bento, and it works now. I'm not sure why the first push succeeded even though the endpoint was wrong. Anyway, thank you for your support!
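
For anyone hitting the same 500 on download, one quick check is whether the configured object-storage endpoint is reachable at all. A minimal sketch, assuming the storage backend is MinIO and that it serves its standard liveness path `/minio/health/live`; the function name and the example endpoint are hypothetical, and the probe should run from inside the cluster (for example from a throwaway pod) so that it sees the same network as Yatai:

```shell
# Hypothetical helper: succeed only if the MinIO liveness endpoint answers
# with a 2xx status within 5 seconds.
check_minio_live() {
  endpoint="${1%/}"   # strip one trailing slash, if present
  curl -fsS -o /dev/null --max-time 5 "${endpoint}/minio/health/live"
}

# Example with a placeholder endpoint (replace with the configured one):
# check_minio_live "http://minio.example.svc.cluster.local:9000" && echo reachable
```

A successful probe only shows that the endpoint answers; it does not validate the bucket name or the credentials.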
