Unable to load docker desktop containerd managed images to cluster #3795

Open
iamvinov-atlassian opened this issue Nov 20, 2024 · 16 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@iamvinov-atlassian

What happened:
I am trying to load a Docker image into a kind cluster with kind load docker-image busybox -n nebulae, but I get the following error:
❯ kind load docker-image busybox -n nebulae
Image: "busybox" with ID "sha256:5b0f33c83a97f5f7d12698df6732098b0cdb860d377f6307b68efe2c6821296f" not yet present on node "nebulae-control-plane", loading...
ERROR: failed to load image: command "docker exec --privileged -i nebulae-control-plane ctr --namespace=k8s.io images import --all-platforms --digests --snapshotter=overlayfs -" failed with error: exit status 1
Command Output: ctr: content digest sha256:83e82a8dd385e27d95f2118c1332d414684aa665552f7f837f86da33674308c4: not found

What you expected to happen:
I expected the image to load successfully. I already have this image pulled locally using docker pull busybox. Upon further investigation, it seems that kind (or containerd) expects every platform variant of the image to be present on the host for the load command to succeed.

How to reproduce it (as minimally and precisely as possible):

docker pull busybox
kind create cluster --name nebulae
kind load -v 10 docker-image -n nebulae busybox

Anything else we need to know?:
From looking at other answers on the internet, this error generally seems to occur when the image arch doesn't match the host arch, but that is not the case here. I ran docker images --tree and confirmed that the image includes a variant matching my host (M3 MacBook Pro):

busybox:latest                                                                         5b0f33c83a97       12.6MB            4MB
├─ linux/arm64/v8                                                                      6ca1ac3927a1       6.02MB         1.85MB
├─ linux/amd64                                                                         a3e1b257b47c       6.56MB         2.16MB
├─ linux/arm/v5                                                                        3076001161ce           0B             0B
├─ linux/arm/v6                                                                        a9fc789b4096           0B             0B
├─ linux/arm/v7                                                                        fb632082f5cb           0B             0B
├─ linux/386                                                                           c0d2f0e7a91f           0B             0B
├─ linux/mips64le                                                                      0e1d386b0b5d           0B             0B
├─ linux/ppc64le                                                                       fc082c5fdd21           0B             0B
├─ linux/riscv64                                                                       d55b3027f77f           0B             0B
└─ linux/s390x                                                                         4bc8b19fe938           0B             0B
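
The listing above suggests that only two platform variants actually have content locally. As a rough sketch, the variants without local content can be pulled out of that output with awk. This assumes the four-column layout pasted above (platform, ID, disk usage, content size); the function name missing_platforms is made up for illustration, and other Docker releases may print additional columns.

```shell
# Hypothetical helper: reads `docker images --tree` output on stdin and prints
# every platform row whose last column (content size) is 0B.
# Assumes the column layout shown in this issue.
missing_platforms() {
  awk '$2 ~ /^[a-z0-9]+\// && $NF == "0B" { print $2 }'
}
```

It could be used as docker images --tree | missing_platforms. The first awk condition skips the image header row, whose second field is an ID rather than an os/arch pair.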

Environment:

  • kind version: kind v0.25.0 go1.23.3 darwin/arm64
  • Runtime info: (use docker info, podman info or nerdctl info):
❯ docker info
Client:
 Version:    27.3.1
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  ai: Ask Gordon - Docker Agent (Docker Inc.)
    Version:  v0.1.0
    Path:     /Users/vvelu/.docker/cli-plugins/docker-ai
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.18.0-desktop.2
    Path:     /Users/vvelu/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.30.3-desktop.1
    Path:     /Users/vvelu/.docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.37
    Path:     /Users/vvelu/.docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Alpha) (Docker Inc.)
    Version:  v0.0.15
    Path:     /Users/vvelu/.docker/cli-plugins/docker-desktop
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     /Users/vvelu/.docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.27
    Path:     /Users/vvelu/.docker/cli-plugins/docker-extension
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    Version:  v1.0.5
    Path:     /Users/vvelu/.docker/cli-plugins/docker-feedback
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v1.4.0
    Path:     /Users/vvelu/.docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /Users/vvelu/.docker/cli-plugins/docker-sbom
  scout: Docker Scout (Docker Inc.)
    Version:  v1.15.0
    Path:     /Users/vvelu/.docker/cli-plugins/docker-scout

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 21
 Server Version: 27.3.1
 Storage Driver: overlayfs
  driver-type: io.containerd.snapshotter.v1
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 472731909fa34bd7bc9c087e4c27943f9835f111
 runc version: v1.1.13-0-g58aa920
 init version: de40ad0
 Security Options:
  seccomp
   Profile: unconfined
  cgroupns
 Kernel Version: 6.10.14-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 12
 Total Memory: 7.653GiB
 Name: docker-desktop
 ID: 794edb33-e6f7-4749-8c5c-edf7b3d5cf21
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Labels:
  com.docker.desktop.address=unix:///Users/vvelu/Library/Containers/com.docker.docker/Data/docker-cli.sock
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: daemon is not using the default seccomp profile
  • OS: MacOS Sonoma v14.7
  • Kubernetes version: (use kubectl version):
❯ kubectl version
Client Version: v1.31.1
Kustomize Version: v5.4.2
Server Version: v1.31.2
  • Any proxies or other special environment settings?:
@iamvinov-atlassian added the kind/bug label Nov 20, 2024
@iamvinov-atlassian
Author

More information: unticking "Use containerd for pulling and storing images" in Docker Desktop resolves this.
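
For anyone unsure which mode their daemon is in: the docker info output in this issue shows a driver-type: io.containerd.snapshotter.v1 line under Storage Driver when the containerd image store is enabled. A minimal sketch that checks for that line follows; the function name is hypothetical, and it relies only on the output format pasted above.

```shell
# Hypothetical helper: reads `docker info` output on stdin and succeeds if the
# storage driver reports the containerd snapshotter driver-type, as in the
# output pasted in this issue.
uses_containerd_store() {
  grep -q 'driver-type: io\.containerd\.snapshotter\.v1'
}
```

For example: docker info 2>/dev/null | uses_containerd_store && echo "containerd image store is enabled".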
Screenshot 2024-11-20 at 14 29 21

@BenTheElder
Member

That error message is coming from ctr when we ask it to import the image saved from docker.

Unfortunately I can't run docker desktop at work, will have to find another way to reproduce this.

Can you look at the same image exported the way kind does with docker save,

commandArgs := append([]string{"save", "-o", dest}, images...)

or provide a tarball from that somewhere? That would speed things up (can replicate the rest with kind load image-archive)

This sounds like a containerd/docker bug, but we need to confirm how before contacting them. The part kind is doing could be bugged, but it is pretty straightforward once we decide we need to load the image because it's not already available.
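
The two steps described above (export with docker save, then load the tarball with kind load image-archive) can be sketched as a small shell function. The function name and the default archive path are assumptions for illustration; only the docker save -o and kind load image-archive commands come from this thread.

```shell
# Hypothetical helper: export an image with `docker save`, then load the
# resulting tarball into a kind cluster with `kind load image-archive`.
#   $1 = image reference, $2 = cluster name, $3 = archive path (optional)
save_and_load() {
  local image="$1" cluster="$2" archive="${3:-image.tar}"
  docker save -o "$archive" "$image" &&
    kind load image-archive "$archive" --name "$cluster"
}
```

For the original report this would be save_and_load busybox nebulae busybox.tar. Keeping the tarball around also makes it easy to attach to the issue.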

@BenTheElder changed the title from "Unable to load images to kind cluster" to "Unable to load docker desktop containerd managed images to cluster" Nov 20, 2024
@BenTheElder
Member

It could be a small sample image like busybox. If you can confirm the bug still applies to that image and share the versions for containerd mode vs. dockerd mode, that would speed things up. Otherwise it may be difficult to reproduce due to the licensing of the application and/or my employer's policies, and I'll have to see if this is something I can replicate in some other way.

@porridge

I'm hitting what seems to be the same issue in kuttl's integration tests. Kuttl embeds kind, currently v0.25.0.

Interestingly this works on CI (GHA, ubuntu 20.04 runner) but on my desktop this fails with the same message as for @iamvinov-atlassian

What I've been able to figure out using skopeo inspect --raw docker://docker.io/library/busybox:latest|jq . and docker image save docker.io/library/busybox:latest is that:

  • the digest that ctr complains about is declared as an attestation-manifest for the linux/amd64 manifest
  • the busybox Docker image references it in its image index, but the blob itself is nowhere to be found

So it seems like from the PoV of ctr the image is incomplete since it's lacking the attestation blob. FWIW, here is how the integration test fetches and loads the image.
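
To check an exported archive for the kind of gap described above, one option is to walk the "digest" references starting at index.json and report any that have no file under blobs/sha256/. This is only a sketch: it assumes the tarball from docker image save has been extracted into a directory and follows the OCI image layout (index.json plus blobs/sha256/), and the function names are made up for illustration.

```shell
# Print the hex part of every "digest": "sha256:..." reference in a JSON file.
digest_refs() {
  grep -oE '"digest"[[:space:]]*:[[:space:]]*"sha256:[0-9a-f]{64}"' "$1" |
    grep -oE '[0-9a-f]{64}'
}

# Follow references starting from one JSON file inside the extracted archive,
# printing each digest whose blob file does not exist.
#   $1 = extracted archive directory, $2 = JSON file to scan
walk_refs() {
  local layout="$1" json="$2" hex blob
  for hex in $(digest_refs "$json"); do
    blob="$layout/blobs/sha256/$hex"
    if [ ! -f "$blob" ]; then
      echo "sha256:$hex"
    elif [ "$(head -c 1 "$blob")" = "{" ]; then
      # Indexes, manifests and configs are JSON; layer blobs are not.
      walk_refs "$layout" "$blob"
    fi
  done
}

# Hypothetical entry point: list referenced blobs missing from the archive.
missing_blobs() {
  walk_refs "$1" "$1/index.json" | sort -u
}
```

Usage would look like mkdir busybox && tar -xf busybox.tar -C busybox && missing_blobs busybox. Only "digest" keys are followed, so the diff_ids inside image configs, which are not blobs in the archive, are not reported.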

I'm running:

[kuttl]$ docker info
Client:
 Version:    27.3.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.13.0
    Path:     /home/mowsiany/.docker/cli-plugins/docker-buildx

Server:
 Containers: 20
  Running: 0
  Paused: 0
  Stopped: 20
 Images: 3
 Server Version: 27.3.1
 Storage Driver: overlayfs
  driver-type: io.containerd.snapshotter.v1
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: /usr/bin/tini-static
 containerd version: 2.fc41
 runc version: 
 init version: 
 Security Options:
  seccomp
   Profile: builtin
  selinux
  cgroupns
 Kernel Version: 6.11.7-300.fc41.x86_64
 Operating System: Fedora Linux 41 (Workstation Edition)
 OSType: linux
 Architecture: x86_64
 CPUs: 20
 Total Memory: 62.5GiB
 Name: mowsiany-thinkpadp1gen5.remote.csb
 ID: e8f36c79-610a-4647-8cc3-b734cebd7050
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: porridgerox
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

@porridge

@BenTheElder you might be able to reproduce this with:

git clone https://github.com/kudobuilder/kuttl
cd kuttl
make envtest
KUBEBUILDER_ASSETS=$(./bin/setup-envtest use 1.25.0 --bin-dir `pwd`/bin -p path) go test -tags integration ./pkg/test -v -mod=readonly -test.run TestAddContainers

Unfortunately the actual error is hidden as sigs.k8s.io/kind/pkg/cluster/nodeutils.LoadImageArchive is missing .SetStdout(os.Stdout).SetStderr(os.Stderr) at least in the version we use.

@dgl
Contributor

dgl commented Dec 3, 2024

We came across something similar in our environment. We're not using Docker Desktop, but we have configured Docker to use containerd as the image store per these docs.

I'm not sure if this is exactly the same as what @porridge reports, as it doesn't involve an attestation blob. However I can simply reproduce this with:

$ docker save -o nginx.tar nginx:1.27.0
$ sudo ctr images import nginx.tar
ctr: content digest sha256:87c2c53ae6565cc48341389169745670320a22d39014ce861661e986e983342c: not found
Versions:
$ docker version
Client: Docker Engine - Community
 Version:           27.3.1
 API version:       1.47
 Go version:        go1.22.7
 Git commit:        ce12230
 Built:             Fri Sep 20 11:40:59 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.3.1
  API version:      1.47 (minimum version 1.24)
  Go version:       go1.22.7
  Git commit:       41ca978
  Built:            Fri Sep 20 11:40:59 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.24
  GitCommit:        88bf19b2105c8b17560993bee28a01ddc2f97182
 runc:
  Version:          1.2.2
  GitCommit:        v1.2.2-0-g7cb3632
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

$ sudo ctr version
Client:
  Version:  1.7.24
  Revision: 88bf19b2105c8b17560993bee28a01ddc2f97182
  Go version: go1.22.9

Server:
  Version:  1.7.24
  Revision: 88bf19b2105c8b17560993bee28a01ddc2f97182
  UUID: f0862135-af1a-494b-b111-192071709ee5

Note that this needs a Docker with multi-platform support, so I don't see it on Docker 24 on another system. I've not checked all versions, but it definitely happens with Docker 27 on the host.

For the nginx image this happens because there's a reference to something for platform 386, which containerd's platform matching code treats as a match for amd64 in some cases.

Therefore a workaround is:

$ sudo ctr images import --platform amd64 nginx.tar
unpacking docker.io/library/nginx:1.27.0 (sha256:98f8ec75657d21b924fe4f69b6b9bff2f6550ea48838af479d8894a852000e40)...done
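
As a sketch of how the workaround above could be scripted, the function below tries a plain import first and retries with an explicit --platform only if that fails. The function name and the amd64 default are assumptions for illustration; the ctr commands themselves are the ones shown above (run as root, for example via sudo, where required).

```shell
# Hypothetical helper: try a plain `ctr images import`, and fall back to a
# single-platform import if the plain one fails.
#   $1 = archive path, $2 = platform for the fallback (defaults to amd64)
import_with_fallback() {
  local archive="$1" platform="${2:-amd64}"
  ctr images import "$archive" ||
    ctr images import --platform "$platform" "$archive"
}
```

For example: import_with_fallback nginx.tar amd64. The fallback imports only one platform, so it shares the limitation of the manual workaround.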

Obviously that's not ideal, because #2957 wanted --all-platforms (and note it happens even if --platform isn't specified, as the default is a mix of amd64/386). However I happened to notice ctr from containerd 2.0 doesn't exhibit this behaviour and bisected it to: containerd/containerd@eb123db

So that means --local=false works as a partial workaround and indeed that does work on the version of containerd kind is currently using. If anyone is experiencing this I made #3805 as a potential workaround -- that's not ready to be merged, in particular I haven't got a way to test this on Docker Desktop, but testing would be useful.
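
For testing by hand without the PR, a rough sketch of the --local=false idea is to stream docker save straight into ctr on the node. This is not what #3805 does verbatim; the function name is made up, and it deliberately leaves out the other flags kind passes (--all-platforms, --digests, --snapshotter), whose behaviour under the transfer service is not verified here.

```shell
# Hypothetical helper: stream an image from `docker save` into a kind node and
# import it through the containerd transfer service (--local=false).
#   $1 = image reference, $2 = node container name
load_via_transfer() {
  local image="$1" node="$2"
  docker save "$image" |
    docker exec --privileged -i "$node" \
      ctr --namespace=k8s.io images import --local=false -
}
```

For the original report this would be load_via_transfer busybox nebulae-control-plane.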

@BenTheElder
Member

Do we know why this works? Is it potentially fetching references from a remote (and therefore broken airgapped)?

@dgl
Contributor

dgl commented Dec 5, 2024

@BenTheElder It's not fetching from a remote. --local=false means it uses the containerd transfer service, which is a different implementation (more is done inside containerd than inside ctr).

cc @AkihiroSuda -- could you help decide if this is a containerd bug or a docker issue? i.e. is docker generating a bad OCI image or is containerd mishandling it? In order to reproduce this all that is needed is a Docker instance configured to use containerd (CE edition is fine), then run docker save and try to import that image to containerd using ctr (steps in this comment).

There are three potential issues here:

  1. Docker sets containerd.WithSkipMissing but ctr doesn't.
  2. The transfer service code path doesn't actually check skip missing, but also doesn't error on missing references in some cases (when there is a mixture of platforms like amd64 / 386). I haven't followed the full code flow, but I did notice this todo in containerd, which is maybe related.
  3. Docker generates the exported image with OnlyStrict, while containerd imports the image with Only (I suspect there are other places; I think that's the one that applies for the transfer service).

(2 is why --local=false can work around this, I think.)

@AkihiroSuda
Member

> The transfer service code path doesn't actually check skip missing

Seems to be a bug of containerd.
Any image that can be imported with non-transfer API should be still importable with the transfer API.

@BenTheElder
Member

We should circle back to see if it's still a bug after upgrading, I am working on shipping v1.7.24 (current 1.7.x)

@BenTheElder
Member

At HEAD the default node image is on Kubernetes 1.32.0 + containerd 1.7.24, there are a lot of changes in containerd and I wonder if any of them fixed the bug.

#3768 tracks containerd 2.0. That one is a bigger can of worms for our users and might take a while. Also, AIUI the fix isn't necessarily in 2.0 if it isn't in 1.7; it's just that this particular issue can be avoided by using the transfer service. We should still report a bug to containerd if it's still present in current releases without using the transfer service.

@porridge

porridge commented Dec 16, 2024

> We should circle back to see if it's still a bug after upgrading, I am working on shipping v1.7.24 (current 1.7.x)

@BenTheElder do I need to upgrade anything on my workstation to test whether the issue is fixed in my case? Or should it be enough to bump the version of kind to current HEAD? 🤔

Because just using the current kind snapshot is not helping:

[kuttl]$ vi go.mod 
[kuttl]$ go mod tidy
go: downloading sigs.k8s.io/controller-runtime v0.19.3
go: downloading github.com/docker/docker v27.4.0+incompatible
go: downloading sigs.k8s.io/kind v0.26.0-alpha.0.20241213223025-771fb17acbc3
go: downloading golang.org/x/exp v0.0.0-20230515195305-f3d0a9c9a5cc
[kuttl]$ git diff go.mod
diff --git a/go.mod b/go.mod
index 043cc93..fc6f434 100644
--- a/go.mod
+++ b/go.mod
@@ -21,7 +21,7 @@ require (
        k8s.io/code-generator v0.31.3
        sigs.k8s.io/controller-runtime v0.19.3
        sigs.k8s.io/controller-tools v0.16.5
-       sigs.k8s.io/kind v0.25.0
+       sigs.k8s.io/kind v0.26.0-alpha.0.20241213223025-771fb17acbc3
 )
 
 require (
[kuttl]$ make envtest
mkdir -p /home/mowsiany/tmp/20241216-kind-new-containerd-Pgp/kuttl/bin
test -s /home/mowsiany/tmp/20241216-kind-new-containerd-Pgp/kuttl/bin/setup-envtest || GOBIN=/home/mowsiany/tmp/20241216-kind-new-containerd-Pgp/kuttl/bin go install sigs.k8s.io/controller-runtime/tools/setup-envtest@latest
go: downloading sigs.k8s.io/controller-runtime/tools/setup-envtest v0.0.0-20241206182001-aea2e32a9365
go: sigs.k8s.io/controller-runtime/tools/setup-envtest@v0.0.0-20241206182001-aea2e32a9365 requires go >= 1.23.0; switching to go1.23.4
go: downloading go1.23.4 (linux/amd64)
[kuttl]$ KUBEBUILDER_ASSETS=$(./bin/setup-envtest use 1.25.0 --bin-dir `pwd`/bin -p path) go test -tags integration ./pkg/test -v -mod=readonly -test.run TestAddContainers
=== RUN   TestAddContainers
    kind_integration_test.go:66: {"status":"Pulling from library/busybox","id":"latest"}
    kind_integration_test.go:66: {"status":"Digest: sha256:2919d0172f7524b2d8df9e50066a682669e6d170ac0f6a49676d54358fe970b5"}
    kind_integration_test.go:66: {"status":"Status: Image is up to date for busybox:latest"}
    kind.go:69: Adding Containers to KIND...
    kind.go:78: Add image docker.io/library/busybox:latest to node test-control-plane
ctr: content digest sha256:4c6b3915ceab750f69555510444e80541e4c72e23130c748c6ce3315f603015e: not found
    kind_integration_test.go:74: failed to add container to KIND cluster: failed to load image: command "docker exec --privileged -i test-control-plane ctr --namespace=k8s.io images import --all-platforms --digests --snapshotter=overlayfs -" failed with error: exit status 1
    kind_integration_test.go:89: failed to find image docker.io/library/busybox:latest on node test-control-plane
--- FAIL: TestAddContainers (13.65s)
FAIL
FAIL	github.com/kudobuilder/kuttl/pkg/test	17.084s
FAIL

@BenTheElder
Member

> @BenTheElder do I need to upgrade anything on my workstation to test whether the issue is fixed in my case? Or should it be enough to bump the version of kind to current HEAD? 🤔

It's the node image at HEAD, so if you use the default image then bumping to HEAD would test it, but if you're setting the node image to use then you'd have to change that.
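
One way to confirm which containerd a given node is actually running is to ask the node container directly. The sketch below assumes that containerd --version prints "containerd <package> <version> <revision>" and takes the third field; the function name is hypothetical.

```shell
# Hypothetical helper: print the containerd version running inside a kind node.
# Assumes `containerd --version` prints: containerd <package> <version> <revision>
#   $1 = node container name
node_containerd_version() {
  docker exec "$1" containerd --version | awk '{ print $3 }'
}
```

For the test cluster in this thread that would be node_containerd_version test-control-plane.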

Thanks for testing.

@BenTheElder
Member

xref #3828 (comment)

@BenTheElder
Member

FWIW we will be moving to the transfer API hopefully (since we prefer to use defaults and plan to upgrade to containerd 2.0), pending a fix to ctr import --all-platforms. It seems this may resolve the problem.

We should probably still report a bug with details to containerd. #3795 (comment)

I haven't had time to locally reproduce and I'd appreciate if one of you would file with containerd, thanks!

@porridge

> I'd appreciate if one of you would file with containerd, thanks!

FTR, unfortunately I don't really understand the pieces involved, so I'm not able to formulate a report in containerd terms. And I'm a bit swamped with other work, so I have no time to learn this.
