Networking issues #296

Open
cfergeau opened this issue Nov 13, 2023 · 2 comments
Comments

@cfergeau
Collaborator

cfergeau commented Nov 13, 2023

Sometimes, after a while, podman machine or crc networking stops working.
There is no clear reproducer, but it has been hit by people working on podman-desktop, by some crc users, and others.
The latest such issue is containers/podman#20639.
The common symptom is that ssh access to the VM does not work.
In containers/podman#20639, running modprobe -r virtio-net && modprobe virtio-net in the VM brings the network back up.
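To timestamp when ssh access stops working, a periodic probe of the VM's SSH port may help. The sketch below is a minimal bash example under stated assumptions: the VM's sshd is reachable through a forwarded host port (look that port up for your own setup), and probe_ssh and watch_ssh are hypothetical helper names, not part of podman, crc or gvisor-tap-vsock. It waits for the SSH banner rather than trusting a bare TCP connect, on the assumption that the host-side listener belongs to the forwarding process (gvproxy or the crc daemon) and may accept connections even when the VM no longer answers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; requires bash (uses /dev/tcp).

# probe_ssh HOST PORT
# Succeeds only if HOST:PORT sends a line starting with "SSH-" within 5 seconds.
# The function body runs in a subshell, so fd 3 and any failed redirection
# stay contained and are cleaned up when the subshell exits.
probe_ssh() (
  host=$1
  port=$2
  # Open a TCP connection on fd 3; fails right away if the port is closed.
  exec 2>/dev/null 3<>"/dev/tcp/${host}/${port}" || exit 1
  # Assumption: a forwarder may accept the TCP connection even when the VM is
  # unreachable, so wait for the SSH server's identification line instead.
  IFS= read -r -t 5 banner <&3 || exit 1
  [[ $banner == SSH-* ]]
)

# watch_ssh HOST PORT [INTERVAL]
# Loops forever, printing a timestamped line each time the probe fails.
watch_ssh() {
  local host=$1 port=$2 interval=${3:-30}
  while true; do
    if ! probe_ssh "$host" "$port"; then
      echo "$(date '+%Y-%m-%d %H:%M:%S') no SSH banner from ${host}:${port}"
    fi
    sleep "$interval"
  done
}
```

For example, watch_ssh 127.0.0.1 <port> 60 prints a line once a minute for as long as the banner is missing, which gives a rough start time for the failure.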

I'm currently working with Florent, who filed containers/podman#20639 and can reproduce it several times per week, to collect traces with dlv and see whether they hint at what's going on. This could just as well be a gvproxy bug as a kernel or qemu bug.

The other similar bugs filed or mentioned in the past may or may not share the same root cause.
They happened on Windows + hyperv, on macos + vfkit, and I think even on linux + libvirt/qemu.
containers/podman#20639 was on macos + qemu, which uses gvproxy. This means the problem happens both with gvproxy and with the crc daemon + vm process running in the VM.

There were hints of a crc daemon crash/restart in the linux + qemu case, but not in containers/podman#20639, which is why I think there could be several distinct issues.

@cfergeau
Collaborator Author

Regarding containers/podman#20639, I asked Florent to:

  • upgrade gvproxy to the latest released version, as the one shipped with podman 4.7.2 is old (0.5.0 vs 0.7.1)
  • extract the binary for his platform, since delve does not support universal macos binaries: lipo -extract arm64 -output gvproxy-darwin-arm64 ./gvproxy-darwin
  • replace the gvproxy binary used by podman with this gvproxy-darwin-arm64 binary
  • install delve: brew install delve
  • get some traces when the issue occurs:
$ dlv attach $(pgrep gvproxy)

(dlv) trace /github.com\/containers\/gvisor-tap-vsock\/*/

When the tracing is done, it's possible to detach dlv from the process by pressing ctrl+c and answering 'no' when delve asks if the process should be killed.

@cfergeau
Collaborator Author

Regarding containers/podman#20639, one suggestion from @n1hility was to try running vm/gvforwarder in the VM, which sends the network traffic over vsock rather than directly over virtio-net, to see whether the bug can still be reproduced.
