docker move_to_ns issue (GNS3 2.2.19 - Gentoo Linux) #74

Open
gfernandop opened this issue Mar 24, 2021 · 12 comments

Comments

@gfernandop

Hi guys,

I am experiencing some problems while trying to bring up a docker container within GNS3. After running gns3server with '-d' I found the output:

2021-03-24 19:01:48 ERROR route.py:242 Uncaught exception detected: <class 'KeyError'>
Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/gns3server/compute/base_node.py", line 631, in _ubridge_send
await self._ubridge_hypervisor.send(command)
File "/usr/lib/python3.8/site-packages/gns3server/utils/asyncio/__init__.py", line 163, in wrapper
return await f(oself, *args, **kwargs)
File "/usr/lib/python3.8/site-packages/gns3server/ubridge/ubridge_hypervisor.py", line 259, in send
raise UbridgeError(data[-1][4:])
gns3server.ubridge.ubridge_error.UbridgeError: could not complete netlink transaction

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/gns3server/compute/docker/docker_vm.py", line 914, in _add_ubridge_connection
await self._ubridge_send('docker move_to_ns {ifc} {ns} eth{adapter}'.format(ifc=adapter.host_ifc,
File "/usr/lib/python3.8/site-packages/gns3server/compute/base_node.py", line 633, in _ubridge_send
raise UbridgeError("Error while sending command '{}': {}: {}".format(command, e, self._ubridge_hypervisor.read_stdout()))
gns3server.ubridge.ubridge_error.UbridgeError: Error while sending command 'docker move_to_ns tap-gns3-e0 27347 eth0': could not complete netlink transaction: uBridge version 0.9.18 running with libpcap version 1.10.0 (with TPACKET_V3)
Hypervisor TCP control server started (IP 0.0.0.0 port 36283).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/gns3server/compute/docker/docker_vm.py", line 478, in start
await self._add_ubridge_connection(nio, adapter_number)
File "/usr/lib/python3.8/site-packages/gns3server/compute/docker/docker_vm.py", line 918, in _add_ubridge_connection
raise UbridgeNamespaceError(e)
gns3server.ubridge.ubridge_error.UbridgeNamespaceError: Error while sending command 'docker move_to_ns tap-gns3-e0 27347 eth0': could not complete netlink transaction: uBridge version 0.9.18 running with libpcap version 1.10.0 (with TPACKET_V3)
Hypervisor TCP control server started (IP 0.0.0.0 port 36283).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/gns3server/web/route.py", line 198, in control_schema
await func(request, response)
File "/usr/lib/python3.8/site-packages/gns3server/handlers/api/compute/docker_handler.py", line 89, in start
await container.start()
File "/usr/lib/python3.8/site-packages/gns3server/compute/docker/docker_vm.py", line 484, in start
logdata = await self._get_log()
File "/usr/lib/python3.8/site-packages/gns3server/compute/docker/docker_vm.py", line 1141, in _get_log
result = await self.manager.query("GET", "containers/{}/logs".format(self._cid), params={"stderr": 1, "stdout": 1})
File "/usr/lib/python3.8/site-packages/gns3server/compute/docker/__init__.py", line 114, in query
if response.headers['CONTENT-TYPE'] == 'application/json':
KeyError: 'CONTENT-TYPE'

I also ran ubridge in hypervisor mode and connected to it with telnet. The docker move_to_ns command does not work in my environment. Could someone please help me fix this?

Thanks in advance
Regards
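The last traceback is a secondary failure: `query` indexes `response.headers['CONTENT-TYPE']` directly, so a reply without that header raises `KeyError` and hides the original uBridge error. A minimal sketch of a tolerant lookup follows. The helper name `is_json_response` is hypothetical and not part of gns3server, and the sketch assumes the headers are available as an ordinary mapping.

```python
def is_json_response(headers):
    """Return True if the headers declare a JSON body.

    `headers` is any mapping of header name -> value. The lookup is
    case-insensitive and tolerates a missing Content-Type header.
    """
    for name, value in headers.items():
        if name.lower() == "content-type":
            # Ignore parameters such as "; charset=utf-8".
            return value.split(";")[0].strip().lower() == "application/json"
    # No Content-Type header at all: treat the body as non-JSON.
    return False
```

With a check of this kind, a missing header would fall through to the non-JSON path instead of raising.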

@grossmj
Member

grossmj commented Apr 6, 2021

Have you checked that ubridge has the correct capabilities/rights?

getcap /usr/local/bin/ubridge
/usr/local/bin/ubridge = cap_net_admin,cap_net_raw+ep

If not, set using this command:

setcap cap_net_admin,cap_net_raw=ep /usr/local/bin/ubridge

@gfernandop
Author

Hello @grossmj,

Thanks for answering.

The output seems a bit different:

gentoolinux /home/gab # getcap /usr/bin/ubridge
/usr/bin/ubridge cap_net_admin,cap_net_raw=ep

The differences seem related only to output format, is it ok?
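To compare the two `getcap` outputs quoted in this thread without relying on their exact layout, the lines can be normalised before comparison. The sketch below is only an illustration: `parse_getcap` is a hypothetical helper, and the regular expression is written for the two formats shown above, not for every form `getcap` can print.

```python
import re

# Accepts both "PATH = caps+ep" and "PATH caps=ep", as seen in this thread.
GETCAP_LINE = re.compile(r"^(\S+)\s+(?:=\s+)?([a-z_,]+)[+=]([eip]+)$")


def parse_getcap(line):
    """Split one line of getcap output into (path, capabilities, flags)."""
    match = GETCAP_LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised getcap line: %r" % line)
    path, caps, flags = match.groups()
    return path, frozenset(caps.split(",")), frozenset(flags)


if __name__ == "__main__":
    old_style = parse_getcap("/usr/local/bin/ubridge = cap_net_admin,cap_net_raw+ep")
    new_style = parse_getcap("/usr/bin/ubridge cap_net_admin,cap_net_raw=ep")
    # Only the path differs; capabilities and flags are the same sets.
    print(old_style[1:] == new_style[1:])
```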

@grossmj
Member

grossmj commented Apr 6, 2021

The differences seem related only to output format, is it ok?

It looks fine.

The error gns3server.ubridge.ubridge_error.UbridgeError: could not complete netlink transaction indicates that uBridge cannot use netlink which is strange.

I am not a Gentoo expert, but maybe the kernel wasn't compiled with Netlink support or something similar? https://packages.gentoo.org/useflags/netlink

@gfernandop
Author

gfernandop commented Apr 6, 2021

The link you provided is about USE flags.
USE flags act as toggles for a package's optional features or support.
Gentoo users choose at compile time which package features are enabled or disabled through those flags.
Within Gentoo, the ubridge package has only "filecaps" as a USE flag, and I keep it enabled: https://packages.gentoo.org/packages/net-misc/ubridge

Regarding kernel support, I filtered my kernel config and found the output below:

gab@gentoolinux ~ $ cat /usr/src/linux/.config | grep -i netlink
CONFIG_NETFILTER_NETLINK=y
# CONFIG_NETFILTER_NETLINK_ACCT is not set
# CONFIG_NETFILTER_NETLINK_QUEUE is not set
CONFIG_NETFILTER_NETLINK_LOG=y
# CONFIG_NETFILTER_NETLINK_OSF is not set
CONFIG_NF_CT_NETLINK=y
# CONFIG_NETFILTER_NETLINK_GLUE_CT is not set
# CONFIG_NETLINK_DIAG is not set
CONFIG_QUOTA_NETLINK_INTERFACE=y

Please let me know if I should enable any option that is currently not set.
For the sake of completeness, I am able to bring up Alpine Linux as a container within GNS3 without any problems.
The docker move_to_ns command seems to work properly when configuring Alpine =/
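Grepping for "netlink" only shows options with that word in their name. A small parser makes it easier to query specific options instead. The sketch below is an illustration under assumptions: `parse_kernel_config` is a hypothetical helper, and checking `CONFIG_VETH` and `CONFIG_NET_NS` is a guess at what might matter for moving interfaces between namespaces, not something established in this thread.

```python
def parse_kernel_config(text):
    """Map kernel option names to their value; unset options map to "n"."""
    options = {}
    suffix = " is not set"
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # "# CONFIG_FOO is not set" -> CONFIG_FOO disabled.
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


if __name__ == "__main__":
    # Path taken from the command used in this thread.
    with open("/usr/src/linux/.config") as handle:
        config = parse_kernel_config(handle.read())
    # Assumed-relevant options; absent keys are reported as "missing".
    for name in ("CONFIG_VETH", "CONFIG_NET_NS"):
        print(name, config.get(name, "missing"))
```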

@mm1ke

mm1ke commented Apr 26, 2021

Hello @grossmj

I maintain the GNS3 packages for Gentoo and have been looking into this problem for some time now. I don't really have an idea where the problem comes from.
However, I was experimenting with other distros as well and could reproduce this problem on openSUSE Tumbleweed too. Ubuntu, on the other hand, doesn't suffer from it.

Checking the packages from both distros, I saw that the Qt version is different. While Gentoo and openSUSE are already on 5.15, Ubuntu still uses 5.14, which is why I was wondering if the Qt version could be the issue here?

@grossmj
Member

grossmj commented Apr 26, 2021

Checking the packages from both distros, I saw that the Qt version is different. While Gentoo and openSUSE are already on 5.15, Ubuntu still uses 5.14, which is why I was wondering if the Qt version could be the issue here?

I doubt Qt has anything to do with it. The move_to_ns command basically moves an interface to a Linux namespace: https://github.com/GNS3/ubridge#docker-module-docker

I still suspect that something isn't enabled or that some other restriction applies. Checking this might help: https://wiki.gentoo.org/wiki/Docker#Kernel

Also, please try to manually create a network namespace and add a veth pair like this:

ip netns add test
ip netns list
ip link add veth0 type veth peer name veth1
ip link set veth1 netns test
ip netns exec test ip link list

There is a problem with netlink if you get any RTNETLINK errors.

This would help to isolate the issue. Thanks 👍
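The manual check above can also be scripted so that every failing step is reported together with its stderr. This is a rough sketch: `run_netns_test` and `has_rtnetlink_error` are hypothetical names, the commands are exactly the ones listed above, and the script has to run with enough privileges to create namespaces (typically root).

```python
import subprocess

# The same commands as in the manual test above.
NETNS_TEST_COMMANDS = [
    ["ip", "netns", "add", "test"],
    ["ip", "netns", "list"],
    ["ip", "link", "add", "veth0", "type", "veth", "peer", "name", "veth1"],
    ["ip", "link", "set", "veth1", "netns", "test"],
    ["ip", "netns", "exec", "test", "ip", "link", "list"],
]


def has_rtnetlink_error(output):
    """True if the text contains an RTNETLINK error message."""
    return "RTNETLINK" in output


def run_netns_test(run=subprocess.run):
    """Run each command and return a list of (command, stderr) failures."""
    failures = []
    for cmd in NETNS_TEST_COMMANDS:
        result = run(cmd, capture_output=True, text=True)
        if result.returncode != 0 or has_rtnetlink_error(result.stderr):
            failures.append((" ".join(cmd), result.stderr.strip()))
    return failures


if __name__ == "__main__":
    for command, error in run_netns_test():
        print("FAILED:", command, "->", error)
```

The `run` parameter exists only so the function can be exercised without touching the system, for example by passing a stub that returns `subprocess.CompletedProcess` objects.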

@gfernandop
Author

Hi @grossmj,

No issues while running the commands you provided:

gentoolinux /home/gab # ip netns add test
gentoolinux /home/gab # ip netns list
test
gentoolinux /home/gab # ip link add veth0 type veth peer name veth1
gentoolinux /home/gab # ip link set veth1 netns test
gentoolinux /home/gab # ip netns exec test ip link list
1: lo: mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ip6_vti0@NONE: mtu 1364 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/tunnel6 :: brd :: permaddr da35:2ea2:4c2f::
3: sit0@NONE: mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/sit 0.0.0.0 brd 0.0.0.0
4: ip6tnl0@NONE: mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/tunnel6 :: brd :: permaddr 6a0f:abdf:9b04::
7: veth1@if8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 46:da:0f:a9:72:13 brd ff:ff:ff:ff:ff:ff link-netnsid 0

@grossmj
Member

grossmj commented May 6, 2021

Thanks, this must mean there is nothing wrong with netlink itself.

Let's try to use uBridge to manually add an interface to a Docker container.

1 - Start a Docker container

docker run -it --rm alpine /bin/ash

2 - Find the Pid of the container

Now that the container is running, we need its ID.

$ docker container list
CONTAINER ID   IMAGE     COMMAND      CREATED         STATUS         PORTS     NAMES
00ae1b2479b7   alpine    "/bin/ash"   7 minutes ago   Up 6 minutes             interesting_lehmann

Then we can use the container ID to find the Pid

$ docker container inspect 00ae1b2479b7 | grep Pid
            "Pid": 308036,
            "PidMode": "",
            "PidsLimit": null,
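Since `grep Pid` also matches `PidMode` and `PidsLimit`, reading the field from the JSON is less ambiguous. The following is a minimal sketch: `container_pid` is a hypothetical helper that expects the JSON array printed by `docker container inspect` and reads its `State.Pid` field.

```python
import json
import subprocess


def container_pid(inspect_json):
    """Extract State.Pid from the JSON printed by `docker container inspect`."""
    entries = json.loads(inspect_json)
    if not entries:
        raise ValueError("inspect output contains no container")
    pid = entries[0]["State"]["Pid"]
    if pid == 0:
        # Docker reports Pid 0 for a container that is not running.
        raise RuntimeError("container is not running")
    return pid


if __name__ == "__main__":
    # Container ID taken from the example above; replace it with your own.
    output = subprocess.run(
        ["docker", "container", "inspect", "00ae1b2479b7"],
        capture_output=True, text=True, check=True,
    ).stdout
    print(container_pid(output))
```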

3 - Run uBridge in hypervisor mode

Start uBridge to listen on port 4242 (with the same user you would use to run the GNS3 server).

$ ubridge -H 4242
uBridge version 0.9.19 running with libpcap version 1.9.1 (with TPACKET_V3)
Hypervisor TCP control server started (port 4242).

4 - Create a TAP interface and move it to Docker container

Then use telnet to connect to port 4242 and issue commands (replace the container Pid 308036 with the one from your container):

$ telnet localhost 4242
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
bridge create test
100-bridge 'test' created
bridge add_nio_tap test tap4242
100-NIO TAP added to bridge 'test'
docker move_to_ns tap4242 308036 eth42
100-tap4242 moved to namespace 308036

I expect you will get the error right after you enter the docker move_to_ns command. If so, please try again with uBridge running as root to see whether this is a permission issue. Thanks for your help!
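The telnet session can be replayed from a script, which makes it easier to repeat the test as a normal user and as root. This is a sketch under assumptions: `send_commands` and `parse_status` are hypothetical names, and the reply handling assumes that each command ends with one line of the form `NNN-message`, with `100` meaning success as in the transcript above. Any other code is simply reported, not interpreted.

```python
import re
import socket

# Final reply lines look like "100-bridge 'test' created" in the transcript.
STATUS_LINE = re.compile(r"^(\d{3})-(.*)$")


def parse_status(line):
    """Return (code, message) for a final reply line, or None otherwise."""
    match = STATUS_LINE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def send_commands(host, port, commands, timeout=5.0):
    """Send each command in turn and collect (command, (code, message))."""
    results = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        reader = sock.makefile("r", encoding="utf-8")
        for command in commands:
            sock.sendall((command + "\n").encode("utf-8"))
            while True:
                line = reader.readline()
                if not line:
                    raise ConnectionError("hypervisor closed the connection")
                status = parse_status(line)
                if status is not None:
                    results.append((command, status))
                    break
    return results


if __name__ == "__main__":
    # Port, names and Pid are the ones used in the walkthrough above.
    session = [
        "bridge create test",
        "bridge add_nio_tap test tap4242",
        "docker move_to_ns tap4242 308036 eth42",
    ]
    for command, (code, message) in send_commands("localhost", 4242, session):
        print(code, command, "->", message)
```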

@mm1ke

mm1ke commented May 14, 2021

Hi,

I've just tried the steps on my system to see what the problem is. Unfortunately, I didn't get any error back when running the commands on the CLI.
For your information, this problem seems to happen only with certain Docker images. While alpine, for example, works without problems, the Docker image ehlers/ostinato suffers from this issue.
I've now tried both Docker images with the commands you provided, but neither of them produced any errors:

alpine:

ai@x1 ~ $ telnet localhost 4242
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
bridge create test
100-bridge 'test' created
bridge add_nio_tap test tap4242
100-NIO TAP added to bridge 'test'
docker move_to_ns tap4242 25229 eth42
100-tap4242 moved to namespace 25229

ehlers/ostinato:

Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
bridge create test
100-bridge 'test' created
bridge add_nio_tap test tap4242
100-NIO TAP added to bridge 'test'
docker move_to_ns tap4242 26169 eth42
100-tap4242 moved to namespace 26169

I even checked in the docker image if the interface was really there:

root@03e52a8fac78:/# ip ad
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
20: eth0@if21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever
22: eth42: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 16:cf:27:36:47:71 brd ff:ff:ff:ff:ff:ff
root@03e52a8fac78:/#

Seems to be fine. Trying again with GNS3, however, shows that the problem is still there.

@ghost

ghost commented Oct 29, 2021

The docker move_to_ns error message is also shown when the Docker container dies during the setup of the bridge. So your issue might have nothing to do with ubridge.

I suggest having a look at the logs of the Docker container. Use docker ps -a to find the container ID, then use docker logs to view the log.

behlers@iMac:~$ docker ps -a
CONTAINER ID   IMAGE              COMMAND                  CREATED             STATUS                         PORTS     NAMES
cb62deaff675   alpine-be:latest   "/gns3/init.sh /etc/…"   22 seconds ago      Exited (1) 16 seconds ago                gifted_carson
behlers@iMac:~$ docker logs -t cb62deaff675
2021-10-29T15:02:57.755529052Z standard_init_linux.go:219: exec user process caused: exec format error
2021-10-29T15:02:59.383432634Z standard_init_linux.go:219: exec user process caused: exec format error
behlers@iMac:~$ 

In my example the container dies early with "exec format error". Even though this error has nothing to do with ubridge, I get the same log messages as you, with all that ubridge output.
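To tell a dead container apart from a genuine uBridge failure, the STATUS column of `docker ps -a` can be checked for an exit code. The sketch below is illustrative only: `exit_code_from_status` is a hypothetical helper, and the pattern is based on the "Exited (1) 16 seconds ago" text shown above.

```python
import re

# Matches the STATUS text of a stopped container, e.g. "Exited (1) 16 seconds ago".
EXITED = re.compile(r"^Exited \((\d+)\)")


def exit_code_from_status(status):
    """Return the exit code from a `docker ps -a` STATUS string, or None."""
    match = EXITED.match(status.strip())
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    for status in ("Exited (1) 16 seconds ago", "Up 6 minutes"):
        print(repr(status), "->", exit_code_from_status(status))
```

A non-`None` result right after a failed start suggests looking at `docker logs` before investigating uBridge.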

@roeme

roeme commented Dec 22, 2022

I'm currently investigating the same behaviour on Debian 11.5 & podman (instead of Docker, but exposing the same interface through the appropriate socket).

The GNS3 server erroneously sends the container's stdout back to the GNS3 client as the error message, but further investigation of the logs shows that the containers do come up and that ubridge then fails during docker move_to_ns. GNS3 subsequently kills the containers.

The capabilities are set correctly on the ubridge binary, and I've confirmed that it makes no difference whether the ubridge hypervisor runs under a normal user or root.

Interestingly I've encountered the kernel message "A link change request failed with some changes committed already." during troubleshooting, which may help to pinpoint what exactly the cause is here, but I don't yet have a reliable repro for that.

Edit: I should highlight that in my case, I do indeed get an error immediately after move_to_ns when manually talking to a ubridge hypervisor.

@roeme

roeme commented Dec 22, 2022

Update: Upon further investigation, it seems that GNS3 attempts the move_to_ns while renaming the interface to eth0, but an interface with that name already exists in the container. I'm not sure whether this is specific to podman.
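A name clash of this kind can be checked before issuing move_to_ns by listing the interfaces that already exist in the container's namespace. The sketch below is a diagnostic aid only and does not reflect how GNS3 or uBridge choose names: `interface_names` is a hypothetical helper that parses `ip link` or `ip addr` output in the format shown earlier in this thread.

```python
import re

# Matches header lines such as "20: eth0@if21: <...>" or "1: lo: mtu 65536".
IFACE_LINE = re.compile(r"^\d+:\s+([^:@\s]+)(?:@[^:\s]+)?:")


def interface_names(ip_output):
    """Return the set of interface names found in `ip link`/`ip addr` output."""
    names = set()
    for line in ip_output.splitlines():
        match = IFACE_LINE.match(line)
        if match:
            names.add(match.group(1))
    return names


if __name__ == "__main__":
    sample = (
        "1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN\n"
        "    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00\n"
        "20: eth0@if21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP\n"
        "    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0\n"
    )
    names = interface_names(sample)
    # If the target name is already taken, a move with rename to it would clash.
    print("eth0" in names)
```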
