[Bug] Issue with nav2 when using the bridge #31
Comments
I am facing the exact same issue, running the nav2 controller-server with some custom (but simple) controller plugins. It seems like the lifecycle transitions are attempted twice when zenoh is running. I tested running a simple python node with some prints in the callbacks, and there I saw that its service calls weren't being executed twice. I don't know what would make the lifecycle service calls behave differently. When attempting to send the action goal to the controller server, I get the same runtime exception:
I have been looking into this for some time now, but I still have no idea what could be causing this. I am also using cyclone-dds, on an NVIDIA Jetson running Ubuntu 20.04.5 LTS (GNU/Linux 5.10.104-tegra aarch64). @samiamlabs did you manage to get any new insights?
I am facing the same issue in a slightly different setting.
The Zenoh bridge gives a warning when this happens which is as follows:
What I also noticed is that the Zenoh bridge should be started after the turtlebot bringup, otherwise the nodes will not start up correctly. But I don't know if these two issues are part of the same problem. Did you find any solution for this?
@samiamlabs : I managed to install Navigation2 as a Dev Container and to run it inside, along I'm not sure how lifecycle nodes are implemented in ROS 2, and especially if they rely on Services or Actions. Could you please all try to increase this timeout to 10 minutes (or more) on all your bridges:
Note that this setting was added recently, in commit 6e04f60.
@JEnoch I just tested what you suggested, changing the timeout value to different values even much higher than 600. While the timeout warning seems to be gone when I use very high values it still doesn't work and I get the following warning on the control bridge before the node dies on the turtlebot:
Thanks for testing.
I just sent you the full logs by email as they are quite long, thanks
Thanks @miltzhaw! I reviewed your logs. The On the other hand, I saw that each message on I'll now do some tests with Lifecycle Nodes to check if they are correctly supported.
There was indeed a bug with Services leading to an issue with Lifecycle Nodes: see #43. @samiamlabs, @btertoolen: could you please test with this fix and tell me if this solved your issue with nav2?
Hi @JEnoch, using the latest build indeed seems to have resolved the issue, so thanks! We will do some more testing, so if we notice anything else I will let you know.
Have not had time to do extensive testing but did some more of it. I found no problems that I think are related to the bridge during the testing. This issue looks fixed as far as I can tell :)
Thanks for your confirmations @btertoolen and @samiamlabs! I'll do a patch release (
Sorry for the late reply @JEnoch but I couldn't test it until today.
Hi @ciandonovan, the bridge should work with any ROS 2 version (at least from Foxy). Could you please create a new issue describing what you're experiencing, adding logs (debug level) of bridges, and ideally with a simple way to reproduce.
@JEnoch this is specifically regarding the Nav2 bringup issue, as described by OP @samiamlabs. Was able to reproduce the original issue with an almost identical setup, will send some reproducible containers later.
When running navigation2 with Zenoh already running, it hangs on:
When running navigation2 without Zenoh, it works fine, all nodes are brought up by the lifecycle manager.
Links to the sources used to build the containers, with the upstream code repos as submodules, listed below. Zenoh is built from https://github.com/ciandonovan/ros2_navigation
I actually observed something similar to what @ciandonovan is mentioning.
@miltzhaw that's exactly what I'm seeing too, our workaround is as you said, start the bridge after bringup. Doing it manually at the moment, could automate it with systemd service dependencies and systemd notify which would reduce race conditions somewhat, but hopefully the underlying issue will be resolved. @JEnoch could you re-open this issue?
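For anyone who wants to script the "start the bridge after bringup" workaround without systemd, here is a minimal Python sketch of the ordering logic. It is not the systemd approach mentioned above, just a plain launcher; the function name `start_in_order` and the `is_ready` callback are hypothetical, and how you decide that bringup is "ready" (for example by checking lifecycle node states) is left to you as an assumption.

```python
import subprocess
import time
from typing import Callable, Sequence, Tuple


def start_in_order(
    bringup_cmd: Sequence[str],
    bridge_cmd: Sequence[str],
    is_ready: Callable[[], bool],
    timeout_s: float = 60.0,
    poll_s: float = 0.5,
) -> Tuple[subprocess.Popen, subprocess.Popen]:
    """Start bringup, wait until is_ready() returns True, then start the bridge.

    is_ready is a caller-supplied check (an assumption of this sketch), e.g.
    something that confirms the nav2 lifecycle nodes are active.
    """
    bringup = subprocess.Popen(bringup_cmd)
    deadline = time.monotonic() + timeout_s
    while not is_ready():
        # Stop early if bringup died, so the bridge is never started alone.
        if bringup.poll() is not None:
            raise RuntimeError("bringup exited before becoming ready")
        if time.monotonic() > deadline:
            bringup.terminate()
            raise TimeoutError("bringup did not become ready in time")
        time.sleep(poll_s)
    # Only now start the bridge, mirroring the manual workaround.
    bridge = subprocess.Popen(bridge_cmd)
    return bringup, bridge


# Example call (the bringup command is a placeholder, the bridge command is
# the one from the issue description):
# start_in_order(
#     ["<your bringup command>"],
#     ["./zenoh-bridge-ros2dds", "-l", "tcp/0.0.0.0:7447"],
#     is_ready=my_readiness_check,
# )
```

This only reduces the race, same as the manual approach; it does not address whatever the bridge does to the lifecycle services when it is already running.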
@ciandonovan Thanks for the podman containers. I managed to install and run those. I see the same log as you. But I don't know what is stuck in nav2 launch.
This is what it's supposed to look like, and what it looks like when the Zenoh bridge is not running.
I think I found the issue: #62
@ciandonovan the fix has to be reviewed before merging, but you can already test it in the
I just tested the I can now have the bridge running while launching lifecycle nodes from Nav2 and everything seems to work, nodes come online and the robot navigates correctly. I will let you know if I come across any other issues, but it does look resolved to me, appreciate your work thank you.
Thanks for your confirmation. The fix is now merged into
Version 0.10.1-rc.2 has been released!
Describe the bug
When the zenoh bridge is running and I launch navigation2, the lifecycle transitions seem to be acting up. Also, sending navgoals fails.
An example output for the lifecycle thing is:
As far as I can tell, the nodes are activated despite the error.
However, when the bridge is running in one of my gazebo simulations, bt_navigator dies after I send a navgoal. Sending navgoals works fine when I don't run the bridge.
I tried starting the bridge after I had already started the simulation with nav2 and made sure that sending navgoals worked.
Sending another navgoal after I started the bridge resulted in:
To reproduce
To reproduce the lifecycle errors:
zenoh-bridge-ros2dds
from https://github.com/eclipse-zenoh/zenoh-plugin-ros2dds/actions
./zenoh-bridge-ros2dds -l tcp/0.0.0.0:7447
in the Dev Container
I have only seen the crashing nodes when I send a navgoal in my own simulation, but I suspect the issue is general.
If you don't have a simulated robot environment to test this in, I can set up a Dev Container with turtlebots or something.
System info