Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ROS 2 action clients don't receive cancellation response #310

Closed
robin-mueller opened this issue Dec 19, 2024 · 15 comments · Fixed by #311
Closed

ROS 2 action clients don't receive cancellation response #310

robin-mueller opened this issue Dec 19, 2024 · 15 comments · Fixed by #311

Comments

@robin-mueller
Copy link
Contributor

robin-mueller commented Dec 19, 2024

The WorldROSWrapper exposes actions and services to the ROS 2 network and makes it possible to request actions or query services. In the case of ROS 2 actions I've found, that the individual rclpy.action.ActionServer instances might not be implemented correctly, since action clients are not able to receive a response for a cancellation request.

How to reproduce

  1. Start the GUI and the default word running pyrobosim_ros/examples/demo.py

  2. Send a simple Navigate action request. This is can be done using the terminal

    ros2 action send_goal /execute_action pyrobosim_msgs/action/ExecuteTaskAction "{action: {type: navigate, robot: robot, target_location: bathroom}}"
  3. Send a cancellation request by interrupting the terminal process pressing Ctrl+C

Expected output

Usually, the program would print something like this if the cancellation request was sent and the corresponding response received correctly:

Waiting for an action server to become available...
Sending goal:
     action:
  robot: robot
  type: navigate
  object: ''
  room: ''
  source_location: ''
  target_location: bathroom
  cost: 0.0
  has_pose: false
  pose:
    position:
      x: 0.0
      y: 0.0
      z: 0.0
    orientation:
      x: 0.0
      y: 0.0
      z: 0.0
      w: 1.0
  path:
    poses: []
    length: 0.0

Goal accepted with ID: 32789fe07b644411b0611d1f0b7e6f02

Canceling goal...
Goal canceled.  # <-- This should appear if the cancellation request was accepted and the client received the response.

Actual output

Instead, I receive an error message immediately after I requested to cancel with Ctrl+C.

Waiting for an action server to become available...
Sending goal:
     action:
  robot: robot
  type: navigate
  object: ''
  room: ''
  source_location: ''
  target_location: bathroom
  cost: 0.0
  has_pose: false
  pose:
    position:
      x: 0.0
      y: 0.0
      z: 0.0
    orientation:
      x: 0.0
      y: 0.0
      z: 0.0
      w: 1.0
  path:
    poses: []
    length: 0.0

Goal accepted with ID: 32789fe07b644411b0611d1f0b7e6f02

Canceling goal...
Exception while canceling goal: None  # <-- Instead, one observes this

Refer to ros2cli/ros2action for more information about the methods that are called during a ros2 action send_goal command.

@robin-mueller
Copy link
Contributor Author

After some investigation I suspect the following:

Looking at the source code and the action_server instance that is responsible for handling Navigate action requests in particular, there is a small caveat with the corresponding callback functions. My thoughts are based on the assumption that with rclpy the action server callbacks are executed concurrently. Therefore, when a cancel request arrives, action_callback and action_cancel_callback are both called in the same iteration (NOT simultaneously but concurrently!). However, the execution order is unkown (it's depending on the operating systems scheduling logic). So there is kind of a race condition between robot.execute_action and robot.cancel_actions. Both callbacks observe the self.executing_nav flag for determining when the execution has finished (See here and here).

In the end, it's impossible to know which callback observes that the flag is set to False first and purely determined by chance. And that is the crucial point: If robot.execute_action becomes aware of the execution completion before robot.cancel_actions, the server's execute callback action_callback will proceed and "succeed" the goal handle before the cancellation request has even been answered. When action_cancel_callback finally becomes aware of the current state, the goal handle has already reached a terminal state and an error occurs.

In my opinion, the action servers of WorldROSWrapper should be implemented according to the design that we apply with rclcpp (the C++ API). Meaning that the actual cancellation process should also be handled by the action's execute callback. Afterall, the intended responsibility of the cancel callback is only determining if the goal is allowed to be canceled or not. It's not supposed to do any real work.

@sea-bass
Copy link
Owner

Thanks for raising this!

It might be that it seems the action callbacks always do a goal_handle.succeed() instead of a goal_handle.canceled() in the right conditions.

... so I think it's just checking whether the robot's action completed or was canceled.

I'll look into this when I have the chance.

@sea-bass
Copy link
Owner

sea-bass commented Dec 19, 2024

Afterall, the intended responsibility of the cancel callback is only determining if the goal is allowed to be canceled or not. It's not supposed to do any real work.

Correct. And it does this currently. The cancel callbacks should tell the robot to cancel its running action, but the return status of that action is checked in the actual execution.

So the issue seems to be that we accept cancellation, but then don't confirm that the goal was actually canceled by setting the right status in the result.

@robin-mueller
Copy link
Contributor Author

robin-mueller commented Dec 19, 2024

It might be that it seems the action callbacks always do a goal_handle.succeed() instead of a goal_handle.canceled() in the right conditions.

That's what I thought as well, but according to the action design document it's totally fine to succeed when canceling. However, it would be nice if this was accounted for ^^

Correct. And it does this currently. The cancel callbacks should tell the robot to cancel its running action, but the return status of that action is checked in the actual execution.

Hmm, but what about this line inside robot.cancel_actions. It seems to me that the method waits until self.executing_nav is False.

I'm not used to rclpy since I mainly work with rclcpp, so please forgive me if I'm wrong :)

@sea-bass
Copy link
Owner

Yes, what happens is that this library's core is not based on ROS, so all actions return an ExecutionResult with them.

So in the case that an action is canceled, the navigation action still completes, but returns an ExecutionStatus.Canceled:

if self.cancel_execution:
self.cancel_execution = False
message = "Trajectory execution canceled by user."
self.robot.logger.info(message)
status = ExecutionStatus.CANCELED
break

So we need to check the return status of the execution result here:

execution_result = robot.execute_action(robot_action)
-- and appropriately check that to know whether we should call goal_handle.succeed() vs. goal_handle.canceled().

@robin-mueller
Copy link
Contributor Author

robin-mueller commented Dec 19, 2024

Yes, that is correct. But I still don't understand, how robot.cancel_actions

def cancel_actions(self):
"""Cancels any currently running actions for the robot."""
if not (self.executing_action or self.executing_plan or self.executing_nav):
self.logger.warning("There is no running action or plan to cancel.")
return
if self.executing_nav and self.path_executor is not None:
self.logger.info("Canceling path execution...")
self.path_executor.cancel_execution = True
while self.executing_nav:
time.sleep(0.1)
if self.executing_action or self.executing_plan:
self.canceling_execution = True
while self.canceling_execution:
time.sleep(0.1)

works. It seems to me that this function blocks until the cancellation is complete i.e. the execution result is available.

And blocking inside the cancel callback = bad like we agreed before, so what am I missing here.

@sea-bass
Copy link
Owner

Oh yes, you are right about that part.

Hmm... I guess we could always call that function in a thread and immediately return the cancel callback, but I wonder if there's a better resolution.

@robin-mueller
Copy link
Contributor Author

robin-mueller commented Dec 19, 2024

Well I'd suggest using the execute callback for that and introducing a while loop that does the actual work. That's how I always do it with rclcpp (Not a while loop but a ROS 2 timer).

while robot.executing_action:
  if cancel:
    cancel()
  else:
    execute()

But that would require refactoring action_callback to not use robot.execute_action since this is also a blocking function. Or we merge the cancel_actions function into execute_action and let cancel_action only set some flag.

@sea-bass
Copy link
Owner

sea-bass commented Dec 19, 2024

So I just tried this: https://github.com/sea-bass/pyrobosim/pull/311/files

Even though I put some log statements, and I can confirm that the if goal_handle.is_cancel_requested state is hit... somehow I still get that exception in your CLI example.

It seems to follow https://github.com/ros2/examples/blob/rolling/rclpy/actions/minimal_action_server/examples_rclpy_minimal_action_server/server_single_goal.py

Any ideas?

@sea-bass
Copy link
Owner

sea-bass commented Dec 19, 2024

Oh, another fun update. If I run that ROS 2 example, I also get the same "Exception when canceling goal: None"

ros2 run examples_rclpy_minimal_action_server server_single_goal 

ros2 action send_goal /fibonacci example_interfaces/action/Fibonacci "{order: 100}"

Ctrl + C

... so maybe this is an rclpy thing? Raised an issue in ros2/rclpy#1390

@robin-mueller
Copy link
Contributor Author

Oof.. it also seems to me that it should work. Just note, that there should probably also be some more logic that also determines when a cancel request should be rejected similar to this

"""Cancels any currently running actions for the robot."""
if not (self.executing_action or self.executing_plan or self.executing_nav):
self.logger.warning("There is no running action or plan to cancel.")
return

@sea-bass
Copy link
Owner

Good idea -- 55293eb

@robin-mueller
Copy link
Contributor Author

robin-mueller commented Dec 19, 2024

Small update: I just cloned your pull request #311 and my application (the real one, not the CLI) now works correctly. My client is now able to successfully receive the response to the cancel request and conclude the cancellation on its side.

The actual problem was in fact that cancel_action_callback was blocked by robot.cancel_actions. So changing this

robot = self.world.get_robot_by_name(goal_handle.request.action.robot)
if robot is not None:
self.get_logger().info(f"Canceling action for robot {robot.name}.")
robot.cancel_actions()
return CancelResponse.ACCEPT

to this
robot = self.world.get_robot_by_name(goal_handle.request.action.robot)
if robot is not None:
self.get_logger().info(f"Canceling action for robot {robot.name}.")
Thread(target=robot.cancel_actions).start()
return CancelResponse.ACCEPT

did the trick!

However, the other change is also needed to correctly respond to the cancellation request on the client side!

# Package up the result
if goal_handle.is_cancel_requested:
goal_handle.canceled()
else:
goal_handle.succeed()

Two birds with one stone!

Thanks for your time! I guess there is still an issue with rclpy or ros2cli but let's see...

PS: Great work with pyrobosim! It's awesome to have a lightweight platform for behavior prototyping and demonstration. I'm currently integrating this project in one of my repo's to allow users to quickly validate their programs and I'm really happy to have come across pyrobosim. So it would be awesome if you could merge this soon :)

@sea-bass
Copy link
Owner

Just merged. Thanks again for bringing this up.

I also look forward to seeing what you end up doing on your repo. If you have any usage examples to share, you can add them to https://github.com/sea-bass/pyrobosim/blob/main/docs/source/index.rst#usage-examples

And generally, let me know as anything else comes up.

@robin-mueller
Copy link
Contributor Author

I'll keep you updated. Will probably release a first version next month

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants