Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rosbag2 play --clock option ignores delay #1858

Closed
hayato-m126 opened this issue Nov 18, 2024 · 16 comments · Fixed by #1861
Closed

Rosbag2 play --clock option ignores delay #1858

hayato-m126 opened this issue Nov 18, 2024 · 16 comments · Fixed by #1861
Labels
bug Something isn't working

Comments

@hayato-m126
Copy link

hayato-m126 commented Nov 18, 2024

Description

Even if delay option is specified, clock option ignores delay and publishes /clock.
Also, after the time of delay elapses, clock time returns to the start time of bag.

Expected Behavior

/clock is published after the delay time as well as the topic in the bag.

Actual Behavior

/clock is published ignoring delay

To Reproduce

  1. Prepare a bag file that does not contain /clock
  2. play bag with --clock and --delay
  3. echo /clock before delay time elapses

demo

# use bag in the sample dataset https://tier4.github.io/driving_log_replayer/quick_start/setup/
ros2 bag play input_bag --delay 10 --clock 200
# open another terminal
ros2 topic echo /clock
clock_ignores_delay.mp4

demo bag start Apr 5 2022 15:07:31.594430720 (1649138851.594430720)
clock 1649138851 -> 1649138861 -> 1649138851(start time)

clock_reverse_after_10sec.txt

System (please complete the following information)

  • OS: Ubuntu 22.04
  • ROS 2 Distro: humble
  • Install Method: APT
  • Version: ros-humble-rosbag2/jammy,now 0.15.12-1jammy.20240730.153849

Additional context

** Add any other context about the problem here **

@hayato-m126 hayato-m126 added the bug Something isn't working label Nov 18, 2024
@fujitatomoya
Copy link
Contributor

this problem can be reproducible with rolling branch as well.

@nicolaloi
Copy link
Contributor

Even if delay option is specified, clock option ignores delay and publishes /clock

clock_publish_timer_ is set within the PlayerImpl constructor flow, so it starts publishing immediately:

clock_publish_timer_ = owner_->create_wall_timer(
publish_period, [this]() {
publish_clock_update();
});

To solve this issue I think it should be set in the PlayerImpl::play function, after this if statement:

if (delay > rclcpp::Duration(0, 0)) {
RCLCPP_INFO_STREAM(owner_->get_logger(), "Sleep " << delay.nanoseconds() << " ns");
std::chrono::nanoseconds delay_duration(delay.nanoseconds());
std::this_thread::sleep_for(delay_duration);
}

I have tested this locally and it behaves correctly.

Also, after the time of delay elapses, clock time returns to the start time of bag.

This is a problem I have also encountered in #1836.
It is due to the clock not starting in a paused state when a delay is specified; instead, it continues to move forward, but then when the delay time has passed it jumps back to the starting time:

clock_->jump(starting_time_);

I would start the clock in a paused state and then resume it after the delay period.


@fujitatomoya I can open a PR to solve these two problems if these fixes are ok.

@fujitatomoya
Copy link
Contributor

clock_publish_timer_ is set within the PlayerImpl constructor flow, so it starts publishing immediately:
To solve this issue I think it should be set in the PlayerImpl::play function, after this if statement:

That also does the job.
I was thinking that we can create the timer there with autostart = False, and the reset the timer after the delay.

I would start the clock in a paused state and then resume it after the delay period.

sounds good to me.

I can open a PR to solve these two problems if these fixes are ok.

appreciate your effort, i am happy to review this. thanks!

@MichaelOrlov please chime in if you have other thoughts on this.

@nicolaloi
Copy link
Contributor

I was thinking that we can create the timer there with autostart = False, and the reset the timer after the delay.

Ah yes, I see the autostart feature was added last year.

@MichaelOrlov MichaelOrlov changed the title clock option ignores delay Rosbag2 play --clock option ignores delay Nov 22, 2024
@MichaelOrlov
Copy link
Contributor

@fujitatomoya @hayato-m126 As regards:

@MichaelOrlov please chime in if you have other thoughts on this.

As more I see issues like this, I more regret that we allowed merging the "delay" option for the Rosbag2 player back in the day. I honestly would prefer to get rid of it at all rather than trying to fix the consequences and overcomplicated logic that we currently have with it.

  1. First of all "--delay" option introduces non-deterministic behavior to the playback.
  2. It also overcomplicates logic inside the player which is already the most complicated component in the Rosbag2. We have to handle a lot of corner cases.
  3. If one has non-determinism and flakiness in the workflow due to the unsynchronized playback, increasing the delay before playback will not really magically solve a problem and make behavior deterministic. It can only make it less flaky, but still wrong by design.
  4. If the problem is in design, it should be solved by design rather than trying to introduce delays in various places, hoping that "god will bless you this time" and bad things will not happen again.
  5. In my opinion, any problem trying to be solved with playback delay can and shall be solved by starting playback in pause mode and then sending a service request to resume playback when other nodes are ready imposing the strict determinism in the workflow. Please provide examples if you think that this is not true. I can't imagine any on top of my head.

@fujitatomoya
Copy link
Contributor

@MichaelOrlov thanks for your comments.

As more I see issues like this, I more regret that we allowed merging the "delay" option for the Rosbag2 player back in the day.

i get what you mean, but i am not sure about the history of --delay though.

delay can and shall be solved by starting playback in pause mode and then sending a service request to resume playback when other nodes are ready imposing the strict determinism in the workflow.

agree on this. just a minor question is --start-paused can be also applied to --loop mode? it looks like --delay option is needed for --loop mode to give specific time window for each loop?

@hayato-m126
Copy link
Author

@MichaelOrlov

Thank you for your comment.

In my opinion, any problem trying to be solved with playback delay can and shall be solved by starting playback in pause mode and then sending a service request to resume playback when other nodes are ready imposing the strict determinism in the workflow. Please provide examples if you think that this is not true. I can't imagine any on top of my head.

I have no objection.

If --delay option complicates the logic and causes problems, I think this option should be eliminated.

@MichaelOrlov
Copy link
Contributor

MichaelOrlov commented Nov 24, 2024

@fujitatomoya @hayato-m126 Thank you for your comments.
As regards a question and comment about

minor question is --start-paused can be also applied to --loop mode? it looks like --delay option is needed for --loop mode to give specific time window for each loop?

Currently --start-paused is not applied with the --loop and/or --delay options.
However, I am curious if it would be useful to apply --start-paused with --loop in some real world scenario?
In my understanding, the --start-paused needs to make sure that other receiving nodes also start. We can use wait_for_matched(..) on the receiving side, and then we can send a resume service call to start playback.
But in the case of the --loop, we don't need to wait for receivers when playback starts over from the beginning the next time since they are already up and running.
Therefore, both the --delay and --start-paused are useless in loop mode. Please correct me if I am wrong, and you would envision a real-world scenario.

@hayato-m126
Copy link
Author

I have never used loop.
I have never encountered a case where I needed loop.

But in the case of the --loop, we don't need to wait for receivers when playback starts over from the beginning the next time since they are already up and running.

However, if I were to use it, it would be as quoted.

@hayato-m126
Copy link
Author

My understanding through the discussion so far is that bag play's playback status should be managed externally through the service.
I understand that if the status is managed by the player itself, it becomes complicated.

I figured that if delay complicates the logic, then maybe loop does as well.
Wouldn't loop be a problem?

@MichaelOrlov
Copy link
Contributor

The problem with the delay is that it introduces uncertainty and non-determinism in the playback workflow.
It is also a bit difficult to handle corner cases, for instance, with timers and clocks. Especially when there is a delay inside the loop, the well-designed system with watchdogs likely will assert the whole pipeline due to the delay above the safety-defined operational limit when the loop goes to the next round.
The loop itself is not a problem since everything is running the same way as the previous round and in a pretty deterministic way.

The compromise solution would be to keep delay, but move it out of the loop.

@hayato-m126
Copy link
Author

hayato-m126 commented Nov 25, 2024

delay: non-determinism
loop: deterministic

Delay at loop is unnecessary because the nodes are already up and running.
Then, I think it would be best in terms of design to eliminate the delay.

If you really want to have a delay before play, I thought it would be better to put a sleep and create a delay outside of bag play.

sleep 5; ros2 bag play bag_file --clock 200
# launch.py
bag_player = ExecuteProcess(cmd=["ros2", "bag", "play", "bag_file" "--clock", "200"])
delay_player = ExecuteProcess(cmd=["sleep", 5], on_exit=[bag_player])

@MichaelOrlov
Copy link
Contributor

In the case of a regular ExecuteProcess(cmd=..), yes, you can insert a sleep. However, in the case of node composition, there is no sleep option as far as I am aware. Please see example from the https://github.com/ros2/rosbag2?tab=readme-ov-file#using-with-composition

@hayato-m126
Copy link
Author

@MichaelOrlov
I'm not sure, I fully understand your explanation.

Are you saying that in an application where the player is put into component container for intra-process communication, there could be a case for using delay?

In other words, are you saying that doing as quoted below is a practical solution?

The compromise solution would be to keep delay, but move it out of the loop.

@MichaelOrlov
Copy link
Contributor

Yes. This is the case when a delay CLI parameter would be needed if someone decided to use delay before playback with composable player node.

@fujitatomoya
Copy link
Contributor

The compromise solution would be to keep delay, but move it out of the loop.

this makes sense to me, which is now implemented in #1861.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants