Worse performance compared to Architecture Proposal ROS1 version? #2636
-
I have been using the ROS1 version of https://github.com/tier4/AutowareArchitectureProposal.iv in our vehicle for quite some time, and I have recently started to pick up autoware.universe. After a few tests in the field, I can't help but feel that the overall performance of Autoware on my machine is worse than it used to be with AAP ROS1. For instance, with AAP I did not have any localization issues on our test field, but with universe the localization is quite unstable and is easily lost if I drive too fast or turn suddenly. Similarly, the whole object tracking pipeline seems slower than with AAP (1~2 seconds of delay from VLP cloud to tracked objects). So far, I have only been able to reach real-time performance by heavily downsampling the VLP sensor cloud (which was not necessary with AAP).

I don't doubt that universe's algorithms and features are better and more reliable than AAP's, but I somewhat expected that the change from ROS1 to ROS2, the use of an efficient DDS implementation, and intra-process communication would compensate for the extra processing. Is it only my experience/feeling? Is Autoware heavier (slower?) than before, or is it just a matter of configuration/tuning? (e.g. #204 (comment))
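For context, the downsampling mentioned above is usually done with a voxel-grid filter. The thread does not show how it was configured; the following is only a standalone sketch of the idea (no ROS or PCL dependencies, and `voxel_downsample` is a hypothetical helper name), keeping one centroid per occupied voxel:

```python
def voxel_downsample(points, voxel_size):
    """Reduce a point cloud by keeping one centroid per occupied voxel.

    points: iterable of (x, y, z) tuples; voxel_size: edge length in meters.
    """
    buckets = {}
    for x, y, z in points:
        # Map each point to the integer index of the voxel containing it.
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        buckets.setdefault(key, []).append((x, y, z))
    # Replace each voxel's points with their centroid.
    return [
        tuple(sum(coord) / len(pts) for coord in zip(*pts))
        for pts in buckets.values()
    ]
```

A larger `voxel_size` trades localization/detection accuracy for CPU time, which matches the trade-off described above.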
Replies: 11
-
Although I'm not so familiar with this performance issue, I remember that one of the biggest reasons is the overhead of the existing executors. Regarding Autoware itself, I believe it hasn't become much slower than TIER IV's proposal version. But as features have been added gradually, it requires more machine resources if you use all of them.
-
@VRichardJP
However, we have not encountered such a disastrous performance issue as you describe. Considering the behavior you mentioned, I suspect that your system has something wrong with its DDS configuration or memory bandwidth.

Inter-process communication via multicast on DDS costs CPU time. Each ROS 2 process has a thread named "recvMC" that receives topic messages; these recvMC threads account for about 20-30% of the total CPU time in Autoware. You will find some useful references in the CycloneDDS documentation.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<CycloneDDS xmlns="https://cdds.io/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://cdds.io/config https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd">
  <Domain Id="any">
    <General>
      <Interfaces>
        <NetworkInterface autodetermine="true" priority="default" multicast="default" />
      </Interfaces>
      <AllowMulticast>spdp</AllowMulticast>
      <MaxMessageSize>65500B</MaxMessageSize>
    </General>
    <Discovery>
      <EnableTopicDiscoveryEndpoints>true</EnableTopicDiscoveryEndpoints>
    </Discovery>
    <Internal>
      <Watermarks>
        <WhcHigh>500kB</WhcHigh>
      </Watermarks>
    </Internal>
    <Tracing>
      <Verbosity>config</Verbosity>
      <OutputFile>cdds.log.${CYCLONEDDS_PID}</OutputFile>
    </Tracing>
  </Domain>
</CycloneDDS>
```
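A config like this is typically activated through environment variables. `RMW_IMPLEMENTATION` and `CYCLONEDDS_URI` are standard ROS 2 / CycloneDDS settings; the file path below is only a placeholder for wherever you save the XML:

```shell
# Select the CycloneDDS RMW implementation for all ROS 2 nodes in this shell.
export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
# Point CycloneDDS at the XML configuration file (example path, adjust it).
export CYCLONEDDS_URI=file:///absolute/path/to/cyclonedds.xml
```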
-
To illustrate my situation, I made a small script to track message frequency (like ).

In the following situation I use CycloneDDS with its default configuration:

As I said, I need to heavily downsample the VLP cloud to get "acceptable" performance:

Then, using :

But still, it is far from ideal. In particular 

Last but not least: I observe that despite all my efforts configuring CycloneDDS, FastRTPS always seems way faster:

I am wondering why FastRTPS is not recommended for Autoware.
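The frequency-tracking script itself is not shown in the thread. Below is a minimal standalone sketch of how such a tracker might work; the ROS 2 subscription wiring is omitted, and `FrequencyTracker` is a hypothetical name, not code from the discussion:

```python
from collections import deque


class FrequencyTracker:
    """Estimate a topic's message rate (Hz) over a rolling window."""

    def __init__(self, window_size=100):
        # Keep only the most recent `window_size` arrival timestamps.
        self._stamps = deque(maxlen=window_size)

    def tick(self, stamp):
        """Record the arrival time (in seconds) of one message."""
        self._stamps.append(stamp)

    def hz(self):
        """Average rate over the window, or None until two samples exist."""
        if len(self._stamps) < 2:
            return None
        span = self._stamps[-1] - self._stamps[0]
        if span <= 0:
            return None
        return (len(self._stamps) - 1) / span
```

In a real node, `tick()` would be called from each subscription callback; `ros2 topic hz` gives a similar measurement from the command line.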
-
I think I finally managed to reach good performance! At the end of the day, it was only a matter of a few changes: