Drop of messages with History "keep all" but not with "keep last" #385

Open
theksg opened this issue Dec 19, 2024 · 7 comments

Comments

@theksg

theksg commented Dec 19, 2024

I am creating a publisher and a subscriber with QoS History "keep all" and Durability "transient local".

If the publisher publishes more than 5000 messages and I then start the subscriber, the subscriber only receives the last 5000 messages, not all of them.

However, if I change the History to "keep last" with depth 50000 and do the same thing (start the subscriber after more than 5000 messages have been published), the subscriber receives all the messages.

I want to know the maximum queue depth that Fast RTPS allows.

Publisher code:
#include <chrono>
#include <functional>
#include <memory>
#include <string>

#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/int32.hpp"

using namespace std::chrono_literals;

class MinimalPublisher : public rclcpp::Node
{
public:
  MinimalPublisher()
  : Node("minimal_publisher"), count_(0)
  {
    // Set up QoS with history keep last (depth 50000), reliability reliable, and durability transient local
    auto qos = rclcpp::QoS(rclcpp::KeepLast(50000))
        .reliable()
        .transient_local();

    publisher_ = this->create_publisher<std_msgs::msg::Int32>("topic", qos);
    timer_ = this->create_wall_timer(
        200ms, std::bind(&MinimalPublisher::timer_callback, this));
  }

private:
  void timer_callback()
  {
    if (count_ >= 7000) {
      rclcpp::shutdown();  // Stop publishing after 7000 messages
      return;
    }

    auto message = std_msgs::msg::Int32();
    message.data = count_;
    RCLCPP_INFO(this->get_logger(), "Publishing: '%d'", message.data);
    publisher_->publish(message);
    count_++;
  }

  rclcpp::TimerBase::SharedPtr timer_;
  rclcpp::Publisher<std_msgs::msg::Int32>::SharedPtr publisher_;
  size_t count_;
};

int main(int argc, char * argv[])
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<MinimalPublisher>());
  rclcpp::shutdown();
  return 0;
}

Subscriber code:
#include <memory>

#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/int32.hpp"

using std::placeholders::_1;

class MinimalSubscriber : public rclcpp::Node
{
public:
  MinimalSubscriber()
  : Node("minimal_subscriber")
  {
    // Set up QoS with history keep last (depth 50000), reliability reliable, and durability transient local
    auto qos = rclcpp::QoS(rclcpp::KeepLast(50000))
        .reliable()
        .transient_local();

    subscription_ = this->create_subscription<std_msgs::msg::Int32>(
        "/topic", qos, std::bind(&MinimalSubscriber::topic_callback, this, _1));
  }

private:
  void topic_callback(const std_msgs::msg::Int32::SharedPtr msg) const
  {
    RCLCPP_INFO(this->get_logger(), "I heard: '%d'", msg->data);
  }

  rclcpp::Subscription<std_msgs::msg::Int32>::SharedPtr subscription_;
};

int main(int argc, char * argv[])
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<MinimalSubscriber>());
  rclcpp::shutdown();
  return 0;
}

Please change the QoS values in the code as needed.
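
For reference, the "keep all" configuration the issue title refers to would replace the KeepLast(50000) line in both nodes with something like the following sketch (with KEEP_ALL the depth is ignored and the middleware's resource limits apply instead):

// KEEP_ALL history: no depth; the middleware's ResourceLimits (e.g. max_samples) bound the cache
auto qos = rclcpp::QoS(rclcpp::KeepAll())
    .reliable()
    .transient_local();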

@fujitatomoya
Collaborator

@theksg

As far as I know, this is the designed behavior.

The DDS specification defines ResourceLimitsQosPolicy (https://fast-dds.docs.eprosima.com/en/v3.0.0/fastdds/api_reference/dds_pim/core/policy/resourcelimitsqospolicy.html), which determines the limits:

int32_t max_samples
A value less than or equal to 0 means infinite resources. By default, 5000.

If you want to increase the limits, you can either:

  • specify the limits with KEEP_LAST, which you already confirmed, or
  • apply an XML configuration to change max_samples, as sketched below.

Hopefully this helps.
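
For the second option, a minimal sketch of a Fast DDS XML profile that raises max_samples (the profile name is illustrative; the element names follow the v3.x documentation linked above, while older Fast DDS versions use <publisher>/<subscriber> instead of <data_writer>/<data_reader>):

<?xml version="1.0" encoding="UTF-8"?>
<dds xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
  <profiles>
    <data_writer profile_name="large_history_writer" is_default_profile="true">
      <topic>
        <historyQos>
          <kind>KEEP_ALL</kind>
        </historyQos>
        <resourceLimitsQos>
          <!-- a value <= 0 means "infinite"; otherwise raise it above the 5000 default -->
          <max_samples>0</max_samples>
        </resourceLimitsQos>
      </topic>
    </data_writer>
  </profiles>
</dds>

To have rmw_fastrtps pick this up, point FASTRTPS_DEFAULT_PROFILES_FILE at the file before starting the nodes; depending on the rmw_fastrtps version you may also need RMW_FASTRTPS_USE_QOS_FROM_XML=1 for the history settings from the XML to actually be applied.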

@theksg
Author

theksg commented Dec 20, 2024

@fujitatomoya thanks for the reply.
If resource exhaustion happens, it will lead to the node crashing. Is that the expected behavior?

@fujitatomoya
Collaborator

If resource exhaustion happens, it will lead to the node crashing. Is that the expected behavior?

I am not quite following this; can you rephrase the question a bit?

@theksg
Author

theksg commented Dec 20, 2024

I am not quite following this; can you rephrase the question a bit?

With a publisher using QoS depth 50,000 and transient local durability, sending 10 MB images can lead to a very large queue (50,000 samples × 10 MB ≈ 500 GB in the worst case). If memory is overwhelmed, the node might crash. Is this expected, and should developers prevent such crashes themselves, or is there some mechanism to prevent such a failure?

@fujitatomoya
Collaborator

I guess that with POD data this should be detected at configuration time, before runtime, because the middleware would try to preallocate a buffer much bigger than the system actually has. If it is variable-length data, the DataWriter::write() call may fail and throw an exception. Honestly, I have never tried this, and it probably depends on the rmw implementation.

CC: @eboasson @MiguelCompany @trubio-rti
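
As a defensive measure on the application side, one option is to catch a failed publish instead of letting it propagate. Below is a minimal sketch of the publisher's timer_callback rewritten this way, assuming the rmw layer surfaces the failure as an rclcpp exception rather than terminating the process (whether it actually does depends on the rmw implementation, as noted above):

void timer_callback()
{
  auto message = std_msgs::msg::Int32();
  message.data = count_;
  try {
    publisher_->publish(message);
    count_++;
  } catch (const rclcpp::exceptions::RCLError & e) {
    // rcl/rmw reported an error, e.g. the middleware could not accept the sample
    RCLCPP_ERROR(this->get_logger(), "publish failed: %s", e.what());
  }
}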

@clalancette
Contributor

With a publisher using QoS depth 50,000 and transient local durability, sending 10 MB images can lead to a very large queue. If memory is overwhelmed, the node might crash. Is this expected, and should developers prevent such crashes themselves, or is there some mechanism to prevent such a failure?

So while we should do our best to not crash, I'm also curious what the use case is here. What are you trying to achieve with such a large depth of large images?

@theksg
Author

theksg commented Dec 21, 2024

So while we should do our best to not crash, I'm also curious what the use case is here. What are you trying to achieve with such a large depth of large images?

The scenario I am talking about is hypothetical and basically only intended to identify a case where a crash will happen.
From the use case's perspective, I am writing a publisher with transient local durability and KeepAll history. I wondered whether KeepAll can make the node crash, so I was looking into the limitations I need to take into account.

I understand that the default maximum depth for the KeepAll QoS history is 5000 for Fast DDS and 10000 for Cyclone DDS (although a hypothetical message size could exhaust even those, XD).

Then I tried KeepLast and was surprised to see that its capacity can be larger than what KeepAll can store, because by definition KeepAll should have more capacity.

I wanted to know whether a crash due to the message queue being overwhelmed can be detected in advance, or whether we rely on the developer to size the message queue so as to avoid crashing.
