Aggregator: publish diagnostics_toplevel_state immediately when error is received #317

outrider-jhulas · 2023-10-25T12:51:29Z

To improve the diagnostic_aggregator, additional functionality is added to publish the diagnostics_top_level_state as soon as a diagnostic message with an ERROR state is received.

This can be particularly helpful, when using the diagnostics_top_level_state as a system error indicator. For example, immediately stop the robot, as soon as any error in the system is detected.

With the current implementation, the error state will take up the the 1/pub_rate seconds to be published, which is a significant delay in many robotic systems.
This point was already brought up in !197 and #48.

To guarantee backwards compatibility, the functionality is disabled by default and can be enabled with the critical parameter set to true .

To avoid spamming the topic, the critical error state message will be published only once, and only if the system is error free.

ct2034

The changes look good. Thanks for your contribution.

Can you please additionally provide tests. See https://github.com/ros/diagnostics/pull/197/files#diff-d52d1bb554fc7e0c46c8810896296d59f4bf8902bd0eca3c8629e231989c76af for an inspiration.

ct2034

Thanks for your contribution. This looks very nice.

ct2034 · 2023-12-05T10:47:18Z

@outrider-jhulas Do you have an idea what the problem is here: https://build.ros2.org/job/Hdev__diagnostics__ubuntu_jammy_amd64/31/testReport/junit/(root)/projectroot/test_test_critical_pub_py/

ct2034 · 2023-12-05T16:31:35Z

I opened this: #323. If I can't fix it i will have to revert this

outrider-jhulas · 2023-12-06T10:13:08Z

Seems like sometimes the aggregator publishes stale msg before the test starts. This should fix it. Let me know if you want me to make a new MR

        # Ensure the aggregator doesn't publish stale msg at startup
        status = DiagnosticStatus.OK
        while True:
            self.publish_message(status)
            if status == self.received_state[0]:
                return
            else:
                self.received_state.clear()
    
    def test_critical_publisher(self):
        self.init_aggregator()

* more info on failing comparisons * ignoring old wrong messages * fixes in ntp monitor test --------- Signed-off-by: Christian Henkel <[email protected]>

Timple · 2023-12-20T09:00:01Z

Sorry to be late to the party. But why was this done only for ERROR cases?

I would expect this to be active on all regresions. So dropping from OK to WARN, from WARN to ERROR and also for OK to ERROR directly of course.

ct2034 self-assigned this Oct 25, 2023

ct2034 requested changes Nov 16, 2023

View reviewed changes

outrider-jhulas force-pushed the feat-critical-top-level-publisher branch 2 times, most recently from 2e69d78 to 9ea8c8f Compare November 24, 2023 15:18

outrider-jhulas requested a review from ct2034 November 24, 2023 15:26

outrider-jhulas force-pushed the feat-critical-top-level-publisher branch from a3c441e to b000489 Compare November 29, 2023 08:53

feat: publish top level msg when error is received

f7638ae

outrider-jhulas force-pushed the feat-critical-top-level-publisher branch from bf64c5f to f7638ae Compare November 29, 2023 09:23

ct2034 reviewed Dec 5, 2023

View reviewed changes

ct2034 approved these changes Dec 5, 2023

View reviewed changes

ct2034 merged commit 9e44dd8 into ros:ros2 Dec 5, 2023
22 checks passed

ct2034 mentioned this pull request Dec 5, 2023

Debugging instability introduced by #317 #323

Merged

ct2034 added a commit that referenced this pull request Dec 12, 2023

Debugging instability introduced by #317 (#323)

b169c12

* more info on failing comparisons * ignoring old wrong messages * fixes in ntp monitor test --------- Signed-off-by: Christian Henkel <[email protected]>

Timple mentioned this pull request Dec 21, 2023

Aggregator: publish diagnostics_toplevel_state immediately on every degradation #324

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Aggregator: publish diagnostics_toplevel_state immediately when error is received #317

Aggregator: publish diagnostics_toplevel_state immediately when error is received #317

outrider-jhulas commented Oct 25, 2023

ct2034 left a comment

ct2034 left a comment

ct2034 commented Dec 5, 2023

ct2034 commented Dec 5, 2023

outrider-jhulas commented Dec 6, 2023

Timple commented Dec 20, 2023

Aggregator: publish diagnostics_toplevel_state immediately when error is received #317

Aggregator: publish diagnostics_toplevel_state immediately when error is received #317

Conversation

outrider-jhulas commented Oct 25, 2023

ct2034 left a comment

Choose a reason for hiding this comment

ct2034 left a comment

Choose a reason for hiding this comment

ct2034 commented Dec 5, 2023

ct2034 commented Dec 5, 2023

outrider-jhulas commented Dec 6, 2023

Timple commented Dec 20, 2023