RaftSyncCounter.set() swallows transient leader unavailability as RuntimeException via CompletableFutures.join() #353

@mzazaipsc

Hi,

In JGroups-RAFT, many of the synchronous methods internally call their async counterparts in a blocking way. This is fine in general — for example, if the leader or quorum isn’t available, a TimeoutException can be thrown, and it’s appropriate for the application to handle that.

Problem

One place where this becomes problematic is RaftSyncCounter.set(). This method internally calls asyncSet() but blocks using CompletableFutures.join(), which wraps any underlying exception in a RuntimeException instead of exposing the root cause directly.

This differs from other methods in JGroups-RAFT that are allowed to declare throws Exception and thus let callers handle expected distributed failures (like leader unavailability).

The issue is that RaftSyncCounter implements the SyncCounter interface, and SyncCounter.set() does not declare any checked exceptions. As a result:

  • RaftSyncCounter.set() cannot throw a TimeoutException or similar checked exceptions

  • All failures get wrapped in a RuntimeException, making transient failures look catastrophic

  • Applications can’t easily distinguish between a temporary leader election and a genuine bug

This is especially concerning because leader elections or temporary unavailability are normal in RAFT, and shouldn’t cause the app to crash or misinterpret the failure.

I’m not sure whether the use of join() here was intentional, under the assumption that updating a counter should always succeed while the cluster is up (in the JGroups context). In the RAFT context, however, temporary unavailability is expected and shouldn’t be treated as fatal.
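
To illustrate the wrapping problem without depending on JGroups, here is a minimal sketch using only java.util.concurrent. The class and method names (JoinWrappingDemo, asyncSetWithoutLeader, blockingSet, isTransient) are hypothetical stand-ins and not the actual JGroups-RAFT code. It assumes the blocking wrapper rethrows the cause inside a plain RuntimeException, as described above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeoutException;

public class JoinWrappingDemo {

    // Hypothetical stand-in for the async call: fails as if no leader were available.
    static CompletableFuture<Void> asyncSetWithoutLeader() {
        CompletableFuture<Void> f = new CompletableFuture<>();
        f.completeExceptionally(new TimeoutException("no leader"));
        return f;
    }

    // Hypothetical stand-in for a blocking set() that cannot declare checked exceptions.
    static void blockingSet() {
        try {
            asyncSetWithoutLeader().join();
        } catch (CompletionException e) {
            // The checked TimeoutException ends up hidden inside an unchecked wrapper.
            throw new RuntimeException(e.getCause());
        }
    }

    // What a caller is forced to do: walk the cause chain to find the real failure.
    static boolean isTransient(RuntimeException e) {
        for (Throwable t = e; t != null; t = t.getCause()) {
            if (t instanceof TimeoutException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        try {
            blockingSet();
        } catch (RuntimeException e) {
            // The catch clause alone cannot tell a leader election from a bug.
            System.out.println("transient=" + isTransient(e));
        }
    }
}
```

The caller can only catch RuntimeException, so a temporary leader election and a programming error arrive through the same catch clause and have to be told apart by inspecting the cause.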

Solution

Possible solutions could include:

  • Stop implementing SyncCounter in RaftSyncCounter, allowing set() to declare "throws Exception" like other methods in JGroups-RAFT.
  • Modify SyncCounter to declare checked exceptions on set() (but this is outside the scope of JGroups-RAFT).
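
As a rough sketch of the first option, the following shows what a set() that declares "throws Exception" could look like from the caller's side. CheckedCounter, blockingView, failed and describe are hypothetical names invented for this illustration. None of them exist in JGroups or JGroups-RAFT, and the real change would need to fit the existing class hierarchy.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;
import java.util.function.LongFunction;

public class CheckedSetSketch {

    // Hypothetical interface: like a sync counter, but set() may throw checked exceptions.
    interface CheckedCounter {
        void set(long value) throws Exception;
    }

    // Hypothetical adapter: blocks on the async operation and rethrows the original cause.
    static CheckedCounter blockingView(LongFunction<CompletableFuture<Void>> asyncSet) {
        return value -> {
            try {
                asyncSet.apply(value).get();
            } catch (ExecutionException e) {
                Throwable cause = e.getCause();
                if (cause instanceof Exception) {
                    throw (Exception) cause; // e.g. the original TimeoutException
                }
                throw e;
            }
        };
    }

    // Helper for the demo: a future that has already failed with the given cause.
    static <T> CompletableFuture<T> failed(Throwable cause) {
        CompletableFuture<T> f = new CompletableFuture<>();
        f.completeExceptionally(cause);
        return f;
    }

    // Caller-side handling: transient and fatal failures get separate catch clauses.
    static String describe(CheckedCounter counter) {
        try {
            counter.set(42);
            return "ok";
        } catch (TimeoutException e) {
            return "retry";
        } catch (Exception e) {
            return "fail";
        }
    }

    public static void main(String[] args) {
        CheckedCounter noLeader = blockingView(v -> failed(new TimeoutException("no leader")));
        CheckedCounter broken = blockingView(v -> failed(new IllegalStateException("bug")));
        System.out.println(describe(noLeader));
        System.out.println(describe(broken));
    }
}
```

With this shape, the application can retry on TimeoutException and treat everything else as a real failure, without unwrapping a RuntimeException.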

Thanks for considering this issue. I’d be interested in your thoughts on whether this behavior was intentional, and what direction you’d prefer for a fix. I’m happy to implement the change if a solution is agreed upon.
