Potential race condition between BaseElection vote and log initialisation on master #364

@grunenwflorian

Description

Hello,

I am using jgroups-raft 1.1.0.Final (with JGroups 5.4.5.Final).
I noticed a recurring exception on startup (roughly 15% of the time) which, I believe, has no functional impact.

It is caused by a race between the BaseElection protocol and the RAFT protocol: BaseElection handles a vote request and tries to record the vote in the log (via RAFT.votedFor) before the log has been initialised.

I think you are already aware of this problem (there is a comment about it in the header of the RAFT class) and you have a plan to fix it.

Would you be open to a simple patch to fix this in the meantime, for example by synchronising access to the log?
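
To make the proposal concrete, here is a minimal sketch of the kind of guard I have in mind. It does not use the real jgroups-raft classes: SketchLog, GuardedElection and logInitialized() are hypothetical names made up for illustration. It also assumes that dropping a vote request that arrives too early is acceptable, on the basis that the candidate would start a new election round later. An actual patch would have to fit the existing RAFT and BaseElection lifecycle.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a log whose metadata is only usable after init().
// This is NOT the jgroups-raft Log interface.
final class SketchLog {
    private boolean initialized;
    private String votedFor;

    synchronized void init() {
        initialized = true;
    }

    synchronized void votedFor(String candidate) {
        if (!initialized)
            throw new IllegalStateException("Log not initialized");
        votedFor = candidate;
    }

    synchronized String votedFor() {
        return votedFor;
    }
}

// Hypothetical election handler that only touches the log once it is ready.
final class GuardedElection {
    private final SketchLog log;
    private final CountDownLatch logReady = new CountDownLatch(1);

    GuardedElection(SketchLog log) {
        this.log = log;
    }

    // To be called by whoever owns the log, after init() has completed.
    void logInitialized() {
        logReady.countDown();
    }

    // Returns true if the vote was recorded, false if the request was dropped
    // because the log was not ready within the timeout (assumption: the
    // candidate retries in a later election round).
    boolean handleVoteRequest(String candidate, long timeoutMillis) {
        try {
            if (!logReady.await(timeoutMillis, TimeUnit.MILLISECONDS))
                return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
        log.votedFor(candidate);
        return true;
    }
}
```

With this guard, a vote request delivered before the log is ready is dropped instead of surfacing as an IllegalStateException in the delivery thread. A latch is used here rather than a plain synchronized block because the problem is ordering (vote before init), not concurrent mutation.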

Regards, and thanks!

The exception

ups.protocols.pbcast.NAKACK2 : ERROR - JGRP000039: data: failed to deliver OOB message [data to , 0 bytes, flags=OOB, transient_flags=OOB_DELIVERED]: java.lang.IllegalStateException: Log not initialized
java.lang.IllegalStateException: Log not initialized
at org.jgroups.protocols.raft.FileBasedLog.checkMetadataStarted(FileBasedLog.java:316) ~[jgroups-raft-1.1.0.Final.jar:?]
at org.jgroups.protocols.raft.FileBasedLog.votedFor(FileBasedLog.java:139) ~[jgroups-raft-1.1.0.Final.jar:?]
at org.jgroups.protocols.raft.state.RaftState.setVotedFor(RaftState.java:199) ~[jgroups-raft-1.1.0.Final.jar:?]
at org.jgroups.protocols.raft.state.RaftState.setVotedFor(RaftState.java:167) ~[jgroups-raft-1.1.0.Final.jar:?]
at org.jgroups.protocols.raft.RAFT.votedFor(RAFT.java:379) ~[jgroups-raft-1.1.0.Final.jar:?]
at org.jgroups.protocols.raft.election.BaseElection.handleVoteRequest(BaseElection.java:242) ~[jgroups-raft-1.1.0.Final.jar:?]
at org.jgroups.protocols.raft.election.BaseElection.handleMessage(BaseElection.java:170) ~[jgroups-raft-1.1.0.Final.jar:?]
at org.jgroups.protocols.raft.election.BaseElection.up(BaseElection.java:143) ~[jgroups-raft-1.1.0.Final.jar:?]
at org.jgroups.stack.Protocol.up(Protocol.java:360) ~[jgroups-5.4.5.Final.jar:5.4.5.Final]
at org.jgroups.protocols.FORK.up(FORK.java:146) ~[jgroups-5.4.5.Final.jar:5.4.5.Final]
at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:130) ~[jgroups-5.4.5.Final.jar:5.4.5.Final]
at org.jgroups.protocols.RSVP.up(RSVP.java:178) ~[jgroups-5.4.5.Final.jar:5.4.5.Final]
at org.jgroups.protocols.FRAG2.up(FRAG2.java:137) ~[jgroups-5.4.5.Final.jar:5.4.5.Final]
at org.jgroups.protocols.FlowControl.up(FlowControl.java:261) ~[jgroups-5.4.5.Final.jar:5.4.5.Final]
at org.jgroups.protocols.FlowControl.up(FlowControl.java:253) ~[jgroups-5.4.5.Final.jar:5.4.5.Final]
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:783) ~[jgroups-5.4.5.Final.jar:5.4.5.Final]
at org.jgroups.protocols.raft.NO_DUPES.up(NO_DUPES.java:45) ~[jgroups-raft-1.1.0.Final.jar:?]
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:235) ~[jgroups-5.4.5.Final.jar:5.4.5.Final]
at org.jgroups.protocols.pbcast.NAKACK2.deliver(NAKACK2.java:973) [jgroups-5.4.5.Final.jar:5.4.5.Final]
at org.jgroups.protocols.pbcast.NAKACK2.handleMessage(NAKACK2.java:857) [jgroups-5.4.5.Final.jar:5.4.5.Final]
at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:652) [jgroups-5.4.5.Final.jar:5.4.5.Final]
at org.jgroups.protocols.BARRIER.up(BARRIER.java:173) [jgroups-5.4.5.Final.jar:5.4.5.Final]
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:132) [jgroups-5.4.5.Final.jar:5.4.5.Final]
at org.jgroups.protocols.FailureDetection.up(FailureDetection.java:180) [jgroups-5.4.5.Final.jar:5.4.5.Final]
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:294) [jgroups-5.4.5.Final.jar:5.4.5.Final]
at org.jgroups.protocols.MERGE3.up(MERGE3.java:272) [jgroups-5.4.5.Final.jar:5.4.5.Final]
at org.jgroups.protocols.Discovery.up(Discovery.java:296) [jgroups-5.4.5.Final.jar:5.4.5.Final]
at org.jgroups.protocols.TP.passMessageUp(TP.java:1146) [jgroups-5.4.5.Final.jar:5.4.5.Final]
at org.jgroups.util.SubmitToThreadPool$SingleLoopbackHandler.run(SubmitToThreadPool.java:87) [jgroups-5.4.5.Final.jar:5.4.5.Final]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.base/java.lang.Thread.run(Thread.java:833) [?:?]
