[INFERENCE] Node ID and Heartbeat Bugfix #604

Open
samherring99 wants to merge 2 commits into main from node_id_bugfix

samherring99 (Collaborator) commented Mar 1, 2026

This PR fixes two bugs found in testing:

  1. The node ID logged on discovery is never cleared or reset, so requests fail when an inference node drops and rejoins. This PR removes a node after 2 missed heartbeats so that requests are not routed to stale nodes.
  2. The heartbeat mechanism sends gossip messages that all have the same message hash, so the gateway does not receive them. This PR adds a timestamp to each gossip message to make it unique, ensuring that every heartbeat is received.
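
For reviewers, here is a minimal Python sketch of the two fixes described above. It is not the code in this PR. The names `heartbeat_payload`, `message_hash`, `NodeRegistry`, and the `timestamp_ns` field are hypothetical, as are the JSON encoding, the SHA-256 hash, and the 5-second interval. It also assumes the gossip layer drops messages whose hash it has already seen. Only the 2-missed-heartbeat threshold and the added timestamp come from the description.

```python
import hashlib
import json

HEARTBEAT_INTERVAL_S = 5.0  # assumed interval; not stated in the PR
MAX_MISSED_HEARTBEATS = 2   # threshold taken from the PR description


def heartbeat_payload(node_id: str, timestamp_ns: int,
                      include_timestamp: bool = True) -> bytes:
    """Serialize a heartbeat; the timestamp makes every payload distinct."""
    body = {"type": "heartbeat", "node_id": node_id}
    if include_timestamp:
        body["timestamp_ns"] = timestamp_ns
    return json.dumps(body, sort_keys=True).encode()


def message_hash(payload: bytes) -> str:
    """Stand-in for the hash a gossip layer might use to deduplicate."""
    return hashlib.sha256(payload).hexdigest()


class NodeRegistry:
    """Gateway-side view of nodes, keyed by node ID (hypothetical)."""

    def __init__(self, interval_s: float = HEARTBEAT_INTERVAL_S,
                 max_missed: int = MAX_MISSED_HEARTBEATS) -> None:
        self.interval_s = interval_s
        self.max_missed = max_missed
        self._last_seen: dict[str, float] = {}

    def record_heartbeat(self, node_id: str, now: float) -> None:
        # A rejoining node is simply re-registered with a fresh timestamp.
        self._last_seen[node_id] = now

    def evict_stale(self, now: float) -> list[str]:
        """Remove and return nodes silent for more than max_missed intervals."""
        deadline = self.interval_s * self.max_missed
        stale = [node_id for node_id, seen in self._last_seen.items()
                 if now - seen > deadline]
        for node_id in stale:
            del self._last_seen[node_id]
        return stale

    def live_nodes(self) -> list[str]:
        return sorted(self._last_seen)
```

In this sketch the caller supplies the clock value (for example `time.monotonic()`), which keeps the eviction logic deterministic and easy to test. Without the timestamp, two heartbeats from the same node serialize to identical bytes and therefore share a hash, which matches the failure described in item 2.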

Tested on hgx-2.
