Bump Khepri to 0.17.0 #12753

Draft
wants to merge 6 commits into
base: main

the-mikedavis
Member

This is not ready to be merged yet: Khepri 0.17.0 hasn't been released. (And we want to make more changes before then anyway.)

This PR handles the breaking changes in Khepri 0.17.0.

@the-mikedavis the-mikedavis self-assigned this Nov 18, 2024
@the-mikedavis the-mikedavis force-pushed the md/khepri-0-17 branch 3 times, most recently from 6d289fe to cc6185d Compare November 25, 2024 16:28
@the-mikedavis the-mikedavis force-pushed the md/khepri-0-17 branch 6 times, most recently from 5cb0a79 to 42f571a Compare December 6, 2024 20:49
the-mikedavis and others added 6 commits January 21, 2025 10:06
Khepri v0.17 needs a change in Ra 2.15.0 that exposes more data in the
`ra_aux` API.

`locally_known_members/1` and `locally_known_nodes/1` were replaced with
`members/2` and `nodes/2` with `favor` set to `low_latency` - this
matches the interface for queries in Khepri.

All callers of `khepri_adv` and `khepri_tx_adv` need updates to handle
the now consistent return type of `khepri:node_props_map()` in Khepri
0.17.

We don't need any compatibility code to handle "either the old return
type or the new return type" because the translation is done entirely
in the "client side" code in Khepri - meaning that the return value from
the Ra server is the same but it is translated differently by the
functions in `khepri_adv` and `khepri_tx_adv`.

[Why]
When running mixed-version tests, nodes 1/3/5/... are using the primary
umbrella, so usually the newest version. Nodes 2/4/6/... are using the
secondary umbrella, thus the old version.

When clustering, we used to use node 1 (running a new version) as the
seed node, meaning other nodes would join it.

This complicates things with feature flags because we have to make sure
that we start node 1 with new stable feature flags disabled to allow old
nodes to join.

This is also a problem with Khepri machine versions because the cluster
would start with the latest version, which old nodes might not have.

[How]
This patch changes the logic to use a node running the secondary
umbrella as the seed node instead. If there is no node running it, we
pick the first node as before.
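
The selection rule above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the actual `rabbitmq_ct_helpers` code (which is Erlang). The `Node` type, its `uses_secondary_umbrella` field and the `pick_seed_node` function are hypothetical names, and picking the *first* secondary-umbrella node when several exist is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Node:
    # Hypothetical model of a test node; not a rabbitmq_ct_helpers type.
    name: str
    uses_secondary_umbrella: bool  # True if the node runs the old version


def pick_seed_node(nodes: Sequence[Node]) -> Node:
    """Return the node that the other nodes should join.

    Prefer a node running the secondary umbrella (old version), so the
    cluster starts with feature flags and a Khepri machine version that
    every node supports. Fall back to the first node otherwise.
    """
    if not nodes:
        raise ValueError("cannot pick a seed node from an empty cluster")
    for node in nodes:
        if node.uses_secondary_umbrella:
            # Assumption: the first matching node is good enough as the seed.
            return node
    return nodes[0]


# Mixed-version layout from the description: odd nodes use the primary
# umbrella, even nodes use the secondary umbrella.
mixed = [Node(f"node{i}", uses_secondary_umbrella=(i % 2 == 0))
         for i in range(1, 5)]
same_version = [Node(f"node{i}", uses_secondary_umbrella=False)
                for i in range(1, 4)]
```

With the mixed layout the seed is the first even-numbered node; with a same-version cluster the rule degrades to the previous behaviour of using the first node.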

V2: Revert part of "rabbitmq_ct_helpers: Fix how we set
    `$RABBITMQ_FEATURE_FLAGS` in tests" (commit
    57ed962). These changes are no
    longer needed with the new logic.

V3: The check that verifies that the correct metadata store is used has
    a special case for nodes that use the secondary umbrella: if Khepri
    is supposed to be used but it's not, the feature flag is enabled.
    The reason is that the `v4.0.x` branch doesn't know about the `rel`
    configuration of `forced_feature_flags_on_init`. The nodes will
    have ignored this parameter and booted with the stable feature
    flags only.

    Many testsuites are adapted to the new clustering order. If they
    manage which node joins which node, either the order is changed in
    the testcases, or nodes are started with only required feature
    flags. For testsuites that rely on peer discovery where the order is
    unknown, nodes are started with only required feature flags.