Conversation

mhamza15
Contributor

@mhamza15 mhamza15 commented Aug 17, 2025

Description

This PR builds on #11959 to create a "sticky" session VTGate balancer. It uses consistent hashing to route each session to the same tablet for the session's duration, preferring tablets in the local cell. It implements the TabletBalancer interface (with a few changes).

The balancer maintains two hash rings for each target: one for local tablets and one for external tablets. It subscribes to health check events to keep the hash rings updated as tablets go in and out of serving, rather than constructing the rings on demand.

The Pick method in the TabletBalancer interface now takes a new PickOpts argument. It currently contains only the session hash (rather than the UUID, so that we hash once at session creation and not on every call to Pick). As more balancer implementations are added, they may need their own parameters, so a dedicated options struct keeps the signature from growing overly long.

On each call to Pick, the balancer looks up the session hash in PickOpts and uses it to find the tablet to route to, checking the local tablets first. If there are no local tablets, it looks for an external tablet. A given session UUID/hash is always routed to the same tablet.
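
For readers unfamiliar with the technique, here is a minimal sketch of a two-ring lookup with local preference. It is not the code from this PR: the `hashRing` layout, the virtual-node count, the tablet aliases, and the use of FNV-1a from the standard library (the PR mentions xxhash) are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// hashRing is a hypothetical, simplified consistent-hash ring.
type hashRing struct {
	hashes []uint64          // sorted virtual-node hashes
	owner  map[uint64]string // virtual-node hash -> tablet alias
}

// hashKey uses FNV-1a as a stand-in hash function (an assumption).
func hashKey(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// newHashRing places vnodes virtual nodes on the ring for each tablet.
func newHashRing(tablets []string, vnodes int) *hashRing {
	r := &hashRing{owner: make(map[uint64]string)}
	for _, t := range tablets {
		for i := 0; i < vnodes; i++ {
			h := hashKey(fmt.Sprintf("%s#%d", t, i))
			if _, taken := r.owner[h]; taken {
				continue // skip the (unlikely) colliding virtual node
			}
			r.owner[h] = t
			r.hashes = append(r.hashes, h)
		}
	}
	sort.Slice(r.hashes, func(i, j int) bool { return r.hashes[i] < r.hashes[j] })
	return r
}

// get returns the tablet owning the first virtual node whose hash is at or
// after sessionHash, wrapping around to the start of the ring.
func (r *hashRing) get(sessionHash uint64) (string, bool) {
	if len(r.hashes) == 0 {
		return "", false
	}
	i := sort.Search(len(r.hashes), func(i int) bool { return r.hashes[i] >= sessionHash })
	if i == len(r.hashes) {
		i = 0
	}
	return r.owner[r.hashes[i]], true
}

// pick prefers the local ring and falls back to the external ring.
func pick(local, external *hashRing, sessionHash uint64) (string, bool) {
	if t, ok := local.get(sessionHash); ok {
		return t, true
	}
	return external.get(sessionHash)
}

func main() {
	local := newHashRing([]string{"zone1-100", "zone1-101"}, 8)
	external := newHashRing([]string{"zone2-200"}, 8)
	h := hashKey("session-uuid-1") // hashed once per session
	t, _ := pick(local, external, h)
	fmt.Println(t) // the same tablet on every call for this session
}
```

Because the lookup depends only on the session hash and the ring contents, repeated calls for one session return the same tablet until the ring changes.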

Related Issue(s)

#11971

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

Contributor

vitess-bot bot commented Aug 17, 2025

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes). New features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test. Enhancements and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added the NeedsBackportReason, NeedsDescriptionUpdate, NeedsIssue, and NeedsWebsiteDocsUpdate labels Aug 17, 2025
@mhamza15 mhamza15 changed the title Add consistent hash vtgate balancer [WIP] Add consistent hash vtgate balancer Aug 17, 2025
@github-actions github-actions bot added this to the v23.0.0 milestone Aug 17, 2025
Contributor Author

@mhamza15 mhamza15 left a comment

Some open questions:

Comment on lines 113 to 149
// watchHealthCheck watches the health check channel for tablet health changes, and updates hash rings accordingly.
func (b *SessionBalancer) watchHealthCheck(ctx context.Context, hcChan chan *discovery.TabletHealth) {
	for {
		select {
		case <-ctx.Done():
			b.hc.Unsubscribe(hcChan)
			return
		case tablet := <-hcChan:
			if tablet == nil {
				return
			}

			b.onTabletHealthChange(tablet)
		}
	}
}
Contributor Author

  1. I saw two patterns for this type of logic, one using keyspace events and one using this health check. Which one would be recommended in this case?
  2. What is the behavior on a fresh start? How can we "seed" the initial state of the tablets?

Contributor

I think after subscribing to the health check, you should be able to call GetHealthyTabletStats (before reading from the health check stream returned by Subscribe) and build the initial hashring.

After that, you should be able to consume events from the health check channel.

Comment on lines 74 to 76
// NOTE: this currently won't consider any invalid tablets. This means we'll keep returning the same
// invalid tablet on subsequent tries. We can improve this by maybe returning a random tablet (local
// cell preferred) when the session hash falls on an invalid tablet.
Contributor Author

As the note says, we don't currently know which tablets have already been tried. We can update the tablet gateway to either pass a list of invalid tablets, or have it double-check that the balancer returned a valid tablet. If not, it'll use a random tablet.

Contributor Author

If we pass in a list of invalid tablets, we can instead look for the first tablet (virtual node) that has a hash >= the session hash and isn't invalid.

Contributor

Yeah, I think passing in a list of tablets to ignore / skip would be the way to go here.
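
A rough sketch of the skip-list lookup proposed in this thread, assuming the ring is a slice of virtual nodes sorted by hash. The `vnode` type and `pickSkipping` function are hypothetical names for illustration, not code from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// vnode is a hypothetical virtual node: a position on the ring and its tablet.
type vnode struct {
	hash   uint64
	tablet string
}

// pickSkipping starts at the first virtual node whose hash is >= sessionHash
// and walks the ring (wrapping around) until it finds a tablet that is not
// in the invalid set. ring must be sorted by hash.
func pickSkipping(ring []vnode, sessionHash uint64, invalid map[string]bool) (string, bool) {
	n := len(ring)
	start := sort.Search(n, func(i int) bool { return ring[i].hash >= sessionHash })
	for step := 0; step < n; step++ {
		node := ring[(start+step)%n]
		if !invalid[node.tablet] {
			return node.tablet, true
		}
	}
	return "", false // empty ring, or every tablet is invalid
}

func main() {
	ring := []vnode{{10, "a"}, {20, "b"}, {30, "c"}}

	t, _ := pickSkipping(ring, 15, nil)
	fmt.Println(t) // prints "b"

	t, _ = pickSkipping(ring, 15, map[string]bool{"b": true})
	fmt.Println(t) // prints "c"

	t, _ = pickSkipping(ring, 35, nil)
	fmt.Println(t) // prints "a" (wraps around)
}
```

With this approach a retry for the same session moves to the next valid tablet on the ring instead of falling back to a random one, so retries stay deterministic.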

codecov bot commented Aug 17, 2025

Codecov Report

❌ Patch coverage is 86.40227% with 48 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.54%. Comparing base (8d90d26) to head (37b7e62).
⚠️ Report is 22 commits behind head on main.

Files with missing lines                   Patch %   Lines
go/vt/vttablet/queryservice/wrapped.go     73.33%    16 Missing ⚠️
go/vt/vtgate/balancer/session.go           87.28%    15 Missing ⚠️
go/vt/vtgate/tabletgateway.go              77.96%    13 Missing ⚠️
go/vt/vtgate/vtgate.go                     0.00%     4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #18552      +/-   ##
==========================================
+ Coverage   67.49%   67.54%   +0.05%     
==========================================
  Files        1607     1609       +2     
  Lines      263104   263608     +504     
==========================================
+ Hits       177569   178042     +473     
- Misses      85535    85566      +31     


sizeCache protoimpl.SizeCache
// SessionHash is the xxhash of the Session UUID. Used to route sessions to the same
// tablet.
SessionHash uint64 `protobuf:"varint,29,opt,name=SessionHash,proto3" json:"SessionHash,omitempty"`
Contributor

@arthurschreiber arthurschreiber Aug 18, 2025

Will this break the vtgate grpc APIs? What happens if someone does a call with a Session but without the SessionHash set?

Contributor Author

Isn't adding new protobuf fields backwards compatible?

Adding new fields is safe.

If you add new fields, any messages serialized by code using your “old” message format can still be parsed by your new generated code. You should keep in mind the default values for these elements so that new code can properly interact with messages generated by old code. Similarly, messages created by your new code can be parsed by your old code: old binaries simply ignore the new field when parsing. See the Unknown Fields section for details.

From https://protobuf.dev/programming-guides/proto3/#wire-safe-changes

Old code should ignore it, and new code handles the missing/default value.

Contributor Author

Although I just realized that declaring it as uint64 means we can't distinguish a hash of 0 from the unset default value, so I'll need to update that.

Contributor Author

Updated to *uint64 in fd5c146
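
To illustrate why the pointer type helps: with a plain `uint64`, an unset field and a genuine hash of 0 look identical, while a `*uint64` can be nil. The structs below are hand-written stand-ins, not the generated protobuf code from this PR.

```go
package main

import "fmt"

// sessionV1 models the original field: unset and 0 are indistinguishable.
type sessionV1 struct {
	SessionHash uint64
}

// sessionV2 models the updated field: nil means "not set".
type sessionV2 struct {
	SessionHash *uint64
}

// describe reports whether a pointer-typed hash was set, and to what.
func describe(h *uint64) string {
	if h == nil {
		return "unset"
	}
	return fmt.Sprintf("set to %d", *h)
}

func main() {
	var old sessionV1
	fmt.Println(old.SessionHash) // prints "0" whether or not a hash was ever set

	var missing sessionV2
	fmt.Println(describe(missing.SessionHash)) // prints "unset"

	zero := uint64(0)
	present := sessionV2{SessionHash: &zero}
	fmt.Println(describe(present.SessionHash)) // prints "set to 0"
}
```

A caller that sends a Session without the hash would then land in the "unset" branch, which the balancer can treat as "no sticky routing requested".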

Comment on lines +150 to +183
func getOrCreateRing(rings map[discovery.KeyspaceShardTabletType]*hashRing, tablet *discovery.TabletHealth) *hashRing {
	key := discovery.KeyFromTarget(tablet.Target)

	ring, exists := rings[key]
	if !exists {
		ring = newHashRing()
		rings[key] = ring
	}

	return ring
}
Contributor

Does this mean there will be a new hashring created for every session? Shouldn't we have one hashring per shard known per vtgate?

Contributor Author

There should be one hash ring per target (keyspace + shard + tablet type) and shared across all sessions (assuming the tablet gateway that creates the balancer is also shared across all sessions, which I'm pretty sure it is but I didn't actually confirm that).

Contributor

Ah, right. Thanks for clarifying! ❤️

Contributor Author

We could make an optimization here to only consider the keyspaces set by --balancer-keyspaces, so we're not maintaining hash rings for keyspaces that aren't sent through the balancer.

@arthurschreiber
Contributor

One thing that would be good to add is allowing clients to switch the load balancing mode through a client side flag (in addition to managing the default load balancing behavior through the vtgate cli flag).

@arthurschreiber arthurschreiber removed the NeedsDescriptionUpdate and NeedsIssue labels Aug 21, 2025
Comment on lines 93 to 94
-	Pick(target *querypb.Target, tablets []*discovery.TabletHealth) *discovery.TabletHealth
+	Pick(target *querypb.Target, tablets []*discovery.TabletHealth, invalidTablets map[string]bool, opts *PickOpts) *discovery.TabletHealth

Member

Let's have PickOpts contain all the options needed, we can move invalidTablets in that struct.

Member

We can pass it by value as we do not modify it.
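
A sketch of what the two suggestions in this thread could look like together: a single options struct that carries the invalid-tablet set and is passed by value. The interface here uses plain strings in place of the Vitess types, and `firstValid` is a toy balancer invented for the example.

```go
package main

import "fmt"

// PickOpts is a simplified stand-in for the options struct discussed above.
type PickOpts struct {
	SessionUUID    string
	InvalidTablets map[string]bool
}

// TabletBalancer uses strings where the real interface uses Vitess types.
type TabletBalancer interface {
	Pick(target string, tablets []string, opts PickOpts) string
}

// firstValid is a toy balancer that returns the first tablet not marked invalid.
type firstValid struct{}

func (firstValid) Pick(target string, tablets []string, opts PickOpts) string {
	for _, t := range tablets {
		if !opts.InvalidTablets[t] { // reading a nil map is safe and yields false
			return t
		}
	}
	return ""
}

func main() {
	var b TabletBalancer = firstValid{}
	opts := PickOpts{
		SessionUUID:    "session-1",
		InvalidTablets: map[string]bool{"zone1-100": true},
	}
	fmt.Println(b.Pick("ks/-80/replica", []string{"zone1-100", "zone1-101"}, opts)) // prints "zone1-101"
}
```

Note that passing the struct by value copies only the map header, so the callee still sees (and could mutate) the caller's map. By-value passing signals intent here rather than enforcing immutability.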

Comment on lines 35 to 36
// tabletTypesToWatch are the tablet types that will be included in the hash rings.
var tabletTypesToWatch = []topodata.TabletType{topodata.TabletType_PRIMARY, topodata.TabletType_REPLICA, topodata.TabletType_BATCH}

Member

  1. Do you need Primary for balancing?
  2. Most places still refer to BATCH as RDONLY; should we use that here?

Contributor Author

Removed primaries and left it as RDONLY in 4682bf9

Comment on lines 115 to 131
func (b *SessionBalancer) watchHealthCheck(ctx context.Context, topoServer srvtopo.Server) {
	// Build initial hash rings

	// Find all the targets we're watching
	targets, _, err := srvtopo.FindAllTargetsAndKeyspaces(ctx, topoServer, b.localCell, discovery.KeyspacesToWatch, tabletTypesToWatch)
	if err != nil {
		log.Errorf("session balancer: failed to find all targets and keyspaces: %q", err)
		return
	}

	// Add each tablet to the hash ring
	for _, target := range targets {
		tablets := b.hc.GetHealthyTabletStats(target)
		for _, tablet := range tablets {
			b.onTabletHealthChange(tablet)
		}
	}
Member

Looks like this will miss new targets added later, since the balancer is initialized only once.

Contributor Author

This is just for the initial hash ring setup, and then any new targets returned by the health check will have a new hash ring created: https://github.com/vitessio/vitess/pull/18552/files#diff-a70b24fd86763104c3092c7da62fa559dce14f6bee289a8c883e26d062b0aafcR172-R181

@arthurschreiber
Contributor

You changed the order so that first the health check is set up, and then the latest view of tablets is used to set up the hash rings. Is that safe from race conditions? 🤔

@mhamza15
Contributor Author

You changed the order so that first the health check is set up, and then the latest view of tablets is used to setup the hash rings. Is that safe from race conditions? 🤔

Yeah, I was worried about the same thing. I switched it because, if we build the initial hash rings first and set up the health check afterwards, there may be a short window in between where we lose updates. But with the new order, the health check could see a tablet go out of serving and remove it from the hash ring, and then the initial hash ring build could re-add it from a stale serving snapshot.

I'll keep thinking about this one.

@mhamza15
Contributor Author

You changed the order so that first the health check is set up, and then the latest view of tablets is used to setup the hash rings. Is that safe from race conditions? 🤔

Changed in 977d2fa:

  1. Set up subscription, but don't start goroutine yet. State changes will be buffered in the meantime
  2. Fetch initial state
  3. Start goroutine and catch up on any state changes that have happened
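
The three steps above can be sketched as follows. This is a simplified model, not the PR's code: the `healthEvent` and `rings` types are hypothetical, the health check is replaced by a plain buffered channel, and the sketch assumes the buffer is large enough to hold every event that arrives before the goroutine starts.

```go
package main

import "fmt"

// healthEvent is a hypothetical stand-in for a tablet health update.
type healthEvent struct {
	alias   string
	serving bool
}

// rings is a toy replacement for the hash rings: the set of serving tablets.
type rings struct {
	serving map[string]bool
}

func (r *rings) apply(ev healthEvent) {
	if ev.serving {
		r.serving[ev.alias] = true
	} else {
		delete(r.serving, ev.alias)
	}
}

// start assumes step 1 (subscribing) already happened and that events is the
// buffered subscription channel. It seeds state from the snapshot (step 2),
// then starts the goroutine that replays buffered events and keeps
// consuming new ones (step 3). done is closed once events is closed and drained.
func start(events <-chan healthEvent, snapshot []healthEvent) (*rings, <-chan struct{}) {
	r := &rings{serving: make(map[string]bool)}
	for _, ev := range snapshot {
		r.apply(ev)
	}
	done := make(chan struct{})
	go func() {
		defer close(done)
		for ev := range events {
			r.apply(ev)
		}
	}()
	return r, done
}

func main() {
	// Step 1: subscribe. Events are buffered until the goroutine starts.
	events := make(chan healthEvent, 16)

	// t1 stops serving while the initial state is still being fetched.
	events <- healthEvent{alias: "t1", serving: false}

	// Step 2: the snapshot still lists t1 as serving in this scenario.
	snapshot := []healthEvent{{"t1", true}, {"t2", true}}

	// Step 3: seed from the snapshot, then catch up on buffered events.
	r, done := start(events, snapshot)
	close(events)
	<-done

	fmt.Println(r.serving["t1"], r.serving["t2"]) // prints "false true"
}
```

In this ordering, an event that lands in the buffer is always applied after the snapshot, so a stale snapshot entry is corrected rather than overwriting newer state. Only one goroutine mutates the state after seeding, which keeps the sketch free of locks.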

Add `PickOpts`, which allows balancers to accept options specific to their
implementation. This keeps the `Pick` signature from growing overly long
as implementations and their options are added.

Signed-off-by: Mohamed Hamza <[email protected]>
@mhamza15 mhamza15 force-pushed the mhamza/consistent-hash-balancer branch from 7b7f78e to 479cde4 Compare August 28, 2025 01:16
Update the signature of the wrapper func to accept a new `WrapOpts`
struct, which currently contains `ExecuteOptions`, which now contains
the session UUID so that it can be passed into `Pick` for the balancer.

Signed-off-by: Mohamed Hamza <[email protected]>
@@ -127,7 +127,7 @@ func (vte *VTExplain) newTablet(ctx context.Context, env *vtenv.Environment, opt

 	tablet.QueryService = queryservice.Wrap(
 		nil,
-		func(ctx context.Context, target *querypb.Target, conn queryservice.QueryService, name string, inTransaction bool, inner func(context.Context, *querypb.Target, queryservice.QueryService) (bool, error)) error {
+		func(ctx context.Context, target *querypb.Target, conn queryservice.QueryService, name string, opts *queryservice.WrapOpts, inner func(context.Context, *querypb.Target, queryservice.QueryService) (bool, error)) error {
Contributor

Does WrapOpts get modified down the stack? If not, then this should probably not be passed as a reference but by value?

Contributor Author

Changed in 2620b76

Contributor Author

I want to note I also changed ExecuteOptions inside of WrapOpts to a pointer, as go vet would complain that we are copying a mutex (used inside protobuf's internal structs): 0db53cd
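
For context, go vet's copylocks check flags code that copies a value containing a lock, and generated protobuf messages carry internal no-copy state that triggers it. A minimal sketch of the resulting shape, using hand-written stand-in types (`executeOptions` and `wrapOpts` are not the real Vitess definitions):

```go
package main

import (
	"fmt"
	"sync"
)

// executeOptions is a hypothetical stand-in for a generated protobuf message.
// The mutex models the no-copy state such messages carry internally.
type executeOptions struct {
	mu          sync.Mutex
	SessionUUID string
}

// wrapOpts is passed by value. Options stays a pointer, so copying wrapOpts
// copies only the pointer and never the mutex.
type wrapOpts struct {
	Options *executeOptions
}

// sessionOf takes wrapOpts by value and tolerates missing options.
func sessionOf(opts wrapOpts) string {
	if opts.Options == nil {
		return ""
	}
	return opts.Options.SessionUUID
}

func main() {
	opts := wrapOpts{Options: &executeOptions{SessionUUID: "session-1"}}
	fmt.Println(sessionOf(opts)) // prints "session-1"
}
```

Had `Options` been an `executeOptions` value instead, every by-value pass of `wrapOpts` would copy the mutex, which is what the vet warning guards against.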

})
}

opts := &balancer.PickOpts{SessionUUID: opts.Options.SessionUUID, InvalidTablets: invalidTablets}
Contributor

Similar here - I think this might be better passed by value?

Contributor Author

Changed in e62c88f

Signed-off-by: Mohamed Hamza <[email protected]>
@mhamza15 mhamza15 marked this pull request as ready for review August 29, 2025 14:32
@mhamza15 mhamza15 changed the title [WIP] Add consistent hash vtgate balancer Add consistent hash vtgate balancer Aug 29, 2025
@mhamza15 mhamza15 changed the title Add consistent hash vtgate balancer Add session vtgate balancer Aug 29, 2025
@mhamza15
Contributor Author

I'm currently working on writing an e2e test.

Signed-off-by: Mohamed Hamza <[email protected]>
Labels
NeedsBackportReason, NeedsWebsiteDocsUpdate
Projects
None yet
3 participants