-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Add session vtgate balancer #18552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add session vtgate balancer #18552
Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some open questions:
go/vt/vtgate/balancer/session.go
Outdated
// watchHealthCheck watches the health check channel for tablet health changes, and updates hash rings accordingly. | ||
func (b *SessionBalancer) watchHealthCheck(ctx context.Context, hcChan chan *discovery.TabletHealth) { | ||
for { | ||
select { | ||
case <-ctx.Done(): | ||
b.hc.Unsubscribe(hcChan) | ||
return | ||
case tablet := <-hcChan: | ||
if tablet == nil { | ||
return | ||
} | ||
|
||
b.onTabletHealthChange(tablet) | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I saw two patterns for this type of logic, one using keyspace events and one using this health check. Which one would be recommended in this case?
- What is the behavior on a fresh start? How can we "seed" the initial state of the tablets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think after subscribing to the health check, you should be able to call GetHealthyTabletStats
(before reading from the health check stream returned by Subscribe
) and build the initial hashring.
After that, you should be able to consume events from the health check channel.
go/vt/vtgate/balancer/session.go
Outdated
// NOTE: this currently won't consider any invalid tablets. This means we'll keep returning the same | ||
// invalid tablet on subsequent tries. We can improve this by maybe returning a random tablet (local | ||
// cell preferred) when the session hash falls on an invalid tablet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the note says, we don't currently have knowledge of what tablets have already been tried so far. We can update the tablet gateway to either pass a list of invalid tablets, or have it double check that the balancer returned a valid tablet. If not, then it'll use a random tablet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we pass in a list of invalid tablets, we can instead look for the first tablet (virtual node) that has a hash >= the session hash and isn't invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think passing in a list of tablets to ignore / skip would be the way to go here.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #18552 +/- ##
==========================================
+ Coverage 67.49% 67.54% +0.05%
==========================================
Files 1607 1609 +2
Lines 263104 263608 +504
==========================================
+ Hits 177569 178042 +473
- Misses 85535 85566 +31 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
go/vt/proto/vtgate/vtgate.pb.go
Outdated
sizeCache protoimpl.SizeCache | ||
// SessionHash is the xxhash of the Session UUID. Used to route sessions to the same | ||
// tablet. | ||
SessionHash uint64 `protobuf:"varint,29,opt,name=SessionHash,proto3" json:"SessionHash,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this break the vtgate grpc APIs? What happens if someone does a call with a Session
but without the SessionHash
set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't adding new protobuf fields backwards compatible?
Adding new fields is safe.
If you add new fields, any messages serialized by code using your “old” message format can still be parsed by your new generated code. You should keep in mind the default values for these elements so that new code can properly interact with messages generated by old code. Similarly, messages created by your new code can be parsed by your old code: old binaries simply ignore the new field when parsing. See the Unknown Fields section for details.
From https://protobuf.dev/programming-guides/proto3/#wire-safe-changes
Old code should ignore it, and new code handles the missing/default value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I just realized that setting it as uint64
means that we can't differentiate between a hash of 0 or the default value, so will need to update that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to *uint64
in fd5c146
func getOrCreateRing(rings map[discovery.KeyspaceShardTabletType]*hashRing, tablet *discovery.TabletHealth) *hashRing { | ||
key := discovery.KeyFromTarget(tablet.Target) | ||
|
||
ring, exists := rings[key] | ||
if !exists { | ||
ring = newHashRing() | ||
rings[key] = ring | ||
} | ||
|
||
return ring | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean there will be a new hashring created for every session? Shouldn't we have one hashring per shard known per vtgate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be one hash ring per target (keyspace + shard + tablet type) and shared across all sessions (assuming the tablet gateway that creates the balancer is also shared across all sessions, which I'm pretty sure it is but I didn't actually confirm that).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right. Thanks for clarifying! ❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could make an optimization here to only consider the keyspaces set by --balancer-keyspaces
, so we're not maintaining hash rings for keyspaces that aren't sent through the balancer.
One thing that would be good to add is allowing clients to switch the load balancing mode through a client side flag (in addition to managing the default load balancing behavior through the |
go/vt/vtgate/balancer/balancer.go
Outdated
Pick(target *querypb.Target, tablets []*discovery.TabletHealth) *discovery.TabletHealth | ||
Pick(target *querypb.Target, tablets []*discovery.TabletHealth, invalidTablets map[string]bool, opts *PickOpts) *discovery.TabletHealth | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's have PickOpts contain all the options needed, we can move invalidTablets in that struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can pass it by value as we do not modify it.
go/vt/vtgate/balancer/session.go
Outdated
// tabletTypesToWatch are the tablet types that will be included in the hash rings. | ||
var tabletTypesToWatch = []topodata.TabletType{topodata.TabletType_PRIMARY, topodata.TabletType_REPLICA, topodata.TabletType_BATCH} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Do you need Primary for balancing?
- Still most places we refer BATCH as RDONLY, should we use that here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed primaries and left it as RDONLY in 4682bf9
go/vt/vtgate/balancer/session.go
Outdated
func (b *SessionBalancer) watchHealthCheck(ctx context.Context, topoServer srvtopo.Server) { | ||
// Build initial hash rings | ||
|
||
// Find all the targets we're watching | ||
targets, _, err := srvtopo.FindAllTargetsAndKeyspaces(ctx, topoServer, b.localCell, discovery.KeyspacesToWatch, tabletTypesToWatch) | ||
if err != nil { | ||
log.Errorf("session balancer: failed to find all targets and keyspaces: %q", err) | ||
return | ||
} | ||
|
||
// Add each tablet to the hash ring | ||
for _, target := range targets { | ||
tablets := b.hc.GetHealthyTabletStats(target) | ||
for _, tablet := range tablets { | ||
b.onTabletHealthChange(tablet) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this will miss adding new targets to the balancer as it is initialized only once
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just for the initial hash ring setup, and then any new targets returned by the health check will have a new hash ring created: https://github.com/vitessio/vitess/pull/18552/files#diff-a70b24fd86763104c3092c7da62fa559dce14f6bee289a8c883e26d062b0aafcR172-R181
You changed the order so that first the health check is set up, and then the latest view of tablets is used to setup the hash rings. Is that safe from race conditions? 🤔 |
Yeah I was worried about the same thing. The reasoning behind why I switched it is that if we build the initial hash rings and then set up the health check after, there might be a little time in between where we lose updates. But this way there might be a case where the health check sees a tablet go out of serving, removes it from the hash ring, then the initial hash ring build overwrites it with a stale serving tablet. I'll keep thinking about this one. |
Changed in 977d2fa:
|
Add `PickOpts` which allow balancers to accept options specific to their implementation. This allows the `Pick` signature not to get overly long as implementations and their options are added. Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
This reverts commit bd6a1b7. Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
7b7f78e
to
479cde4
Compare
Update the signature of the wrapper func to accept a new `WrapOpts` struct, which currently contains `ExecuteOptions`, which now contains the session UUID so that it can be passed into `Pick` for the balancer. Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
@@ -127,7 +127,7 @@ func (vte *VTExplain) newTablet(ctx context.Context, env *vtenv.Environment, opt | |||
|
|||
tablet.QueryService = queryservice.Wrap( | |||
nil, | |||
func(ctx context.Context, target *querypb.Target, conn queryservice.QueryService, name string, inTransaction bool, inner func(context.Context, *querypb.Target, queryservice.QueryService) (bool, error)) error { | |||
func(ctx context.Context, target *querypb.Target, conn queryservice.QueryService, name string, opts *queryservice.WrapOpts, inner func(context.Context, *querypb.Target, queryservice.QueryService) (bool, error)) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does WrapOpts
get modified down the stack? If not, then this should probably not be passed as a reference but by value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed in 2620b76
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to note I also changed ExecuteOptions
inside of WrapOpts
to a pointer, as go vet would complain that we are copying a mutex (used inside protobuf's internal structs): 0db53cd
go/vt/vtgate/tabletgateway.go
Outdated
}) | ||
} | ||
|
||
opts := &balancer.PickOpts{SessionUUID: opts.Options.SessionUUID, InvalidTablets: invalidTablets} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar here - I think this might be better passed by value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed in e62c88f
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
I'm currently working on writing an e2e test. |
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Description
This PR builds on #11959 to create a "sticky" session VTGate balancer. It uses a consistent hash to route connections to the same tablet consistently for its duration, with a preference to local cells. It implements the
TabletBalancer
interface (with a few changes).The balancer works by maintaining two hash rings for each target, one for local tablets and one for external tablets. It subscribes to health check events to keep the hash rings updated as tablets go in and out of serving, rather than constructing the hash on-demand.
A new
PickOpts
is added to thePick
method in theTabletBalancer
interface. It currently only contains the current session hash (rather than the UUID so we hash only once on session creation and not on each call toPick
). I thought that as more balancer implementations get added, they might have their own custom parameters, so having an isolated struct so that the signature doesn't get overly long might be desirable.On call to
Pick
, the balancer will look up the session hash inPickOpts
and use it to find the tablet to route to, first looking at the local tablets. If there are no local tablets, it will look for an external tablet. For a given session UUID/hash, the balancer will route it to the same tablet.Related Issue(s)
#11971
Checklist
Deployment Notes