Support KeyRange in `--clusters_to_watch` flag #17604

GuptaManan100 · 2025-01-22T04:23:06Z

Description

This PR addresses the feature request #17537.
Now clusters_to_watch flag accepts key ranges. The format for input is still the same, and therefore backward compatible.
Internally Vitess now treats the input as key ranges instead of explicit shard names.

This allows the users to not restart VTOrc in case of a reshard. For example, if a VTOrc is configured to watch ks/-80, then it would watch all the shards that fall under the KeyRange -80. If a reshard is run and, -80 is split into new shards -40, and 40-80, the VTOrc instance will automatically start watching the new shard without needing a restart.

As part of this change, I have restructured the shardsToWatch map to store key ranges instead of strings. I've also made it so that we don't need to update the shards information by insteading storing in the map that we are watching all the shards using a complete key range.

Related Issue(s)

Closes Feature Request: support ranges of shards in VTOrc --clusters_to_watch #17537

Checklist

"Backport to:" labels have been added if this change should be back-ported to release branches
If this change is to be back-ported to previous releases, a justification is included in the PR description
Tests were added or are not required
Did the new or modified tests pass consistently locally and on CI?
Documentation was added or is not required

Deployment Notes

…exact shard values Signed-off-by: Manan Gupta <[email protected]>

Signed-off-by: Manan Gupta <[email protected]>

vitess-bot · 2025-01-22T04:23:09Z

Signed-off-by: Manan Gupta <[email protected]>

GuptaManan100 · 2025-01-22T05:44:17Z

go/vt/vtorc/logic/vtorc.go

-	// Refresh shards to watch.
-	eg.Go(func() error {
-		updateShardsToWatch()
-		return nil
-	})
-


We don't need this anymore, because we don't store the explicit shard names for the keyspaces that we want to watch in the entirety. Instead we just store a complete key range and that doesn't change during the course of running of a VTOrc instance.

codecov · 2025-01-22T10:49:36Z

Codecov Report

Attention: Patch coverage is 0% with 43 lines in your changes missing coverage. Please review.

Project coverage is 67.63%. Comparing base (d1aa2f4) to head (8c12c59).
Report is 9 commits behind head on main.

Files with missing lines	Patch %	Lines
go/vt/vtorc/logic/tablet_discovery.go	0.00%	35 Missing ⚠️
go/vt/key/key.go	0.00%	5 Missing ⚠️
go/vt/vtorc/logic/keyspace_shard_discovery.go	0.00%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #17604      +/-   ##
==========================================
- Coverage   67.68%   67.63%   -0.06%     
==========================================
  Files        1586     1586              
  Lines      255170   255625     +455     
==========================================
+ Hits       172722   172888     +166     
- Misses      82448    82737     +289

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Manan Gupta <[email protected]>

timvaillancourt

LGTM, thanks @GuptaManan100 🚀

timvaillancourt · 2025-01-22T15:36:30Z

changelog/22.0/22.0.0/summary.md

@@ -133,6 +134,11 @@ This is the last time this will be needed in the `8.0.x` series, as starting wit

 The base system now uses Debian Bookworm instead of Debian Bullseye for the `vitess/lite` images. This change was brought by [Pull Request #17552].

+### <a id="key-range-vtorc"/>KeyRanges in `--clusters_to_watch` in VTOrc</a>
+VTOrc now supports specifying KeyRanges in the `--clusters_to_watch` flag. This is useful in scenarios where you don't need to restart a VTOrc instance if you run a reshard.
+For example, if a VTOrc is configured to watch `ks/-80`, then it would watch all the shards that fall under the KeyRange `-80`. If a reshard is run and, `-80` is split into new shards `-40`, and `40-80`, the VTOrc instance will automatically start watching the new shard without needing a restart.


@GuptaManan100 I think this explains the "new" behaviour very well, but I feel like a user of the old way won't fully understand the risks

For example, a user with --clusters_to_watch foo/-80 in the previous logic will watch 1 (or 0) shards only. After this change, with the same flag value they would watch any shard from - to 80, which could be surprising. So I guess I'm saying explaining the difference between the old logic and this PR might make this risk more clear

What should I add to make it more clear? Should I add a line something like -
In the previous logic, watching a shard -80 would watch 1 (or 0) shard only. In the new system, since we interpret -80 as a key range, it can lead to a watch on multiple shards as described in the example.

@GuptaManan100 that extra detail is great 👍

mattlord

LGTM! I just have the one noted concern. Let me know if I'm missing or misunderstanding something. Either way, I'll come back to this quickly and we can get it merged. ❤️

mattlord · 2025-01-23T23:31:51Z

go/vt/vtorc/logic/tablet_discovery.go

-			ks = strings.TrimSuffix(ks, "/")
-			shards, err := ts.GetShardNames(ctx, ks)
+			// Parse the shard name into key range value.
+			_, keyRange, err := topo.ValidateShardName(s)


Your comments seemed to indicate that we don't support arbitrary shard names but only key ranges. This function supports both arbitrary names and key ranges. I think what you might want instead is key.IsValidKeyRange and key.ParseShardingSpec

mattlord · 2025-01-23T23:33:36Z

go/vt/vtorc/logic/tablet_discovery.go

+			if keyRange == nil {
+				keyRange = key.NewCompleteKeyRange()


I don't think this is correct, as the shard name can be "2" or "wassup" — in which case topo.ValidateShardName will return nil for the KeyRange — and only cover a subset. This is related to my comment above.

Signed-off-by: Manan Gupta <[email protected]>

GuptaManan100 · 2025-01-24T10:27:07Z

@mattlord I talked to @deepthi and she said that we only support 0 as a name of a shard other than range based shards. Everything else like 2 and wassup are names that we no longer support. They are legacy names that might be around, but we want to drop support for 0 too, and go to only range based shard names.

In this light, the way the function is implemented (and corresponding tests verify this), is that if a users gives us something like ks/2 to watch, we will watch the entire keyspace ks because 2 is not a range based shard. The only use case should be if a user has an unsharded keysapce ks and they specify ks/0 and this too would work fine, because we would watch the entire shard range and the unsharded shard will be part of it.

Do you think it's worth adding the logic to continue to support arbitrary shard names? I can do that too, by basically creating another map from keyspace to shard names (like we had before), in addition to what we have, and store the shards that aren't range based in them, and continue to run equality checks for them. I personally, don't think we should add support for constructs that we've deprecated (I've added a catch-all to still watch all the shards), but I don't have strong opinions.

mattlord · 2025-01-24T12:36:11Z

@mattlord I talked to @deepthi and she said that we only support 0 as a name of a shard other than range based shards. Everything else like 2 and wassup are names that we no longer support. They are legacy names that might be around, but we want to drop support for 0 too, and go to only range based shard names.

That is exactly what the key package does. It supports 0 as a valid key range alias for -. So I'm saying that the code should do proper input validation to enforce this and return an error to the user if they specify an invalid key range (rather than accepting it and doing something unexpected).

In this light, the way the function is implemented (and corresponding tests verify this), is that if a users gives us something like ks/2 to watch, we will watch the entire keyspace ks because 2 is not a range based shard. The only use case should be if a user has an unsharded keysapce ks and they specify ks/0 and this too would work fine, because we would watch the entire shard range and the unsharded shard will be part of it.

But your usage of topo.ValidateShardName, which supports the use of arbitrary shard names, means that we accept them as input. Why would we do that rather than using key.IsValidKeyRange and key.ParseShardingSpec to do proper input validation? Again, those do support 0 as an alias for - if that was your concern.

Do you think it's worth adding the logic to continue to support arbitrary shard names? I can do that too, by basically creating another map from keyspace to shard names (like we had before), in addition to what we have, and store the shards that aren't range based in them, and continue to run equality checks for them. I personally, don't think we should add support for constructs that we've deprecated (I've added a catch-all to still watch all the shards), but I don't have strong opinions.

I'm not suggesting we support arbitrary shard names, I'm suggesting that we do not allow it. 🙂

I'm not suggesting that we support shard names, I'm saying that currently we do support them because the input validator that you're using — topo.ValidateShardName — supports them. And with unexpected behavior IMO (we treat all shard names as -). I'm saying that we should instead do input validation to enforce the key range requirement and return an error to the user if they specify shard values that are NOT valid key ranges (the keys package already supports 0 as an alias for - if that was your concern).

@mattlord I talked to @deepthi and she said that we only support 0 as a name of a shard other than range based shards. Everything else like 2 and wassup are names that we no longer support. They are legacy names that might be around, but we want to drop support for 0 too, and go to only range based shard names.

Yes, I know. I'm suggesting that the code should match your intentions here and do input validation and alert the user to this problematic usage (they can always just specify the keyspace name by itself).

In this light, the way the function is implemented (and corresponding tests verify this), is that if a users gives us something like ks/2 to watch, we will watch the entire keyspace ks because 2 is not a range based shard. The only use case should be if a user has an unsharded keysapce ks and they specify ks/0 and this too would work fine, because we would watch the entire shard range and the unsharded shard will be part of it.

IMO ks/2 as input should return an error.

Does this make sense? I don't feel too strongly about it, so I'm OK with you making the final call.

mattlord · 2025-01-24T12:40:13Z

@GuptaManan100 why I think this discussion is worth having and why IMO we should do proper input validation is this... let's say that I have two shards: ks/-80 and ks/80- and I only want this vtorc to watch -80. But when settings things up I make a typo and use ks/80. vtorc will happily start up without saying anything about my input, and it will do something I do not want or expect: it will watch both shards.

GuptaManan100 added 2 commits January 22, 2025 09:42

feat: change how clusters to watch to accept shard ranges instead of …

8bb50ba

…exact shard values Signed-off-by: Manan Gupta <[email protected]>

feat: minor refactor and summary changes

a1b000b

Signed-off-by: Manan Gupta <[email protected]>

GuptaManan100 added Type: Feature Component: VTorc Vitess Orchestrator integration labels Jan 22, 2025

GuptaManan100 requested review from deepthi, shlomi-noach, timvaillancourt and mattlord as code owners January 22, 2025 04:23

github-actions bot added this to the v22.0.0 milestone Jan 22, 2025

Merge remote-tracking branch 'upstream/main' into cluster-watch-range

22446ab

Signed-off-by: Manan Gupta <[email protected]>

GuptaManan100 mentioned this pull request Jan 22, 2025

Feature Request: support ranges of shards in VTOrc --clusters_to_watch #17537

Open

GuptaManan100 commented Jan 22, 2025

View reviewed changes

Merge remote-tracking branch 'upstream/main' into cluster-watch-range

8901696

Signed-off-by: Manan Gupta <[email protected]>

timvaillancourt approved these changes Jan 22, 2025

View reviewed changes

timvaillancourt reviewed Jan 22, 2025

View reviewed changes

mattlord reviewed Jan 23, 2025

View reviewed changes

feat: add summary changes

8c12c59

Signed-off-by: Manan Gupta <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support KeyRange in `--clusters_to_watch` flag #17604

Support KeyRange in `--clusters_to_watch` flag #17604

GuptaManan100 commented Jan 22, 2025 •

edited

Loading

vitess-bot bot commented Jan 22, 2025

GuptaManan100 Jan 22, 2025

codecov bot commented Jan 22, 2025 •

edited

Loading

timvaillancourt left a comment

timvaillancourt Jan 22, 2025 •

edited

Loading

GuptaManan100 Jan 23, 2025

timvaillancourt Jan 23, 2025

mattlord left a comment

mattlord Jan 23, 2025

mattlord Jan 23, 2025

GuptaManan100 commented Jan 24, 2025

mattlord commented Jan 24, 2025

mattlord commented Jan 24, 2025 •

edited

Loading

Support KeyRange in --clusters_to_watch flag #17604

Are you sure you want to change the base?

Support KeyRange in --clusters_to_watch flag #17604

Conversation

GuptaManan100 commented Jan 22, 2025 • edited Loading

Description

Related Issue(s)

Checklist

Deployment Notes

vitess-bot bot commented Jan 22, 2025

Review Checklist

General

Tests

Documentation

New flags

If a workflow is added or modified:

Backward compatibility

GuptaManan100 Jan 22, 2025

Choose a reason for hiding this comment

codecov bot commented Jan 22, 2025 • edited Loading

Codecov Report

timvaillancourt left a comment

Choose a reason for hiding this comment

timvaillancourt Jan 22, 2025 • edited Loading

Choose a reason for hiding this comment

GuptaManan100 Jan 23, 2025

Choose a reason for hiding this comment

timvaillancourt Jan 23, 2025

Choose a reason for hiding this comment

mattlord left a comment

Choose a reason for hiding this comment

mattlord Jan 23, 2025

Choose a reason for hiding this comment

mattlord Jan 23, 2025

Choose a reason for hiding this comment

GuptaManan100 commented Jan 24, 2025

mattlord commented Jan 24, 2025

mattlord commented Jan 24, 2025 • edited Loading

Support KeyRange in `--clusters_to_watch` flag #17604

Support KeyRange in `--clusters_to_watch` flag #17604

GuptaManan100 commented Jan 22, 2025 •

edited

Loading

codecov bot commented Jan 22, 2025 •

edited

Loading

timvaillancourt Jan 22, 2025 •

edited

Loading

mattlord commented Jan 24, 2025 •

edited

Loading