-
Notifications
You must be signed in to change notification settings - Fork 2.2k
[IncomingQueryThrottler] Implement Incoming Query Throttler with Strategy Pattern #18449
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ry throttler This commit introduces the core interfaces and configuration structures for the incoming query throttler system as outlined in RFC vitessio#18412. Key additions: - ThrottlingStrategyHandler interface for implementing throttling strategies - Config struct with support for multiple throttling strategies: - TabletThrottler (Vitess native tablet throttler) - Cinnamon (Uber's load-shedding system) - ConfigLoader interface for dynamic configuration loading This establishes the architectural foundation for query-level throttling that will be built upon in subsequent commits. Ref: vitessio#18412 Signed-off-by: siddharth16396 <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #18449 +/- ##
==========================================
+ Coverage 67.48% 67.51% +0.02%
==========================================
Files 1607 1613 +6
Lines 262818 263265 +447
==========================================
+ Hits 177366 177731 +365
- Misses 85452 85534 +82 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Add file-based configuration loader for the incoming query throttler system. This implementation reads throttler configuration from a JSON file at `/config/throttler-config.json` and supports the Config struct with enabled flag and throttling strategy selection. Key features: - Implements ConfigLoader interface for file-based config loading - Reads from fixed path /config/throttler-config.json - Supports JSON unmarshaling of Config struct with enabled/strategy fields - Comprehensive error handling for file read and JSON parsing failures - Extensive test coverage with stubbed dependencies using gostub library The FileBasedConfigLoader provides a foundation for loading throttler configuration from external files, supporting the broader incoming query throttling system architecture. Related to: vitessio#18412 Signed-off-by: siddharth16396 <[email protected]>
Add file-based configuration loader for the incoming query throttler system. This implementation reads throttler configuration from a JSON file at `/config/throttler-config.json` and supports the Config struct with enabled flag and throttling strategy selection. Key features: - Implements ConfigLoader interface for file-based config loading - Reads from fixed path /config/throttler-config.json - Supports JSON unmarshaling of Config struct with enabled/strategy fields - Comprehensive error handling for file read and JSON parsin failures - Extensive test coverage with stubbed dependencies using gostub library The FileBasedConfigLoader provides a foundation for loading throttler configuration from external files, supporting the broader incoming query throttling system architecture. Related to: vitessio#18412 Signed-off-by: siddharth16396 <[email protected]>
Signed-off-by: siddharth16396 <[email protected]>
This commit introduces a new incoming query throttling system for Vitess tablets with the following key components: Core Implementation: * IncomingQueryThrottler: Main throttler with configurable strategies and automatic config refresh * Strategy pattern with ThrottlingStrategyHandler interface for pluggable throttling algorithms * NoOpStrategy: Default no-operation strategy * CinnamonStrategy: Placeholder for future Cinnamon throttling implementation Key Features: * Automatic configuration refresh every minute via background goroutine * Thread-safe strategy switching with proper lifecycle management (start/stop) * Integration with existing Vitess throttle infrastructure via throttle.Client * Graceful shutdown with resource cleanup Signed-off-by: siddharth16396 <[email protected]>
Signed-off-by: siddharth16396 <[email protected]>
type FileBasedConfigLoader struct{} | ||
|
||
// NewFileBasedConfigLoader creates a new instance of FileBasedConfigLoader with the given file path. | ||
func NewFileBasedConfigLoader() *FileBasedConfigLoader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a TopoServerConfig loader as well, but that will be a seperate PR by another team mate
case ThrottlingStrategyCinnamon: | ||
return &CinnamonStrategy{} | ||
case ThrottlingStrategyTabletThrottler: | ||
fallthrough |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be implemented as a seperate PR by me.
|
||
var ( | ||
_ ConfigLoader = (*FileBasedConfigLoader)(nil) | ||
_osReadFile = os.ReadFile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not update global state like this. Similar to below, we should pass in configuration like this and also then we don't need to mock / change _osReadFile
.
That we add a dependency in go.mod
here is a smell I think and also points at these problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've incorporated the changes and used dependency injection for this
Signed-off-by: siddharth16396 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we have this? There is no implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed it, the implementation was uber specific code that couldn't be open sourced.
type ThrottlingStrategy string | ||
|
||
// Predefined throttling strategies for the IncomingQueryThrottler. | ||
const ( | ||
// ThrottlingStrategyTabletThrottler uses Vitess Tablet Throttler to shed load | ||
// from incoming queries when the tablet is under pressure. | ||
// Reference: https://vitess.io/docs/21.0/reference/features/tablet-throttler/ | ||
ThrottlingStrategyTabletThrottler ThrottlingStrategy = "TabletThrottler" | ||
|
||
// ThrottlingStrategyCinnamon uses Uber's Cinnamon load-shedding system | ||
// to regulate incoming queries under high load conditions. | ||
// Reference: https://www.uber.com/en-IN/blog/cinnamon-using-century-old-tech-to-build-a-mean-load-shedder/ | ||
ThrottlingStrategyCinnamon ThrottlingStrategy = "Cinnamon" | ||
|
||
// ThrottlingStrategyUnknown is used when the strategy is not known. | ||
ThrottlingStrategyUnknown ThrottlingStrategy = "Unknown" | ||
) | ||
|
||
// Config defines the runtime configuration for the IncomingQueryThrottler. | ||
// It specifies whether throttling is enabled and which strategy to use. | ||
type Config struct { | ||
// Enabled indicates whether the throttler should actively apply throttling logic. | ||
Enabled bool `json:"enabled"` | ||
|
||
// Strategy selects which throttling strategy should be used. | ||
Strategy ThrottlingStrategy `json:"strategy"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of this we can do factory based. where each implementation registers to the factory and then based on the config will load that strategy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done, this strategy is just a enum now.
// Load reads the configuration from the configured file path. | ||
func (f *FileBasedConfigLoader) Load(ctx context.Context) (Config, error) { | ||
data, err := f.readFile(f.configPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you see any custom configPath
be provided? Do you see this as a responsibility of tabletserver to add a flag option for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats a good catch, let me implement that via tablet flag.
In our org it is a general practice to have configs under /config
path
var ( | ||
_configRefreshTicket = time.NewTicker(1 * time.Minute) | ||
) | ||
|
||
type IncomingQueryThrottler struct { | ||
ctx context.Context | ||
throttleClient *throttle.Client | ||
mu sync.RWMutex | ||
// cfg holds the current configuration for the throttler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not use the global ticker here and pass the timer in constructor.
This way we will also remove the usage of gostub in the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is taken care of now.
newCfg, err := i.cfgLoader.Load(i.ctx) | ||
if err != nil { | ||
log.Errorf("Error loading config: %v", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is an error loading the config, we cannot continue with this. It will panic with nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, i will be taking care of this
Signed-off-by: siddharth16396 <[email protected]>
@@ -79,6 +79,7 @@ const ( | |||
|
|||
TestingName Name = "test" | |||
TestingAlwaysThrottledName Name = "always-throttled-app" | |||
IncomingQueryThrottlerName Name = "incoming-query-throttler-app" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand in what sense this is an "app" (or a client). The incoming query throttler seems to be a server. Why is it necessarily coupled with a "incoming-query-throttler-app"
app name? What happens if I query using an "online-ddl"
app name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We’ll actually be creating a new throttler specifically for the tablet throttler strategy, rather than re-using the existing lag throttler in tablets — mainly because we didn’t want to risk breaking existing behavior with our changes. For this new throttler, we needed to provide an AppName, which is why we introduced this name.
From what I can tell, there’s no side effect if another component queries using the same app name; it won’t impact their behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm still not following. What is this "app" going to be doing? What component is going to be assuming this name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how it looks in our fork (for tablet throttler strategy usage)
func NewIncomingQueryThrottler(ctx context.Context, throttler *throttle.Throttler, cfgLoader ConfigLoader, env tabletenv.Env) *IncomingQueryThrottler {
client := throttle.NewBackgroundClient(throttler, throttlerapp.IncomingQueryThrottlerName, base.UndefinedScope)
i := &IncomingQueryThrottler{
ctx: ctx,
throttleClient: client,
tabletConfig: env.Config(),
cfg: Config{},
cfgLoader: cfgLoader,
strategy: ®istry.NoOpStrategy{}, // default strategy until config is loaded
}
// Start the initial strategy
i.strategy.Start()
// starting the loop which will be responsible for refreshing the config.
i.startConfigRefreshLoop()
return i
}
and it is called in tabletserver.go
like
tsv.incomingQueryThrottler = incomingquerythrottler.NewIncomingQueryThrottler(ctx, tsv.iqThrottler, incomingquerythrottler.NewTopoServerConfigLoader(topoServer, alias), tsv)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tablet throttler strategy implementation will be in a seperate PR after this is merged.
|
||
// ThrottlingStrategyUnknown is used when the strategy is not known. | ||
ThrottlingStrategyUnknown ThrottlingStrategy = "Unknown" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please explain the need for strategies if the only two are the already existing TabletThrottler and the other is what looks like a catch-all?
Is this meant as a basis to you (e.g. Uber) implementing a private throttler strategy? (namely Cinnamon, mentioned in the PR comment?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we chose to implement this using strategies to keep it extensible — for example, in our fork we’re already using Cinnamon in some places, and this structure makes it easier for others to plug in their own throttler strategies if they need to.
Additionally, the strategy pattern provides clarity and modularity: each throttling approach is encapsulated and can evolve independently without complicating the core logic. This keeps the codebase easier to maintain and reason about, while also making it clear which strategy is being applied in a given context.
Signed-off-by: siddharth16396 <[email protected]>
Signed-off-by: siddharth16396 <[email protected]>
Signed-off-by: siddharth16396 <[email protected]>
Signed-off-by: siddharth16396 <[email protected]>
Signed-off-by: siddharth16396 <[email protected]>
Signed-off-by: siddharth16396 <[email protected]>
CI is failing on flag changes related to the PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use Viper config to load it dynamically.
Here is my prototype for doing it.
c431c40
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still having difficulties understanding the design. It looks like this PR is just part 1 of a multipart submission, and apart from a couple changed files only includes new files. I find it difficult to foresee the intended use. I have carefully read #18412 and I'm left with more questions given the lack of implementation in this PR. I think I understand and appreciate the desire to split the contribution, but in this case it's difficult for me to review&accept this abstract part without foreseeing how it works in practice.
Some thoughts:
- By reading #18412, correct me if I'm wrong, it looks like the new
IncomingQueryThrottler
is not a throttler per se, but instead a layer that stands before a throttler (which can beTabletThrottler
or yourCinnamon
). Correct? This "Throttler" naming was highly confusing to me at first and IMHO should change. - Assuming the above is correct, I understand better how
IncomingQueryThrottlerName
is an "app". - From #18412, I'm unsure where exactly "Before executing a query, TabletServer calls EnforceThrottlingIfNodeOverloaded."
Are you intending to plug this mechanism inQueryExecutor
? - Looking at
Start()
,Stop()
functions inThrottlingStrategyHandler
, the way these are called is not compliant with how we start/stop other services in vitess. Take a look at state manager, and for examplt at how the tablet throttler, or tabletGC, are started/stopped. There's a set of rules for how this should work on primary/replica, for example, and a specific ordering. - I'm unsure where exactly "TabletServer creates an IncomingQueryThrottler during startup." takes place. Again, consult state manager.
- Please refrain from using names such as "Cinnamon" and instead use descriptive names.
- For dynamic configuration, the Vitess project is using
viper
. - Or store configuration in the topology service and Watch it, see tablet throttler.
- "Configuration system - File-based config loader reading from /config/throttler-config.json with enabled/disabled flag and strategy selection" -- IMHO we definitely want to move away from file based configuration as this causes deployment discrepancies.
- There seems to be some subversion of ownership. Say you use
TabletThrottler
and it returns an OK check result -- that still does not mean you're going to run the query, correct? You apply further logic on top of the check result, like comparing the lag value with your own settings. This perhaps circles back to my first bullet, "looks like the newIncomingQueryThrottler
is not a throttler per se" and maybe it is a throttler? Is is your intention to implement a chain of throttlers? If so, how do you ensure the two configurations (your throttler configuration +TabletThrottler
configuration) are at all compatible?
@shlomi-noach : thanks for the detailed comments and i agree that breaking this change into smaller changes is causing a little review/understanding friction, i'll reply to your comments and post that i will add more commits to make this particular PR more understandable.
Yes, "IncomingQueryThrottler" is just the entry point for throttling world. "EnforceThrottlingIfNodeOverloaded" is called from
I'll check on this definitely.
We have the topolo service configuration system in our fork, another team mate of mine would raise a PR for this soon.
Acknowledged, i will be removing it
Yes you are absolutely right, i'm sharing a snippet of that code below, i didn't add it to this PR because that would make this PR too big.. func (s *TabletThrottlerStrategy) Evaluate(ctx context.Context, targetTabletType topodatapb.TabletType, sql string, transactionID int64, options *querypb.ExecuteOptions) registry.ThrottleDecision {
// FAST PATH: Check if system is healthy before doing any expensive work
// This optimizes for the common case (90-95% of queries) where checkOk == true
if s.running.Load() {
if state := s.cachedState.Load(); state != nil && state.ok {
return registry.ThrottleDecision{
Throttle: false,
Message: "System healthy, fast-path bypass",
}
}
}
// Extract workload name and priority from options, use defaults if not provided
workloadName := s.extractWorkloadName(options)
priority := s.extractPriority(options)
// Step 1: Early priority-based throttling check
// Similar to tx_throttler.go: lower priority values (higher priority) are less likely to be throttled
// Priority behavior:
// - Priority 0 (highest): NEVER throttled (rand(0-99) < 0 is always false)
// - Priority 100 (lowest): ALWAYS checked for throttling (rand(0-99) < 100 is always true)
// - Priority 1-99: Probabilistically checked based on priority value
// If priority check fails, skip all expensive throttle checks
priorityCheck := _randIntN(sqlparser.MaxPriorityValue) < priority
if !priorityCheck {
return registry.ThrottleDecision{
Throttle: false,
Message: fmt.Sprintf("High priority query (priority=%d), skip throttling", priority),
}
}
// Step 2: Look up throttling rules for this tablet type (e.g., PRIMARY, REPLICA)
stmtRules, ok := s.config.TabletRules[targetTabletType.String()]
if !ok {
return registry.ThrottleDecision{
Throttle: false,
Message: fmt.Sprintf("No throttling rules for tablet type: %s", targetTabletType.String()),
}
}
// Step 3: Determine SQL statement type (e.g., INSERT, SELECT)
stmtType := sqlparser.Preview(sql).String()
// Step 4: Look up metric rules for this statement type
metricRules, ok := stmtRules[stmtType]
if !ok {
return registry.ThrottleDecision{
Throttle: false,
Message: fmt.Sprintf("No throttling rules for SQL type: %s", stmtType),
}
}
// Step 5: Get cached throttle check results
// checkResult is guaranteed to be non-nil now due to fallback logic in getCachedThrottleResult
checkResult, checkOk := s.getCachedThrottleResult(ctx)
// If check passes, system is not overloaded → allow query
if checkOk {
return registry.ThrottleDecision{
Throttle: false,
Message: "System not overloaded, allowing query",
}
}
// Step 6: Evaluate metrics that exceeded thresholds
for metricName, result := range checkResult.Metrics {
// Skip metrics that did not breach a configured threshold
if result.ResponseCode != tabletmanagerdata.CheckThrottlerResponseCode_THRESHOLD_EXCEEDED {
continue
}
// Only act on metrics explicitly configured for this SQL type
rule, found := metricRules[metricName]
if !found {
continue
}
// Step 7: Calculate throttle probability and breached threshold
throttleProb, breachedThreshold := getThrottleDecision(result.Value, rule.Thresholds)
// Step 8: Apply probabilistic throttling (priority check already passed)
if throttleProb > _randFloat64() {
return registry.ThrottleDecision{
Throttle: true,
Message: fmt.Sprintf("[VTTabletThrottler] Query throttled: workload=%s priority=%d metric=%s value=%.2f breached threshold=%.2f throttle=%.0f%%", workloadName, priority, metricName, result.Value, breachedThreshold, throttleProb*100),
MetricName: metricName,
MetricValue: result.Value,
Threshold: breachedThreshold,
ThrottlePercentage: throttleProb,
}
}
}
// No throttle triggered → allow query
return registry.ThrottleDecision{
Throttle: false,
Message: "No throttling conditions met",
}
} |
@harshit-gangal / @shlomi-noach : I'll check on viper config, i've not used it previously so would require some time. |
Signed-off-by: siddharth16396 <[email protected]>
Signed-off-by: siddharth16396 <[email protected]>
@shlomi-noach : i am guessing d44af58 helps a bit. |
@siddharth16396 thank you. FWIW I think it's OK to defer some work to future PRs, as long as we commit to doing them. |
Both places have pros and cons. The way you call it now in On the other hand, the query plan parses the SQL statement, something that you're going to do as well. So there's waste of computation if the query does not get throttled. I'm thinking the best solution would be to generate the plan, then use |
@@ -467,6 +474,7 @@ func (sm *stateManager) servePrimary() error { | |||
sm.te.AcceptReadWrite() | |||
sm.messager.Open() | |||
sm.throttler.Open() | |||
sm.iqThrottler.Open() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to sm.iqThrottler.Close()
in serveNonPrimary
, see how sm.tableGC.
works.
verifySubcomponent(t, 10, sm.watcher, testStateClosed) | ||
verifySubcomponent(t, 11, sm.vstreamer, testStateClosed) | ||
verifySubcomponent(t, 12, sm.rt, testStateClosed) | ||
verifySubcomponent(t, 13, sm.se, testStateClosed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice job
Signed-off-by: siddharth16396 <[email protected]>
Description
This PR introduces a comprehensive incoming query throttling system for Vitess tablets outlined in RFC #18412 with the following key components:
IncomingQueryThrottler
with strategy pattern for pluggable throttling algorithms/config/throttler-config.json
with enabled/disabled flag and strategy selectionThe system supports multiple throttling strategies (TabletThrottler, Cinnamon) and can be dynamically reconfigured without restart. The implementation follows Vitess patterns and integrates with the existing throttle client infrastructure.
Tests
Ref: #18412
Checklist
Deployment Notes