workload-learning: Add a workload-learning cache worker#59909
ti-chi-bot[bot] merged 1 commit into pingcap:master
Codecov Report
Attention: Patch coverage is
@@ Coverage Diff @@
## master #59909 +/- ##
================================================
+ Coverage 73.1725% 74.9009% +1.7284%
================================================
Files 1706 1753 +47
Lines 471408 479651 +8243
================================================
+ Hits 344941 359263 +14322
+ Misses 105292 97783 -7509
- Partials 21175 22605 +1430
PR Overview
This PR introduces a new workload-learning cache worker to refresh and retrieve table cost metrics from the workload_values table in memory. Key changes include:
- Addition of file pkg/workloadlearning/cache.go implementing the WLCacheWorker and associated caching logic.
- Integration of the new cache worker into the workload-based learning worker setup in pkg/domain/domain.go.
Reviewed Changes
| File | Description |
|---|---|
| pkg/workloadlearning/cache.go | Introduces the WLCacheWorker with caching, JSON unmarshalling, and atomic update logic. |
| pkg/domain/domain.go | Integrates the new cache worker into the workload-based learning worker process. |
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
pkg/workloadlearning/cache.go:46
- Consider initializing TableCostMetrics to an empty map (e.g., make(map[int64]*ReadTableCostMetrics)) to avoid potential nil map issues when accessing the cache.
return &WLCacheWorker{pool, &ReadTableCostCache{}, sync.RWMutex{}}
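The suggestion above can be illustrated with a minimal sketch. The type names mirror the ones in this PR, but the field set and the helpers `newCache` and `writePanics` are assumptions made for illustration. In Go, reading from a nil map is safe and yields the zero value, while writing to one panics, so initializing the map in the constructor removes that hazard for any later in-place update.

```go
package main

import "fmt"

// ReadTableCostMetrics is a stand-in for the PR's type; the field is hypothetical.
type ReadTableCostMetrics struct {
	TableID int64
}

// ReadTableCostCache mirrors the cache struct discussed in the review.
type ReadTableCostCache struct {
	TableCostMetrics map[int64]*ReadTableCostMetrics
}

// newCache (hypothetical helper) initializes the map up front so later
// writes cannot hit a nil map.
func newCache() *ReadTableCostCache {
	return &ReadTableCostCache{
		TableCostMetrics: make(map[int64]*ReadTableCostMetrics),
	}
}

// writePanics reports whether storing an entry into the cache's map panics.
func writePanics(c *ReadTableCostCache) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	c.TableCostMetrics[1] = &ReadTableCostMetrics{TableID: 1}
	return false
}

func main() {
	zero := &ReadTableCostCache{}     // map field is nil
	_, ok := zero.TableCostMetrics[1] // reading a nil map does not panic
	fmt.Println(ok, writePanics(zero), writePanics(newCache()))
}
```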
pkg/workloadlearning/cache.go (outdated)
}

// GetTableCostMetrics returns the cached metrics for a given table ID
func (cw *WLCacheWorker) GetTableCostMetrics(tableID int64) *ReadTableCostMetrics {
I think the priority queue should process all values at once. However, it’s fine to keep the current one and add a new method to retrieve all values.
If you need to process all values at once, just don't forget to acquire and release the RWLock ~
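A "retrieve all values" method along the lines discussed here might look like the sketch below. The method name `GetAllTableCostMetrics`, the `Cost` field, and the flattened struct layout are hypothetical and not taken from the PR. The read lock is held only while the map is copied, so callers can iterate the snapshot without blocking cache updates.

```go
package main

import (
	"fmt"
	"sync"
)

// ReadTableCostMetrics is a stand-in type; the Cost field is hypothetical.
type ReadTableCostMetrics struct {
	TableID int64
	Cost    float64
}

// WLCacheWorker here is a simplified stand-in for the PR's worker.
type WLCacheWorker struct {
	sync.RWMutex
	metrics map[int64]*ReadTableCostMetrics
}

// GetAllTableCostMetrics (hypothetical name) returns a shallow copy of the
// cached map, taken under the read lock. The pointed-to metrics are shared,
// so callers should treat them as read-only.
func (cw *WLCacheWorker) GetAllTableCostMetrics() map[int64]*ReadTableCostMetrics {
	cw.RLock()
	defer cw.RUnlock()
	out := make(map[int64]*ReadTableCostMetrics, len(cw.metrics))
	for id, m := range cw.metrics {
		out[id] = m
	}
	return out
}

func main() {
	cw := &WLCacheWorker{metrics: map[int64]*ReadTableCostMetrics{
		1: {TableID: 1, Cost: 2.5},
	}}
	all := cw.GetAllTableCostMetrics()
	fmt.Println(len(all))
}
```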
Pull Request Overview
This PR adds a new workload-learning cache worker that maintains an in‑memory cache of table cost metrics and integrates it into the workload learning process.
- Introduces WLCacheWorker in pkg/workloadlearning/cache.go for caching table cost metrics.
- Updates the workload learning handle and domain worker to use a DestroyableSessionPool and trigger cache updates.
- Adds unit tests for the cache update logic in pkg/workloadlearning/cache_test.go.
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pkg/workloadlearning/cache.go | New cache worker implementation for asynchronously updating table cost metrics. |
| pkg/workloadlearning/cache_test.go | Unit tests to verify the cache update functionality. |
| pkg/workloadlearning/handle.go | Updates to use DestroyableSessionPool and improved session cleanup in metric saving. |
| pkg/domain/domain.go | Integration of the new WLCacheWorker into the domain’s workload learning worker. |
Comments suppressed due to low confidence (2)
pkg/workloadlearning/cache.go:46
- It is recommended to initialize the TableCostMetrics field in ReadTableCostCache (e.g., with make(map[int64]*ReadTableCostMetrics)) to avoid potential nil map issues before the cache is updated.
return &WLCacheWorker{pool, &ReadTableCostCache{}, sync.RWMutex{}}
pkg/workloadlearning/handle.go:140
- Consider verifying that metrics rows have been appended to the SQL builder before removing the trailing comma; otherwise, if no rows were added, this slicing may inadvertently remove part of the header and lead to a malformed SQL statement.
sql := sql.String()[:sql.Len()-2]
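One way to address the trailing-comma concern is sketched below, assuming a hypothetical table `t` with columns `a` and `b` and a helper named `buildInsert`; none of these names come from the PR. The function returns early when there are no rows and uses `strings.TrimSuffix`, which only removes the separator if it is actually present. Giving the result its own name also avoids shadowing the builder variable.

```go
package main

import (
	"fmt"
	"strings"
)

// buildInsert (hypothetical helper) assembles a batch INSERT statement.
// It returns false when there is nothing to insert, so the caller never
// executes a statement that consists of a header only.
func buildInsert(rows []string) (string, bool) {
	if len(rows) == 0 {
		return "", false
	}
	var sb strings.Builder
	sb.WriteString("INSERT INTO t (a, b) VALUES ")
	for _, r := range rows {
		sb.WriteString(r)
		sb.WriteString(", ")
	}
	// TrimSuffix is a no-op if the separator is missing, unlike slicing
	// off the last two bytes unconditionally.
	finalSQL := strings.TrimSuffix(sb.String(), ", ")
	return finalSQL, true
}

func main() {
	stmt, ok := buildInsert([]string{"(1, 2)", "(3, 4)"})
	fmt.Println(ok, stmt)
}
```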
pkg/workloadlearning/cache.go (outdated)
func (cw *WLCacheWorker) GetTableCostMetrics(tableID int64) *ReadTableCostMetrics {
	cw.RWMutex.RLock()
	defer cw.RWMutex.RUnlock()
	metric, exists := cw.readTableCostCache.TableCostMetrics[tableID]
I think there is a potential nil panic risk here. We should always initialize the TableCostMetrics.
Resolved by initializing the map with make.
pkg/workloadlearning/cache.go (outdated)
defer func() {
	if err == nil { // only recycle when no error
		cw.sysSessionPool.Put(se)
	} else if err != nil && se != nil {
I guess the below code also has this problem.
I double-checked the code. If the session is nil, the function returns directly before the deferred function is registered, so I don't need to recheck it in the deferred function.
This has been changed.
pkg/workloadlearning/cache.go (outdated)
ORDER BY version DESC LIMIT 1`
rows, _, err := exec.ExecRestrictedSQL(ctx, nil, sql, feedbackCategory, tableCostType)
if err != nil {
	logutil.BgLogger().Warn("Failed to get the latest table cost version", zap.Error(err))
Do you want to print the error stack here? You might need to use ErrVerboseLogger.
pkg/workloadlearning/cache.go (outdated)
cw.RWMutex.Lock()
cw.tableReadCostCache.TableReadCostMetrics = newMetrics
cw.tableReadCostCache.Version = latestVersionInStorage
cw.RWMutex.Unlock()
Better to use defer for safety (for example, if one of the lines above panics, we'll hold this lock forever).
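The point about `defer` can be demonstrated with a small sketch. The `cache` type, the `swap` and `safeSwap` helpers, and the `build` callback are hypothetical and only simulate a step that may panic while the lock is held. Because the unlock is deferred, the lock is released even when the critical section panics, and later callers are not blocked.

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a simplified stand-in for the table read cost cache.
type cache struct {
	mu      sync.RWMutex
	version int64
	metrics map[int64]float64
}

// swap replaces the cached metrics under the write lock. build runs while
// the lock is held to simulate a step that might panic.
func (c *cache) swap(version int64, build func() map[int64]float64) {
	c.mu.Lock()
	defer c.mu.Unlock() // released even if build panics
	c.metrics = build()
	c.version = version
}

// safeSwap calls swap and reports whether it panicked.
func safeSwap(c *cache, version int64, build func() map[int64]float64) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	c.swap(version, build)
	return false
}

func main() {
	c := &cache{metrics: map[int64]float64{}}
	p := safeSwap(c, 1, func() map[int64]float64 { panic("boom") })
	// A second swap would deadlock here if the lock had been leaked.
	q := safeSwap(c, 2, func() map[int64]float64 { return map[int64]float64{7: 1.5} })
	fmt.Println(p, q, c.version)
}
```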
	return &WLCacheWorker{
		pool, cache, sync.RWMutex{}}
}
Suggested change:
-	return &WLCacheWorker{
-		pool, cache, sync.RWMutex{}}
-}
+	return &WLCacheWorker{
+		pool, cache, sync.RWMutex{},
+	}
+}
- type ReadTableCostMetrics struct {
+ // TableReadCostMetrics is used to indicate the intermediate status and results analyzed through table read workload
+ // for function "HandleTableReadCost".
+ type TableReadCostMetrics struct {
Maybe we need to add tags for these fields.
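As an illustration of what adding tags could look like, here is a minimal sketch. The field names and tag values below are assumptions, since the actual fields of `TableReadCostMetrics` are not shown in this thread. Explicit JSON tags pin the serialized key names, so renaming a Go field later does not silently change the stored format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TableReadCostMetrics is a stand-in; fields and tag names are hypothetical.
type TableReadCostMetrics struct {
	DbName    string  `json:"db_name"`
	TableName string  `json:"table_name"`
	ReadCost  float64 `json:"read_cost"`
}

// roundTrip (hypothetical helper) marshals the metrics and decodes them
// back, returning the JSON text and the decoded value.
func roundTrip(m TableReadCostMetrics) (string, TableReadCostMetrics, error) {
	b, err := json.Marshal(m)
	if err != nil {
		return "", TableReadCostMetrics{}, err
	}
	var out TableReadCostMetrics
	err = json.Unmarshal(b, &out)
	return string(b), out, err
}

func main() {
	s, _, err := roundTrip(TableReadCostMetrics{DbName: "test", TableName: "t1", ReadCost: 1.5})
	fmt.Println(s, err)
}
```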
Pull Request Overview
This PR adds a workload-learning cache worker and updates the table read cost caching logic while renaming functions and variables for improved clarity.
- Introduces a new WLCacheWorker in pkg/workloadlearning/cache.go and updates its related tests.
- Renames ReadTableCostMetrics and related functions to TableReadCostMetrics and HandleTableReadCost for consistency.
- Adjusts the domain worker setup to integrate both the learning handle and the new cache worker.
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pkg/workloadlearning/cache_test.go | Adds tests for the new cache worker functionality. |
| pkg/workloadlearning/cache.go | Implements caching logic for table read cost metrics. |
| pkg/workloadlearning/handle.go | Updates workload handle with renaming and batch SQL insertion logic. |
| pkg/domain/domain.go | Integrates the new cache worker into the domain’s worker setup. |
| pkg/workloadlearning/metrics.go | Renames metric types from ReadTableCostMetrics to TableReadCostMetrics. |
| pkg/workloadlearning/handle_test.go | Updates unit tests to reflect the renamed functions and types. |
Comments suppressed due to low confidence (2)
pkg/workloadlearning/handle.go:109
- The function name 'analyzeBasedOnStatementSummary' is inconsistent with the later 'analyzeBasedOnStatementStats'; consider unifying the naming to avoid confusion.
func (*Handle) analyzeBasedOnStatementSummary() []*TableReadCostMetrics {
pkg/workloadlearning/handle.go:141
- [nitpick] Re-declaring the variable 'sql' here shadows the outer variable; consider using a new variable name (e.g., 'finalSQL') for clarity.
sql := sql.String()[:sql.Len()-2]
lance6716 left a comment
/approve
for domain part
[APPROVALNOTIFIER] This PR is APPROVED
This pull request has been approved by: lance6716, qw4990, Rustin170506
1. Add a new workload-learning cache worker
2. Implement the read table cost cache logic from workload_values table to memory
2bca707 to 8e42e71
What problem does this PR solve?
Issue Number: ref #58131
Problem Summary:
What changed and how does it work?
The table cost cache update is triggered after WorkloadLearningHandle saves the metrics to the workload_values table.
This ensures that the cache worker loads the latest data into memory neither too early nor too late.
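The ordering described above can be sketched as follows. The `store` type stands in for the workload_values table, and `cacheWorker`, `refresh`, and `saveThenRefresh` are hypothetical names. The version comparison is an assumption inferred from the review snippets that query the latest version and assign it to the cache. The cache is refreshed only after the save completes, and only if storage holds a newer version.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical in-memory stand-in for the workload_values table.
type store struct {
	version int64
	metrics map[int64]float64
}

// cacheWorker is a simplified stand-in for the cache worker.
type cacheWorker struct {
	mu      sync.RWMutex
	version int64
	metrics map[int64]float64
}

// refresh reloads the cache only when storage holds a newer version and
// reports whether a reload happened.
func (cw *cacheWorker) refresh(s *store) bool {
	cw.mu.RLock()
	current := cw.version
	cw.mu.RUnlock()
	if s.version <= current {
		return false
	}
	// Build the new map outside the write lock, then swap it in.
	fresh := make(map[int64]float64, len(s.metrics))
	for id, cost := range s.metrics {
		fresh[id] = cost
	}
	cw.mu.Lock()
	defer cw.mu.Unlock()
	cw.metrics = fresh
	cw.version = s.version
	return true
}

// saveThenRefresh persists the metrics first and only then updates the cache.
func saveThenRefresh(s *store, cw *cacheWorker, version int64, metrics map[int64]float64) bool {
	s.version, s.metrics = version, metrics
	return cw.refresh(s)
}

func main() {
	s, cw := &store{}, &cacheWorker{}
	fmt.Println(saveThenRefresh(s, cw, 1, map[int64]float64{10: 3.5}), cw.refresh(s))
}
```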
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.