-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sizegen: do not ignore type aliases #17556
Conversation
Signed-off-by: Vicent Marti <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
@@ -163,6 +163,8 @@ func (sizegen *sizegen) generateTyp(tt types.Type) { | |||
sizegen.generateKnownType(tt) | |||
case *types.Alias: | |||
sizegen.generateTyp(types.Unalias(tt)) | |||
default: | |||
panic(fmt.Sprintf("unhandled type: %v (%T)", tt, tt)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now make sure we panic here on unexpected values as well.
default: | ||
log.Printf("unhandled type: %T", node) | ||
return nil, 0 | ||
panic(fmt.Sprintf("unhandled type: %v (%T)", node, node)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not just print a warning here, but actively panic. This is so that if we have other new types in the future, we don't ignore this again but actively investigate.
for _, elem := range cached.Rows { | ||
{ | ||
size += hack.RuntimeAllocSize(int64(cap(elem)) * int64(32)) | ||
for _, elem := range elem { | ||
size += elem.CachedSize(false) | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the big missing memory calculation for the size of in-memory query results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Excellent find!
Signed-off-by: Vicent Marti <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #17556 +/- ##
==========================================
+ Coverage 67.68% 67.72% +0.03%
==========================================
Files 1584 1584
Lines 254718 254721 +3
==========================================
+ Hits 172414 172508 +94
+ Misses 82304 82213 -91 ☔ View full report in Codecov by Sentry. |
Isn't there a unit test that could have caught this? We should add one. |
That's why we changed this to |
Signed-off-by: Vicent Marti <[email protected]> Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>
Description
Important fix for memory calculations of in-memory data structures. Was causing OOMs in production.
In Go 1.23 a new type alias node was introduced for reflection, see golang/go#63223. This initially caused the sizegen generation to crash, which was fixed by @systay in #16650 but that fix was not complete. It still creates a ton of warnings since that change that were never fixed:
This might have looked innocent, but it certainly is not. The problem we found here specifically starts showing up in the
vttablet
consolidation logic. The problem is that we need these sizes to properly compute the cache. We have a type alias forRow
to[]Value
for the actual row storage which is now ignored.This means that the cache in use size for the consolidator did not actually count the row data! This is basically the main part of what it needs to measure. This in turns leads to excessive memory usage by the consolidator.
This also applies to the normal query consolidator, to the query cache, and any other piece of code that was depending on CachedSize to be accurate in order to limit vttablet memory usage. We're hoping this will cut down on the amount of OOMs we see in production.
cc @dbussink
Related Issue(s)
Checklist
Deployment Notes