[BugFix] Guard against iterator UB in get_column_values when rssid not found#69617
[BugFix] Guard against iterator UB in get_column_values when rssid not found#69617sevev merged 5 commits intoStarRocks:mainfrom
Conversation
🌎 Translation Required?✅ All translation files are up to date.
|
1 similar comment
🌎 Translation Required?✅ All translation files are up to date.
|
|
@codex review |
|
Codex Review: Didn't find any major issues. Can't wait for the next one! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
…t found When the PrimaryIndex contains stale entries pointing to rowsets that have been compacted away, upper_bound() returns begin() and --iter causes undefined behavior (decrementing before begin), leading to a SIGSEGV crash. Add a bounds check before decrementing the iterator to return an error instead of crashing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: luohaha <18810541851@163.com>
Add test that verifies get_column_values returns InternalError instead of crashing when called with a stale rssid that no longer exists in rssid_to_rowsets after compaction. The test commits multiple rowsets, triggers compaction, then queries with the old (pre-compaction) rssid to exercise the new begin() iterator guard. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: luohaha <18810541851@163.com>
Include the actual minimum rssid from rssid_to_rowsets in the error message to help diagnose the gap between stale and current seg ids. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: luohaha <18810541851@163.com>
1959af7 to
23f31e8
Compare
Compaction creates a minor version (4.1), not major version 5. Use version 4 to get the applied rowsets after compaction. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: luohaha <18810541851@163.com>
After compaction, old rowset entries remain in the _rowsets map until remove_expired_versions GC runs. Call remove_expired_versions(INT64_MAX) to force cleanup so old_rssid is no longer in rssid_to_rowsets, which properly triggers the begin() iterator guard. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: luohaha <18810541851@163.com>
|
@codex review |
|
Codex Review: Didn't find any major issues. Nice work! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[FE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[BE Incremental Coverage Report]✅ pass : 6 / 6 (100.00%) file detail
|
|
@Mergifyio backport branch-4.0 |
|
@Mergifyio backport branch-3.5 |
|
@Mergifyio backport branch-4.1 |
✅ Backports have been createdDetails
|
✅ Backports have been createdDetails
|
✅ Backports have been createdDetails
|
|
https://github.com/Mergifyio backport branch-3.5.14 |
✅ Backports have been createdDetails
|
Why I'm doing:
When the PrimaryIndex contains stale entries pointing to rowsets that have been compacted away,
TabletUpdates::get_column_valuescrashes with SIGSEGV. The root cause is thatupper_bound(rssid)returnsbegin()when the requested rssid is smaller than all keys inrssid_to_rowsets, and then--iteronbegin()is undefined behavior. In practice this causes the iterator to point to the map's sentinel node, leading to dereference of corrupted memory.Core dump analysis confirmed:
rssid = 432750from PrimaryIndex lookuprssid_to_rowsetsmin key =476512(old rowsets already compacted)--begin()UB → dereference of corruptedRowsetMetaPBpointer0x80804509150b1880→ SIGSEGVWhat I'm doing:
Add a bounds check before decrementing the
upper_bounditerator. Wheniter == begin(), returnInternalErrorwith a descriptive message instead of crashing.What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: