
[BugFix] Guard against iterator UB in get_column_values when rssid not found #69617

Merged
sevev merged 5 commits into StarRocks:main from luohaha:fix/guard-rssid-iterator-ub
Feb 28, 2026

Conversation

@luohaha
Contributor

@luohaha luohaha commented Feb 28, 2026

Why I'm doing:

When the PrimaryIndex contains stale entries pointing to rowsets that have been compacted away, TabletUpdates::get_column_values crashes with SIGSEGV. The root cause is that upper_bound(rssid) returns begin() when the requested rssid is smaller than every key in rssid_to_rowsets, and decrementing that iterator with --iter is undefined behavior. In practice the iterator ends up pointing at the map's sentinel node, so the subsequent dereference reads corrupted memory.

Core dump analysis confirmed:

  • rssid = 432750 from PrimaryIndex lookup
  • rssid_to_rowsets min key = 476512 (old rowsets already compacted)
  • --begin() UB → dereference of corrupted RowsetMetaPB pointer 0x80804509150b1880 → SIGSEGV
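The failure condition described above can be reproduced on a plain std::map. The following is a minimal sketch, not the StarRocks code: `RssidMap` and `decrement_would_be_ub` are hypothetical names, and the mapped type is a string stand-in for the real rowset object. It only checks whether upper_bound lands on begin(), which is exactly the state in which --iter must not be executed.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for rssid_to_rowsets; the real map holds rowset objects.
using RssidMap = std::map<uint32_t, std::string>;

// Returns true when upper_bound(rssid) is begin(): either the map is empty or
// every key is greater than rssid. Decrementing the iterator in that state is
// undefined behavior.
inline bool decrement_would_be_ub(const RssidMap& rssid_to_rowsets, uint32_t rssid) {
    auto iter = rssid_to_rowsets.upper_bound(rssid);
    return iter == rssid_to_rowsets.begin();
}
```

With the values from the core dump (rssid 432750, minimum key 476512), the predicate is true, while any rssid at or above the minimum key yields an iterator that is safe to decrement.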

What I'm doing:

Add a bounds check before decrementing the upper_bound iterator. When iter == begin(), return InternalError with a descriptive message instead of crashing.
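The shape of such a guard might look like the sketch below. It is an illustration under assumptions rather than the actual patch: `LookupResult` and `find_owning_rowset` are hypothetical, and the real code returns a StarRocks InternalError status whose exact message is not shown in this conversation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical minimal result type; StarRocks uses its own Status class.
struct LookupResult {
    bool ok;
    std::string error;    // filled in when ok is false
    uint32_t base_rssid;  // key of the rowset that covers rssid, when ok is true
};

// Finds the entry with the largest key <= rssid, refusing to decrement begin().
inline LookupResult find_owning_rowset(const std::map<uint32_t, std::string>& rssid_to_rowsets,
                                       uint32_t rssid) {
    auto iter = rssid_to_rowsets.upper_bound(rssid);
    if (iter == rssid_to_rowsets.begin()) {
        // No key is <= rssid, so there is nothing valid to step back to.
        return {false, "rowset for rssid " + std::to_string(rssid) + " not found", 0};
    }
    --iter;  // safe: iter is not begin()
    return {true, "", iter->first};
}
```

The check costs a single iterator comparison on the lookup path and turns the crash into an error the caller can report.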

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This PR needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport PR

Bugfix cherry-pick branch check:

  • I have checked the version labels which the PR will be auto-backported to the target branch
    • 4.1
    • 4.0
    • 3.5
    • 3.4

@luohaha luohaha requested a review from a team as a code owner February 28, 2026 05:33
@github-actions
Contributor

🌎 Translation Required?

All translation files are up to date.
No translation actions are required for this PR.

🕒 Last updated: Sat, 28 Feb 2026 05:51:14 GMT

@CelerData-Reviewer

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Can't wait for the next one!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

luohaha and others added 3 commits February 28, 2026 14:04
…t found

When the PrimaryIndex contains stale entries pointing to rowsets that
have been compacted away, upper_bound() returns begin() and --iter
causes undefined behavior (decrementing before begin), leading to a
SIGSEGV crash.

Add a bounds check before decrementing the iterator to return an error
instead of crashing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: luohaha <18810541851@163.com>

Add test that verifies get_column_values returns InternalError instead
of crashing when called with a stale rssid that no longer exists in
rssid_to_rowsets after compaction. The test commits multiple rowsets,
triggers compaction, then queries with the old (pre-compaction) rssid
to exercise the new begin() iterator guard.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: luohaha <18810541851@163.com>

Include the actual minimum rssid from rssid_to_rowsets in the error
message to help diagnose the gap between stale and current seg ids.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: luohaha <18810541851@163.com>
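
As a rough illustration of the commit above, an error message that carries both the requested rssid and the current minimum key could be assembled as follows. The function name and the wording of the message are assumptions; the text produced by the real patch is not quoted in this conversation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper: builds a diagnostic that exposes the gap between the
// stale rssid and the smallest rssid still present in the map.
inline std::string stale_rssid_message(const std::map<uint32_t, std::string>& rssid_to_rowsets,
                                       uint32_t rssid) {
    std::string msg = "rssid " + std::to_string(rssid) + " not found in rssid_to_rowsets";
    if (!rssid_to_rowsets.empty()) {
        // begin() holds the smallest key because std::map is ordered.
        msg += ", min rssid: " + std::to_string(rssid_to_rowsets.begin()->first);
    }
    return msg;
}
```

Guarding on empty() matters here because reading begin()->first on an empty map would itself be invalid.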
@luohaha luohaha force-pushed the fix/guard-rssid-iterator-ub branch from 1959af7 to 23f31e8 Compare February 28, 2026 06:05
wyb previously approved these changes Feb 28, 2026

Compaction creates a minor version (4.1), not major version 5.
Use version 4 to get the applied rowsets after compaction.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: luohaha <18810541851@163.com>

After compaction, old rowset entries remain in the _rowsets map until
remove_expired_versions GC runs. Call remove_expired_versions(INT64_MAX)
to force cleanup so old_rssid is no longer in rssid_to_rowsets, which
properly triggers the begin() iterator guard.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: luohaha <18810541851@163.com>
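
The reason the test needs the forced cleanup can be seen on a toy map: while the old entries are still present, a lookup with the old rssid succeeds and the guard is never reached. This is a simplified sketch; `drop_expired` is a hypothetical stand-in for the GC step, not the signature of remove_expired_versions, and the small keys are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

using RssidMap = std::map<uint32_t, std::string>;

// Hypothetical stand-in for GC: erase every entry whose key is below min_live_rssid.
inline void drop_expired(RssidMap& m, uint32_t min_live_rssid) {
    m.erase(m.begin(), m.lower_bound(min_live_rssid));
}

// True when the begin() guard would fire for this rssid.
inline bool guard_triggers(const RssidMap& m, uint32_t rssid) {
    return m.upper_bound(rssid) == m.begin();
}
```

Before `drop_expired` runs, `guard_triggers(m, old_rssid)` is false because the old key is still in the map; only after the old entries are erased does the same call return true.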
@CelerData-Reviewer

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Nice work!

@github-actions
Contributor

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions
Contributor

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions
Contributor

[BE Incremental Coverage Report]

pass : 6 / 6 (100.00%)

file detail

path | covered_line | new_line | coverage | not_covered_line_detail
🔵 be/src/storage/tablet_updates.cpp | 6 | 6 | 100.00% | []

@sevev sevev merged commit 7adfd51 into StarRocks:main Feb 28, 2026
53 checks passed
@github-actions
Contributor

@Mergifyio backport branch-4.0

@github-actions github-actions bot removed the 4.0 label Feb 28, 2026
@github-actions
Contributor

@Mergifyio backport branch-3.5

@github-actions
Contributor

@Mergifyio backport branch-4.1

@mergify
Contributor

mergify bot commented Feb 28, 2026

backport branch-4.0

✅ Backports have been created

Details

@mergify
Contributor

mergify bot commented Feb 28, 2026

backport branch-3.5

✅ Backports have been created

Details

@mergify
Contributor

mergify bot commented Feb 28, 2026

backport branch-4.1

✅ Backports have been created

Details

mergify bot pushed a commit that referenced this pull request Feb 28, 2026
…t found (#69617)

Signed-off-by: luohaha <18810541851@163.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
(cherry picked from commit 7adfd51)
mergify bot pushed a commit that referenced this pull request Feb 28, 2026
…t found (#69617)

Signed-off-by: luohaha <18810541851@163.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
(cherry picked from commit 7adfd51)
mergify bot pushed a commit that referenced this pull request Feb 28, 2026
…t found (#69617)

Signed-off-by: luohaha <18810541851@163.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
(cherry picked from commit 7adfd51)
wanpengfei-git pushed a commit that referenced this pull request Feb 28, 2026
…t found (backport #69617) (#69642)

Signed-off-by: luohaha <18810541851@163.com>
Co-authored-by: Yixin Luo <18810541851@163.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
wanpengfei-git pushed a commit that referenced this pull request Feb 28, 2026
…t found (backport #69617) (#69644)

Signed-off-by: luohaha <18810541851@163.com>
Co-authored-by: Yixin Luo <18810541851@163.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
wanpengfei-git pushed a commit that referenced this pull request Feb 28, 2026
…t found (backport #69617) (#69643)

Signed-off-by: luohaha <18810541851@163.com>
Co-authored-by: Yixin Luo <18810541851@163.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@luohaha
Contributor Author

luohaha commented Mar 2, 2026

@Mergifyio backport branch-3.5.14

@mergify
Contributor

mergify bot commented Mar 2, 2026

backport branch-3.5.14

✅ Backports have been created

Details

mergify bot pushed a commit that referenced this pull request Mar 2, 2026
…t found (#69617)

Signed-off-by: luohaha <18810541851@163.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
(cherry picked from commit 7adfd51)

4 participants