branch-4.0: [fix](cloud) Delete local rowsets before add_rowsets in cloud schema change #62256 #62310

Open
github-actions[bot] wants to merge 1 commit into branch-4.0 from auto-pick-62256-branch-4.0

Conversation

@github-actions
Contributor

Cherry-picked from #62256

@bobhan1
Contributor

bobhan1 commented Apr 30, 2026

run buildall

@bobhan1
Contributor

bobhan1 commented May 11, 2026

run p0

…change (#62256)

### What problem does this PR solve?

Problem Summary:

During a cloud schema change (SC), the MS (Meta Service) side correctly
recycles the rowsets in `[2, alter_version]` on the new tablet when
committing the SC job. The BE side did not mirror this behavior: it
called `add_rowsets` for the SC output directly, without first removing
the existing local rowsets. This could leave stale rowsets (e.g.,
compaction outputs on the new tablet) visible in `_rs_version_map`.
Because their delete bitmaps do not cover the SC output rows, duplicate
keys may appear in MOW (merge-on-write) tables.

PR #61089 increased the likelihood of triggering this issue by enabling
compaction on new tablets during SC, which makes it more common for the
new tablet to have compaction rowsets with wider version ranges (e.g.,
`[818-822]`) that overlap with individual SC output rowsets (e.g.,
`[818],[819],...,[822]`). The `add_rowsets` overlap check
(`to_add_v.contains(v)`) is one-directional: `[818].contains([818-822])`
evaluates to false, so the stale compaction rowset was not removed.
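
To make the one-directional check concrete, here is a minimal sketch. The `Version` struct and the `replaced_by` helper are hypothetical stand-ins written for illustration, not the actual Doris types; only the shape of the `to_add_v.contains(v)` test is taken from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a rowset version range [first, second].
struct Version {
    int64_t first;
    int64_t second;

    // True when `other` lies entirely inside this range.
    bool contains(const Version& other) const {
        return first <= other.first && other.second <= second;
    }
};

// Models the check described above: an existing rowset is treated as
// replaced only if the rowset being added fully covers it.
bool replaced_by(const Version& to_add, const Version& existing) {
    assert(to_add.first <= to_add.second);
    return to_add.contains(existing);
}
```

Under this model, `replaced_by(Version{818, 818}, Version{818, 822})` is false, so adding the SC output `[818]` leaves the wider compaction rowset `[818-822]` in place. The reverse call, with `[818-822]` as the rowset being added, is true.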

Fix: Before calling `add_rowsets` for SC output, delete all local
rowsets in `[2, alter_version]` from the new tablet, mirroring the
MS-side recycle behavior. A new
`CloudTablet::delete_rowsets_for_schema_change` method is added that
also removes edges from the version graph, preventing the greedy capture
algorithm from preferring the wider stale compaction path over the
individual SC output rowsets.
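
The effect of the fix can be sketched with a toy model. Everything here is an assumption made for illustration: `RowsetMap`, `delete_local_rowsets_in_range`, and `capture` are hypothetical and much simpler than the real `CloudTablet` code. In particular, the toy treats "in `[2, alter_version]`" as "fully contained in that range", and its capture routine simply picks the widest rowset starting at each version.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Version = std::pair<int64_t, int64_t>;       // [start, end]
using RowsetMap = std::map<Version, std::string>;  // version -> rowset label

// Toy model of the fix: drop every local rowset whose range lies inside
// [2, alter_version] (assumed reading of "in [2, alter_version]").
void delete_local_rowsets_in_range(RowsetMap& rowsets, int64_t alter_version) {
    for (auto it = rowsets.begin(); it != rowsets.end();) {
        if (it->first.first >= 2 && it->first.second <= alter_version) {
            it = rowsets.erase(it);
        } else {
            ++it;
        }
    }
}

// Toy add: insert or overwrite entries by exact version.
void add_rowsets(RowsetMap& rowsets, const RowsetMap& to_add) {
    for (const auto& kv : to_add) {
        rowsets[kv.first] = kv.second;
    }
}

// Simplified greedy capture: at each start version, take the rowset that
// reaches furthest without passing `to`. Returns an empty path on a gap.
std::vector<std::string> capture(const RowsetMap& rowsets, int64_t from, int64_t to) {
    std::vector<std::string> path;
    int64_t cur = from;
    while (cur <= to) {
        const std::string* best = nullptr;
        int64_t best_end = -1;
        for (const auto& kv : rowsets) {
            if (kv.first.first == cur && kv.first.second <= to &&
                kv.first.second > best_end) {
                best_end = kv.first.second;
                best = &kv.second;
            }
        }
        if (best == nullptr) {
            return {};
        }
        path.push_back(*best);
        cur = best_end + 1;
    }
    return path;
}
```

In this model, adding SC outputs `[818]` through `[822]` next to a stale `[818-822]` compaction rowset makes `capture` return the stale rowset alone. Calling `delete_local_rowsets_in_range` first makes it return the five SC outputs.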

bobhan1 force-pushed the auto-pick-62256-branch-4.0 branch from 0eeb0f5 to de41408 May 11, 2026 07:39

@bobhan1
Contributor

bobhan1 commented May 11, 2026

run buildall

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (34/34) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 71.35% (25449/35669) |
| Line Coverage | 54.20% (269534/497322) |
| Region Coverage | 51.84% (223415/430931) |
| Branch Coverage | 53.15% (95941/180523) |
