[SPARK-56739][SQL] Normalize CTE ids of orphan CTERelationRef in NormalizeCTEIds#56083
[SPARK-56739][SQL] Normalize CTE ids of orphan CTERelationRef in NormalizeCTEIds#56083shrirangmhalgi wants to merge 1 commit into
Conversation
…alizeCTEIds CTERelationRef nodes that exist outside any WithCTE node (e.g., after InlineCTE or MergeSubplans removes the parent WithCTE) were not getting their IDs normalized. This causes plan comparison and caching to fail because the same logical plan gets different CTE IDs across sessions. Add a case in applyInternal to normalize orphan CTERelationRefs whose cteId is already in the mapping.
4d36b6e to
59ebea5
Compare
|
@yaooqinn / @cloud-fan could you please review the following PR. |
|
Does #55985 fix this issue? |
|
@cloud-fan I do not think #55985 fixes this. They address different symptoms of the same root cause. #55985 fixes double-mapping of nested CTE refs: This PR fixes a different issue, orphan Both stem from the same design issue: One concern with the current approach: the second-pass A potentially simpler unified fix would be to separate the two concerns: collect all CTE def ids first, then apply the mapping globally (2-pass): This fixes both issues: no double-mapping (each node visited once in Pass 2), no orphan misses (Pass 2 transforms all refs globally), CTAS handled explicitly in both passes, and eliminates Happy to help with a PR if this direction is interesting. |
| import org.apache.spark.sql.types.IntegerType | ||
|
|
||
| class NormalizeCTEIdsSuite extends SparkFunSuite { | ||
| test("SPARK-56739: orphan CTERelationRef outside WithCTE should be normalized") { |
There was a problem hiding this comment.
This is not an end-to-end test, we can also say the bug is where orphan CTERelationRef were created, and NormalizeCTEIds does nothing wrong.
There was a problem hiding this comment.
Agreed this is a unit test. However, NormalizeCTEIds is a normalization rule - its contract is to produce deterministic IDs for any valid plan shape. The test constructs a valid post-optimization state (orphan ref after InlineCTE removes the wrapper), and the rule fails to normalize it. Whether the bug manifests as a cache miss or a crash depends on downstream consumers, but the normalization contract is broken regardless. I've rebased on top of #55985 (now merged) and the fix works cleanly as a complement to it. Happy to adjust the approach if you'd prefer fixing this at the InlineCTE level instead.
|
Thanks for the thorough analysis @rdtr. |
What changes were proposed in this pull request?
Normalize CTE IDs of orphan
CTERelationRefnodes inNormalizeCTEIds. Previously, onlyCTERelationRefnodes insideWithCTEwere normalized viacanonicalizeCTE. Refs that exist outside anyWithCTE(orphans) kept their original IDs.Why are the changes needed?
After
InlineCTEorMergeSubplansruns, someCTERelationRefnodes can end up outside their parentWithCTEnode. WhenNormalizeCTEIdsprocesses the plan, these orphan refs are skipped, leaving non-normalized IDs. This breaks plan comparison and caching because the same logical plan gets different CTE IDs across sessions (sinceCTERelationDefuses a global monotonically increasing counter).Does this PR introduce any user-facing change?
No. This is an internal plan normalization fix that affects plan caching correctness.
How was this patch tested?
Added
NormalizeCTEIdsSuitewith a test that constructs a plan with aCTERelationRefoutsideWithCTEand verifies all ref IDs are normalized. Without the fix, the orphan ref retains its original ID (100); with the fix, it's normalized to 0.Was this patch authored or co-authored using generative AI tooling?
Yes.