Preserve recursive CTE static schema with plan-time schema alignment #22037

Open
kosiew wants to merge 5 commits into apache:main from kosiew:nullability-mismatch-22034

Conversation

@kosiew
Contributor

@kosiew kosiew commented May 6, 2026

Which issue does this PR close?

Rationale for this change

RecursiveQueryExec widened recursive CTE output nullability by reconciling the static and recursive term schemas. This caused the physical schema to diverge from the logical/static CTE schema and forced valid SQL such as 0 AS level to be rewritten as nullable expressions like SUM(0) AS level.

This change preserves the declared recursive CTE schema by treating the static/anchor term schema as authoritative and aligning the recursive term to that schema during plan construction.
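The difference between the two behaviors can be sketched with a small std-only model. The `Field` struct and the functions `widened_schema` and `preserved_schema` are hypothetical stand-ins, not the Arrow or DataFusion API, and the widening rule (a column is nullable if either term reports it as nullable) is an assumption about how the previous reconciliation worked.

```rust
// Simplified stand-in for an Arrow field; NOT the real arrow/DataFusion type.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub nullable: bool,
}

/// Hypothetical helper to build a field for the examples.
pub fn field(name: &str, nullable: bool) -> Field {
    Field { name: name.to_string(), nullable }
}

/// Assumed model of the old behavior: the output column is nullable
/// if either the static or the recursive term says it is nullable.
pub fn widened_schema(static_term: &[Field], recursive_term: &[Field]) -> Vec<Field> {
    static_term
        .iter()
        .zip(recursive_term)
        .map(|(s, r)| Field {
            name: s.name.clone(),
            nullable: s.nullable || r.nullable,
        })
        .collect()
}

/// Behavior described in this PR: the static (anchor) term schema is
/// authoritative, so the recursive term does not influence the output schema.
pub fn preserved_schema(static_term: &[Field], _recursive_term: &[Field]) -> Vec<Field> {
    static_term.to_vec()
}
```

In this model, a non-null static column such as `0 AS level` paired with a nullable recursive column becomes nullable under `widened_schema`, which is the divergence from the static schema described above, while `preserved_schema` keeps it non-null.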

What changes are included in this PR?

  • Added align_plan_to_schema, a higher-level plan-time schema alignment helper that guarantees the resulting plan advertises the expected schema exactly.

  • Kept project_plan_to_schema as the narrower projection-based helper and refactored shared validation into validate_schema_alignment.

  • Added SchemaAlignExec, an execution-plan adapter that:

    • advertises the expected schema from plan properties
    • preserves positional column values
    • rebinds emitted RecordBatch schemas inside the adapter
    • validates column count, data types, field metadata, and schema metadata
  • Updated RecursiveQueryExec::try_new to:

    • use the static term schema as the recursive CTE output schema
    • align the recursive term with align_plan_to_schema
    • remove recursive output schema widening logic
  • Restored the recursive CTE SLT coverage from SUM(0) AS level back to 0 AS level.
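As a rough illustration of the adapter behavior listed above, the following std-only sketch validates an input schema against an expected one and then rebinds a batch to the expected schema without touching column values. All types and function names here (`Schema`, `Batch`, `validate_alignment`, `rebind_batch`, and the helpers) are simplified assumptions for illustration, not the actual `SchemaAlignExec` implementation or the Arrow API.

```rust
use std::collections::BTreeMap;

// Simplified stand-ins for Arrow types; NOT the real arrow/DataFusion API.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
    pub metadata: BTreeMap<String, String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<Field>,
    pub metadata: BTreeMap<String, String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub schema: Schema,
    pub columns: Vec<Vec<i64>>, // opaque column values in this model
}

/// Hypothetical helper: an Int32 field with no metadata.
pub fn int_field(name: &str, nullable: bool) -> Field {
    Field { name: name.to_string(), data_type: DataType::Int32, nullable, metadata: BTreeMap::new() }
}

/// Hypothetical helper: a schema with no schema-level metadata.
pub fn schema(fields: Vec<Field>) -> Schema {
    Schema { fields, metadata: BTreeMap::new() }
}

/// Checks column count, data types, field metadata, and schema metadata.
/// Names and nullability are deliberately not compared in this sketch.
pub fn validate_alignment(input: &Schema, expected: &Schema) -> Result<(), String> {
    if input.fields.len() != expected.fields.len() {
        return Err(format!(
            "column count mismatch: {} vs {}",
            input.fields.len(),
            expected.fields.len()
        ));
    }
    for (i, (inp, exp)) in input.fields.iter().zip(&expected.fields).enumerate() {
        if inp.data_type != exp.data_type {
            return Err(format!("type mismatch at column {i}"));
        }
        if inp.metadata != exp.metadata {
            return Err(format!("field metadata mismatch at column {i}"));
        }
    }
    if input.metadata != expected.metadata {
        return Err("schema metadata mismatch".to_string());
    }
    Ok(())
}

/// Rebinds a batch to the expected schema, keeping columns positionally.
pub fn rebind_batch(batch: Batch, expected: &Schema) -> Result<Batch, String> {
    validate_alignment(&batch.schema, expected)?;
    Ok(Batch { schema: expected.clone(), columns: batch.columns })
}
```

The sketch only swaps the schema attached to the batch; whether the real adapter performs any additional runtime checks is not stated in this description.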

Are these changes tested?

Yes.

Added and updated tests covering:

  • align_plan_to_schema:

    • exact schema returns unchanged plan
    • rename-only alignment uses ProjectionExec
    • nullable input to non-null expected schema uses SchemaAlignExec
    • column count mismatch errors
    • type mismatch errors
    • field metadata mismatch errors
    • schema metadata mismatch errors
  • project_plan_to_schema:

    • schema match passthrough
    • nullability widening
    • nullability narrowing rejection
    • metadata mismatch validation
  • RecursiveQueryExec:

    • recursive term projection alignment
    • preservation of the static nullability contract
    • recursive term schema matches the static schema after construction
  • Restored SQL logic test coverage in cte.slt using 0 AS level.
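The `align_plan_to_schema` cases covered by the tests above suggest a three-way decision, which the following std-only sketch models. The `Alignment` enum, `choose_alignment`, and the simplified `Field` type are hypothetical; metadata checks are omitted for brevity, and routing every non-narrowing difference (such as a rename) to the projection path is an assumption rather than something this description confirms.

```rust
// Simplified stand-ins; NOT the real arrow/DataFusion API.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

/// Hypothetical helper to build a field for the examples.
pub fn field(name: &str, data_type: DataType, nullable: bool) -> Field {
    Field { name: name.to_string(), data_type, nullable }
}

/// Which mechanism the sketch would pick to align a plan.
#[derive(Debug, PartialEq)]
pub enum Alignment {
    Unchanged,   // schemas already match exactly
    Projection,  // models the ProjectionExec path (e.g. rename-only)
    SchemaAlign, // models the SchemaAlignExec path (nullability narrowing)
}

pub fn choose_alignment(input: &[Field], expected: &[Field]) -> Result<Alignment, String> {
    if input.len() != expected.len() {
        return Err(format!("column count mismatch: {} vs {}", input.len(), expected.len()));
    }
    for (i, (inp, exp)) in input.iter().zip(expected).enumerate() {
        if inp.data_type != exp.data_type {
            return Err(format!("type mismatch at column {i}"));
        }
    }
    if input == expected {
        return Ok(Alignment::Unchanged);
    }
    // Narrowing: the input may produce nulls but the expected schema is non-null.
    let narrows = input.iter().zip(expected).any(|(inp, exp)| inp.nullable && !exp.nullable);
    if narrows {
        Ok(Alignment::SchemaAlign)
    } else {
        // Assumption: remaining differences are handled by a projection.
        Ok(Alignment::Projection)
    }
}
```

For example, an input column `n` (non-null Int32) aligned to an expected column `level` (non-null Int32) yields `Alignment::Projection` in this model, while a nullable `level` aligned to a non-null `level` yields `Alignment::SchemaAlign`.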

Validated with:

cargo test -p datafusion-physical-plan recursive_query_exec
cargo test -p datafusion-physical-plan project_plan_to_schema
cargo test -p datafusion-sqllogictest --test sqllogictests -- cte

Are there any user-facing changes?

Yes.

Recursive CTEs now preserve the declared/static schema instead of widening nullability based on recursive expressions. Existing valid SQL such as:

0 AS level

continues to work without requiring nullable rewrites like:

SUM(0) AS level

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

kosiew added 5 commits May 6, 2026 11:49
- Added `align_plan_to_schema` and `SchemaAlignExec` for improved schema alignment in execution plans.
- Maintained strict behavior in `project_plan_to_schema` for projection-only cases.
- Updated adapter to handle nullability narrowing while preserving SQL behavior.
- Modified `RecursiveQueryExec` to preserve static/declared schema and aligned recursive term at plan construction.
- Removed nullability-widening schema synthesis for cleaner execution.
- Restored `0 AS level` in SQL logic test file `cte.slt`.
…ent behavior

- Added direct tests for `align_plan_to_schema`:
  - Verified exact schema returns the same plan.
  - Ensured rename-only uses `ProjectionExec`.
  - Confirmed nullability narrowing uses `SchemaAlignExec`.
  - Tested count/type/field metadata/schema metadata errors.
- Documented conservative property behavior in the adapter path.
- Refactored `align_plan_to_schema` function to store input schema in a variable, reducing redundant calls.
- Updated validation and comparison logic for better clarity and performance.
- Simplified partitioning handling in `SchemaAlignExec` by consolidating pattern matching.
- Enhanced `DisplayAs` implementation to correctly handle `TreeRender` format.
…odules

- Reuse `input_schema` in common.rs
- Simplify projected return using `debug_assert_eq!`
- Utilize `partition_count()` in common.rs
- Modify TreeRender to return `Ok(())`
- Reuse `static_schema` in tests for recursive_query.rs
- Removed redundant upfront align validation in common.rs.
- Added test helpers in common.rs:
  - single_field_schema
  - single_i32_exec
  - metadata mismatch builders
- Shortened repeated test setup in common.rs.
- Added recursive_exec test helper in recursive_query.rs.
- Simplified RecursiveQueryExec::try_new(...) in recursive_query.rs.
github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and physical-plan (Changes to the physical-plan crate) labels May 6, 2026
@kosiew kosiew marked this pull request as ready for review May 6, 2026 06:17
Development

Successfully merging this pull request may close these issues.

Preserve recursive CTE declared schema when aligning physical children