Add preimage optimization for ceil to rewrite predicates into range filters#20541

Open
kosiew wants to merge 7 commits into apache:main from kosiew:preimage-functions-20197

@kosiew kosiew commented Feb 25, 2026

Which issue does this PR close?


Rationale for this change

DataFusion’s preimage framework can turn predicates on deterministic functions into equivalent predicates on the underlying column(s). For ceil(x), the mathematical preimage for a target integer value N is the interval (N − 1, N].

Without a preimage implementation, filters such as WHERE ceil(col) = 6 must evaluate ceil for every row, which can inhibit predicate pushdown and other optimizations.

This PR implements ceil’s preimage to enable rewriting comparisons into simple range predicates (with careful handling of floating-point representation boundaries and decimals), improving the optimizer’s ability to push filters down to scans and reduce work during execution.


What changes are included in this PR?

  • Implemented ScalarUDFImpl::preimage for the ceil scalar function.

    • Computes the preimage range for ceil(x) = N as a half-open interval suitable for the Interval framework.
    • Uses next_up to express both floating-point bounds in half-open form: the strict lower bound x > N-1 becomes x >= next_up(N-1), and the inclusive upper bound x <= N becomes x < next_up(N).
    • Rejects non-integer literals (no solutions) and non-finite float literals (NaN/±Inf).
    • Avoids unsafe rewrites when N - 1 collapses to N due to float spacing (e.g., above 2^53 for f64, above 2^24 for f32).
  • Added decimal preimage support for Decimal32/64/128/256.

    • Validates that the literal has no fractional part at the declared scale.
    • Computes bounds using the decimal unit at the target scale (step = 10^-scale) to represent (N-1, N] as [N-1+step, N+step).
    • Handles scale 0 (integer decimals) as [N, N+1).
  • Added unit tests covering:

    • Valid ranges for floats (positive/negative/zero), integers, and decimals.
    • Non-integer literals returning PreimageResult::None.
    • Overflow and float boundary conditions.
    • NULL literals.
  • Added a new SQLLogicTest file ceil_preimage.slt.

    • Verifies correctness of results for representative types (Float64, Int32 via coercion, Decimal).
    • Verifies optimizer rewrites via EXPLAIN for =, IN, IS [NOT] DISTINCT FROM, and boundary cases.
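
The float handling described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: ceil_preimage_f64 and next_up_f64 are hypothetical names, the return type is a plain Option of a (lower, upper) pair rather than DataFusion's PreimageResult and Interval types, and next_up_f64 is a hand-written stand-in for the standard library's f64::next_up so the sketch also compiles on older toolchains.

```rust
/// Hypothetical stand-in for the standard library's `f64::next_up`:
/// the smallest f64 strictly greater than `x`.
fn next_up_f64(x: f64) -> f64 {
    if x.is_nan() || x == f64::INFINITY {
        return x;
    }
    if x == 0.0 {
        // Covers both +0.0 and -0.0: step to the smallest positive subnormal.
        return f64::from_bits(1);
    }
    let bits = x.to_bits();
    if x > 0.0 {
        f64::from_bits(bits + 1)
    } else {
        // For negative values a smaller bit pattern is closer to zero.
        f64::from_bits(bits - 1)
    }
}

/// Hypothetical sketch: half-open range [lo, hi) of f64 values `x` for which
/// `x.ceil() == n`, or `None` when no safe rewrite is produced.
fn ceil_preimage_f64(n: f64) -> Option<(f64, f64)> {
    // NaN, +/-Inf, and non-integer targets are rejected.
    if !n.is_finite() || n.fract() != 0.0 {
        return None;
    }
    let below = n - 1.0;
    // For large magnitudes `n - 1` can round back to `n`; skip the rewrite.
    if below >= n {
        return None;
    }
    // (n - 1, n] expressed as [next_up(n - 1), next_up(n)).
    Some((next_up_f64(below), next_up_f64(n)))
}

fn main() {
    let (lo, hi) = ceil_preimage_f64(6.0).expect("6.0 has a preimage");
    for x in [5.0, next_up_f64(5.0), 5.5, 6.0, next_up_f64(6.0)] {
        // The range filter agrees with evaluating ceil on each sample.
        assert_eq!(x.ceil() == 6.0, lo <= x && x < hi);
    }
    assert!(ceil_preimage_f64(6.5).is_none());
    assert!(ceil_preimage_f64(f64::NAN).is_none());
    assert!(ceil_preimage_f64(2.0_f64.powi(60)).is_none());
}
```

The guard on below >= n is what prevents an empty or incorrect range once float spacing exceeds 1.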

Are these changes tested?

Yes.

  • Rust unit tests added in datafusion/functions/src/math/ceil.rs validate:

    • Correct range generation for supported scalar types.
    • Correct rejection of non-integer / non-finite float literals.
    • Overflow and precision boundary handling.
    • Decimal scale/precision behavior and NULL handling.
  • SQLLogicTest added in datafusion/sqllogictest/test_files/ceil_preimage.slt validates:

    • Query result correctness for rewritten predicates.
    • Logical plan rewrites using EXPLAIN (including float next_up bounds and decimal bounds).
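
For readers who want to see the decimal bounds these tests exercise, here is a minimal sketch of the arithmetic on unscaled values. It assumes a Decimal128-style representation (an i128 unscaled value plus a non-negative scale); ceil_preimage_decimal is a hypothetical name, and the sketch only guards against i128 overflow, so the precision checks a real implementation would need are omitted.

```rust
/// Hypothetical sketch: for a decimal literal with the given unscaled value
/// and scale, return the half-open range [lower, upper) of unscaled values
/// whose ceil equals the literal, or `None` when there is no valid rewrite.
fn ceil_preimage_decimal(unscaled: i128, scale: u32) -> Option<(i128, i128)> {
    // The integer 1 expressed at this scale, e.g. 100 for scale 2.
    let one = 10_i128.checked_pow(scale)?;
    // A literal with a fractional part can never equal ceil(x).
    if unscaled % one != 0 {
        return None;
    }
    // The smallest step at this scale is 1 unscaled unit (10^-scale), so
    // (N - 1, N] becomes [N - 1 + step, N + step).
    let lower = unscaled.checked_sub(one)?.checked_add(1)?;
    let upper = unscaled.checked_add(1)?;
    Some((lower, upper))
}

fn main() {
    // 6.00 at scale 2 maps to [5.01, 6.01).
    assert_eq!(ceil_preimage_decimal(600, 2), Some((501, 601)));
    // Scale 0 reduces to [N, N + 1).
    assert_eq!(ceil_preimage_decimal(6, 0), Some((6, 7)));
    // 6.50 is not an integer, so there are no solutions.
    assert_eq!(ceil_preimage_decimal(650, 2), None);
    // Overflow of the upper bound is reported as no rewrite.
    assert_eq!(ceil_preimage_decimal(i128::MAX, 0), None);
}
```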

Are there any user-facing changes?

No user-visible behavior changes are intended. The semantics of ceil are unchanged.

This is an optimizer improvement that may:

  • Produce different (but equivalent) logical plans when predicates involve ceil.
  • Improve performance for queries that filter on ceil(col) by enabling range filtering and better predicate pushdown.

No documentation updates are required.


LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

@github-actions bot added the functions (Changes to functions implementation) and sqllogictest (SQL Logic Tests (.slt)) labels on Feb 25, 2026
@kosiew kosiew marked this pull request as ready for review February 25, 2026 09:57