Skip to content

Conversation

@alpass163
Copy link
Contributor

Description:

This PR introduces robust support for NULL timestamps in FIRST, LAST, FIRST_BY, and LAST_BY aggregation functions. Previously, rows with NULL time might have been handled inconsistently or ignored.

Key Changes:

Null Time Handling Strategy:

Priority: Data with a Valid Time always takes precedence over data with a Null Time.

Fallback Policy: If only NULL time data exists, the aggregations now adopt a "First Null Wins" strategy. This means the first encountered value with a NULL timestamp is retained as the result, ensuring consistent behavior across both First and Last families.

Comprehensive Support:

Applied these changes to both standard Accumulators (for global aggregation) and GroupedAccumulators (for GROUP BY queries).

Covered all supported data types (INT32, INT64, FLOAT, DOUBLE, BINARY, BOOLEAN).

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds comprehensive support for NULL timestamps in FIRST, LAST, FIRST_BY, and LAST_BY aggregation functions. The implementation prioritizes data with valid timestamps over NULL timestamp data, and when only NULL timestamp data exists, adopts a "First Null Wins" strategy where the first encountered NULL time value is retained.

Changes:

  • Added initNullTimeValues tracking field to all grouped accumulators and initNullTimeValue to non-grouped accumulators
  • Enhanced serialization format to include an additional boolean flag indicating whether the order time is NULL
  • Implemented separate update methods for NULL time values across all data types (INT32, INT64, FLOAT, DOUBLE, BINARY, BOOLEAN)

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
GroupedLastByAccumulator.java Added NULL time tracking with initNullTimeValues field and separate update paths for NULL vs valid timestamps
GroupedLastAccumulator.java Similar NULL time support for LAST aggregation with dual initialization tracking
GroupedFirstByAccumulator.java Mirrored LAST logic but with minimum time comparison for FIRST semantics
GroupedFirstAccumulator.java NULL time handling for non-BY FIRST aggregation
Utils.java New serialization methods serializeTimeValueWithNull to handle the extended format with NULL time flags
LastDescAccumulator.java Descending order variant with NULL time value handling
LastByDescAccumulator.java Descending LAST_BY with NULL time support
LastByAccumulator.java Non-grouped LAST_BY with NULL time tracking
LastAccumulator.java Non-grouped LAST with NULL time support
FirstDescAccumulator.java Descending FIRST with NULL time handling
FirstByDescAccumulator.java Descending FIRST_BY with NULL time support
FirstByAccumulator.java Non-grouped FIRST_BY with NULL time tracking
FirstAccumulator.java Non-grouped FIRST with comprehensive NULL time support

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@alpass163 alpass163 force-pushed the cyb/accmulators_nulls_strategy branch 4 times, most recently from 7b11bf8 to 6ee894a Compare January 27, 2026 08:36
@alpass163 alpass163 force-pushed the cyb/accmulators_nulls_strategy branch from 6ee894a to ab5757f Compare January 27, 2026 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants