Support NULL time in First/Last/FirstBy/LastBy aggregations#17064
Merged
JackieTien97 merged 1 commit intoapache:masterfrom Jan 28, 2026
Merged
Support NULL time in First/Last/FirstBy/LastBy aggregations#17064JackieTien97 merged 1 commit intoapache:masterfrom
JackieTien97 merged 1 commit intoapache:masterfrom
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This pull request adds comprehensive support for NULL timestamps in FIRST, LAST, FIRST_BY, and LAST_BY aggregation functions. The implementation prioritizes data with valid timestamps over NULL timestamp data, and when only NULL timestamp data exists, adopts a "First Null Wins" strategy where the first encountered NULL time value is retained.
Changes:
- Added
initNullTimeValuestracking field to all grouped accumulators andinitNullTimeValueto non-grouped accumulators - Enhanced serialization format to include an additional boolean flag indicating whether the order time is NULL
- Implemented separate update methods for NULL time values across all data types (INT32, INT64, FLOAT, DOUBLE, BINARY, BOOLEAN)
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| GroupedLastByAccumulator.java | Added NULL time tracking with initNullTimeValues field and separate update paths for NULL vs valid timestamps |
| GroupedLastAccumulator.java | Similar NULL time support for LAST aggregation with dual initialization tracking |
| GroupedFirstByAccumulator.java | Mirrored LAST logic but with minimum time comparison for FIRST semantics |
| GroupedFirstAccumulator.java | NULL time handling for non-BY FIRST aggregation |
| Utils.java | New serialization methods serializeTimeValueWithNull to handle the extended format with NULL time flags |
| LastDescAccumulator.java | Descending order variant with NULL time value handling |
| LastByDescAccumulator.java | Descending LAST_BY with NULL time support |
| LastByAccumulator.java | Non-grouped LAST_BY with NULL time tracking |
| LastAccumulator.java | Non-grouped LAST with NULL time support |
| FirstDescAccumulator.java | Descending FIRST with NULL time handling |
| FirstByDescAccumulator.java | Descending FIRST_BY with NULL time support |
| FirstByAccumulator.java | Non-grouped FIRST_BY with NULL time tracking |
| FirstAccumulator.java | Non-grouped FIRST with comprehensive NULL time support |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
JackieTien97
requested changes
Jan 23, 2026
7b11bf8 to
6ee894a
Compare
6ee894a to
ab5757f
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description:
This PR introduces robust support for NULL timestamps in FIRST, LAST, FIRST_BY, and LAST_BY aggregation functions. Previously, rows with NULL time might have been handled inconsistently or ignored.
Key Changes:
Null Time Handling Strategy:
Priority: Data with a Valid Time always takes precedence over data with a Null Time.
Fallback Policy: If only NULL time data exists, the aggregations now adopt a "First Null Wins" strategy. This means the first encountered value with a NULL timestamp is retained as the result, ensuring consistent behavior across both First and Last families.
Comprehensive Support:
Applied these changes to both standard Accumulators (for global aggregation) and GroupedAccumulators (for GROUP BY queries).
Covered all supported data types (INT32, INT64, FLOAT, DOUBLE, BINARY, BOOLEAN).