Skip to content

feat: implement retract_batch for array_agg sliding window support#22015

Open
SubhamSinghal wants to merge 4 commits intoapache:mainfrom
SubhamSinghal:array-agg-retract-batch
Open

feat: implement retract_batch for array_agg sliding window support#22015
SubhamSinghal wants to merge 4 commits intoapache:mainfrom
SubhamSinghal:array-agg-retract-batch

Conversation

@SubhamSinghal
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #21957.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Yes. Using UT

Are there any user-facing changes?

Yes. array_agg now works with bounded/sliding window frames. Queries that previously errored now succeed:

SELECT array_agg(x) OVER (ORDER BY t ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) FROM t;

No breaking API changes

@github-actions github-actions Bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels May 5, 2026
Copy link
Copy Markdown
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SubhamSinghal
Thanks for the work here. I found one issue that looks like it can affect sliding window correctness for array_agg(... IGNORE NULLS)


let val = &values[0];
let mut to_retract = if self.ignore_nulls {
val.len() - val.null_count()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think retract_batch should use the same null definition as update_batch here.

update_batch filters IGNORE NULLS using val.logical_nulls(), but this path counts non-null rows with val.len() - val.null_count(). For arrays where logical nullability differs from physical nullability, such as dictionary or run arrays whose values contain logical nulls, the accumulator can store fewer rows than retract_batch later removes.

That means a sliding array_agg(... IGNORE NULLS) window can over-retract and accidentally drop following values from the frame. Could we switch this to the same null semantics, for example val.len() - val.logical_null_count(), and add a regression test with an input type where logical nulls differ from null_count()?

@SubhamSinghal
Copy link
Copy Markdown
Contributor Author

@kosiew Thanks for highlighting the issue. I have fixed null counting bug and added UT

Copy link
Copy Markdown
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SubhamSinghal
Thanks for addressing the counting issue in retract_batch.
The implementation change looks right to me. I think the regression test still needs one adjustment before this is ready, since it currently does not exercise the logical-null versus physical-null case that caused the original bug.

builder.append_null();
let dict_array: ArrayRef = Arc::new(builder.finish());

assert_eq!(dict_array.null_count(), 2);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this regression test. I think it needs one more tweak to cover the original issue properly.

Right now this uses StringDictionaryBuilder::append_null(), which creates null dictionary keys. Because of that, dict_array.null_count() and dict_array.logical_null_count() are both 2, so the test would still pass with the old val.len() - val.null_count() behavior.

Could you please construct a dictionary or run array with valid physical entries that reference null values instead? It would also be helpful to assert that null_count() != logical_null_count() before checking the retract behavior, so the test clearly covers the regression.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added assertion for null_count() != logical_null_count().

Copy link
Copy Markdown
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SubhamSinghal
Thanks for the iteration

Looks 👍 to me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support array_agg as a sliding window aggregate by implementing retract_batch

2 participants