Skip to content

Poc: expose group hash feedback from GroupValues#23229

Draft
Rachelint wants to merge 2 commits into
apache:mainfrom
Rachelint:fuse-aggr-repart-poc-2
Draft

Poc: expose group hash feedback from GroupValues#23229
Rachelint wants to merge 2 commits into
apache:mainfrom
Rachelint:fuse-aggr-repart-poc-2

Conversation

@Rachelint

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Part of hash aggregate repartition POC.

Rationale for this change

This POC prepares GroupValues for partition-aware partial aggregate output. The partial aggregate output path needs to know which input rows created new groups, and the hash for those rows, so it can later route newly-created group ids to target partitions without relying on a separate repartition + coalesce pipeline.

What changes are included in this PR?

  • Extend GroupValues::intern to fill per-row hashes and optionally return the input rows that created new groups.
  • Keep hash calculation inside GroupValues implementations.
  • Wire reusable hash/new-group-row buffers through aggregate hash table and row-hash paths.
  • Update existing call sites and benchmarks to the new intern signature.

Are these changes tested?

Yes:

  • cargo fmt --all
  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo test -p datafusion-physical-plan group_values --lib

@github-actions github-actions Bot added the physical-plan Changes to the physical-plan crate label Jun 29, 2026
@github-actions

github-actions Bot commented Jun 29, 2026

Copy link
Copy Markdown

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion-physical-expr-common v54.0.0 (current)
       Built [  24.766s] (current)
     Parsing datafusion-physical-expr-common v54.0.0 (current)
      Parsed [   0.024s] (current)
    Building datafusion-physical-expr-common v54.0.0 (baseline)
       Built [  23.923s] (baseline)
     Parsing datafusion-physical-expr-common v54.0.0 (baseline)
      Parsed [   0.022s] (baseline)
    Checking datafusion-physical-expr-common v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.326s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [  50.270s] datafusion-physical-expr-common
    Building datafusion-physical-plan v54.0.0 (current)
       Built [  36.164s] (current)
     Parsing datafusion-physical-plan v54.0.0 (current)
      Parsed [   0.142s] (current)
    Building datafusion-physical-plan v54.0.0 (baseline)
       Built [  36.042s] (baseline)
     Parsing datafusion-physical-plan v54.0.0 (baseline)
      Parsed [   0.144s] (baseline)
    Checking datafusion-physical-plan v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.925s] 223 checks: 222 pass, 1 fail, 0 warn, 30 skip

--- failure trait_method_parameter_count_changed: pub trait method parameter count changed ---

Description:
A trait method now takes a different number of parameters.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#trait-item-signature
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/trait_method_parameter_count_changed.ron

Failed in:
  GroupValues::intern now takes 4 instead of 2 parameters, in file /home/runner/work/datafusion/datafusion/datafusion/physical-plan/src/aggregates/group_values/mod.rs:102

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  74.892s] datafusion-physical-plan

@github-actions github-actions Bot added the auto detected api change Auto detected API change label Jun 29, 2026
@Rachelint Rachelint force-pushed the fuse-aggr-repart-poc-2 branch from f501156 to 03c7079 Compare June 29, 2026 01:29
@github-actions github-actions Bot added the physical-expr Changes to the physical-expr crates label Jun 29, 2026
@Rachelint Rachelint marked this pull request as draft June 29, 2026 05:37
@Rachelint Rachelint force-pushed the fuse-aggr-repart-poc-2 branch from 8e3137d to 0e09f71 Compare June 29, 2026 08:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto detected api change Auto detected API change physical-expr Changes to the physical-expr crates physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant