@gabotechs gabotechs commented Jan 12, 2026

Which issue does this PR close?

  • Closes #.

Rationale for this change

This PR is part of a batch of PRs that aim to improve hash join performance.

It adds a building block that allows eagerly collecting data on the probe side of a hash join before the build side is finished.

Although the intended use case is hash joins, the new execution node is generic and is designed to work anywhere in the plan.

What changes are included in this PR?

Adds a new BufferExec node that eagerly buffers up to a certain size in bytes for each partition, performing work that would otherwise be delayed until the consumer starts polling.

Schematically, it looks like this:

             ┌───────────────────────────┐
             │        BufferExec         │
             │                           │
             │┌────── Partition 0 ──────┐│
             ││            ┌────┐ ┌────┐││       ┌────┐
 ──background poll────────▶│    │ │    ├┼┼───────▶    │
             ││            └────┘ └────┘││       └────┘
             │└─────────────────────────┘│
             │┌────── Partition 1 ──────┐│
             ││     ┌────┐ ┌────┐ ┌────┐││       ┌────┐
 ──background poll─▶│    │ │    │ │    ├┼┼───────▶    │
             ││     └────┘ └────┘ └────┘││       └────┘
             │└─────────────────────────┘│
             │                           │
             │           ...             │
             │                           │
             │┌────── Partition N ──────┐│
             ││                   ┌────┐││       ┌────┐
 ──background poll───────────────▶│    ├┼┼───────▶    │
             ││                   └────┘││       └────┘
             │└─────────────────────────┘│
             └───────────────────────────┘
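To make the diagram concrete, here is a minimal, std-only sketch of the idea for a single partition: a background thread eagerly pulls batches from the input and parks them in a queue bounded by a byte budget, while the consumer pops them at its own pace. This is not the code in this PR. The names `PartitionBuffer`, `buffer_partition`, and `Batch` are hypothetical, batches are modeled as plain byte vectors, and blocking threads stand in for the async streams that a real execution node would use.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical stand-in for a record batch: just a chunk of bytes.
type Batch = Vec<u8>;

struct State {
    queue: VecDeque<Batch>,
    buffered_bytes: usize,
    done: bool,
}

/// Hypothetical byte-bounded queue for one partition (not the PR's type).
struct PartitionBuffer {
    state: Mutex<State>,
    changed: Condvar,
    max_bytes: usize,
}

impl PartitionBuffer {
    fn new(max_bytes: usize) -> Self {
        let state = State { queue: VecDeque::new(), buffered_bytes: 0, done: false };
        Self { state: Mutex::new(state), changed: Condvar::new(), max_bytes }
    }

    /// Producer side: blocks while adding `batch` would exceed the byte budget.
    /// A batch is always admitted into an empty queue, so an oversized batch
    /// cannot stall the producer forever.
    fn push(&self, batch: Batch) {
        let mut s = self.state.lock().unwrap();
        while !s.queue.is_empty() && s.buffered_bytes + batch.len() > self.max_bytes {
            s = self.changed.wait(s).unwrap();
        }
        s.buffered_bytes += batch.len();
        s.queue.push_back(batch);
        self.changed.notify_all();
    }

    /// Marks the input as exhausted and wakes any waiting consumer.
    fn finish(&self) {
        self.state.lock().unwrap().done = true;
        self.changed.notify_all();
    }

    /// Consumer side: returns `None` once the input is exhausted and drained.
    fn pop(&self) -> Option<Batch> {
        let mut s = self.state.lock().unwrap();
        loop {
            if let Some(batch) = s.queue.pop_front() {
                s.buffered_bytes -= batch.len();
                self.changed.notify_all(); // free budget: wake the producer
                return Some(batch);
            }
            if s.done {
                return None;
            }
            s = self.changed.wait(s).unwrap();
        }
    }
}

/// Starts the "background poll": eagerly drains `input` into the buffer.
fn buffer_partition(input: Vec<Batch>, max_bytes: usize) -> Arc<PartitionBuffer> {
    let buffer = Arc::new(PartitionBuffer::new(max_bytes));
    let producer = Arc::clone(&buffer);
    thread::spawn(move || {
        for batch in input {
            producer.push(batch);
        }
        producer.finish();
    });
    buffer
}

fn main() {
    // Five 4-byte batches with an 8-byte budget: at most two are buffered at once.
    let input: Vec<Batch> = (0..5u8).map(|i| vec![i; 4]).collect();
    let buffer = buffer_partition(input, 8);
    let mut consumed = 0;
    while let Some(batch) = buffer.pop() {
        consumed += batch.len();
    }
    println!("consumed {consumed} bytes");
}
```

The point of the sketch is that the producer starts working as soon as the buffer is created, before anything calls `pop`, and stalls only when the byte budget is full. That matches the "eager work with bounded memory" behavior described above.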

Are these changes tested?

Yes, by new unit tests.

Are there any user-facing changes?

Users can import a new BufferExec execution plan in their codebase, but this PR does not ship any internal usage of it yet.

@github-actions github-actions bot added the core, execution, proto, datasource, and physical-plan labels Jan 12, 2026