Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
17 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 33 additions & 0 deletions datafusion/common/src/config.rs
Original file line number Diff line number Diff line change
Expand Up @@ -996,6 +996,39 @@ config_namespace! {
///
/// Note: This may reduce parallelism, rooting from the I/O level, if the number of distinct
/// partitions is less than the target_partitions.
///
/// Note for partitioned hash join dynamic filtering:
/// preserving file partitions can allow partition-index routing (`i -> i`) instead of
/// CASE-hash routing, but this assumes build/probe partition indices stay aligned for
/// partition hash join / dynamic filter consumers.
///
Comment on lines +1000 to +1004
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumption is not specific to dynamic filters. If partition indices are not aligned, data returned in the join would be wrong whether dynamic filters are there or not.

As I don't think this advice is specific to dynamic filters, I'd try to keep this doc comment more minimal. Note that this is supposed to be rendered not only as docs, but also as part of a SHOW ALL query, which might not render nicely, so having ASCII graphics is probably the not most render-friendly thing for this specific case.

/// Misaligned Partitioned Hash Join Example:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something DataFusion users have to think about? I.e. is this something a user can mess up or would it only happen if there was a bug in DataFusion?

Copy link
Contributor Author

@gene-bordegaray gene-bordegaray Feb 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, this can be user-triggered, not only a DF bug. If preserve_file_partitions is enabled and the two join sides are not partition-aligned by value/index, partition-index routing is unsafe.

A user can declare to preserve their file partitioning, but if they don't have partitioned data. This will be a bug on the behalf of the user

In EnforceDistribution for incompatible schemes instead of silently allowing incorrect results.

/// ```text
/// ┌───────────────────────────┐
/// │ HashJoinExec │
/// │ mode=Partitioned │
/// │┌───────┐┌───────┐┌───────┐│
/// ││ Hash ││ Hash ││ Hash ││
/// ││Table 1││Table 2││Table 3││
/// ││ ││ ││ ││
/// ││ key=A ││ key=B ││ key=C ││
/// │└───▲───┘└───▲───┘└───▲───┘│
/// └────┴────────┼────────┼────┘
/// ... Misaligned! Misaligned!
/// │ │
/// ... ┌───────┼────────┴───────────────┐
/// ┌────────┼───────┴───────────────┐ │
/// │ │ │ │ │ │
///┌────┴────────┴────────┴────┐ ┌───┴─────────┴────────┴────┐
///│ DataSourceExec │ │ DataSourceExec │
///│┌───────┐┌───────┐┌───────┐│ │┌───────┐┌───────┐┌───────┐│
///││ File ││ File ││ File ││ ││ File ││ File ││ File ││
///││Group 1││Group 2││Group 3││ ││Group 1││Group 2││Group 3││
///││ ││ ││ ││ ││ ││ ││ ││
///││ key=A ││ key=B ││ key=C ││ ││ key=A ││ key=C ││ key=B ││
///│└───────┘└───────┘└───────┘│ │└───────┘└───────┘└───────┘│
///└───────────────────────────┘ └───────────────────────────┘
///```
pub preserve_file_partitions: usize, default = 0

/// Should DataFusion repartition data using the partitions keys to execute window
Expand Down
Loading