Gene.bordegaray/2026/02/partition index dynamic filters #20331
base: main
@@ -996,6 +996,39 @@ config_namespace! {
///
/// Note: This may reduce parallelism, starting at the I/O level, if the number of distinct
/// partitions is less than the target_partitions.
///
/// Note for partitioned hash join dynamic filtering:
/// preserving file partitions can enable partition-index routing (`i -> i`) instead of
/// CASE-hash routing, but this assumes build and probe partition indices stay aligned for
/// partitioned hash join and dynamic filter consumers.
///
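To make the routing distinction in the note concrete, here is a toy Rust model, not DataFusion's implementation. Each build partition's dynamic filter is modeled as a plain set of keys, and `PartitionFilter`, `toy_hash`, `route_case_hash`, `route_by_index`, and `may_match` are hypothetical names. `toy_hash` just sums bytes and stands in for the real repartitioning hash; CASE-hash routing is modeled as "hash the probe key to pick which build partition's filter applies". The sketch assumes the file partitions are aligned by index (build partition i and probe partition i hold the same keys), which is the condition the note states.

```rust
use std::collections::HashSet;

/// Toy per-partition dynamic filter: the build-side keys of one partition
/// (a stand-in for whatever bounds or membership check a real filter uses).
type PartitionFilter = HashSet<&'static str>;

/// Toy hash (sum of bytes). Placeholder only, NOT DataFusion's repartitioning hash.
fn toy_hash(key: &str) -> u64 {
    key.bytes().map(u64::from).sum()
}

/// CASE-hash-style routing: pick the filter of the partition the key would land in
/// under hash repartitioning, ignoring which partition the probe row is actually in.
fn route_case_hash(key: &str, num_partitions: usize) -> usize {
    (toy_hash(key) % num_partitions as u64) as usize
}

/// Partition-index routing (`i -> i`): probe partition i uses build partition i's filter.
fn route_by_index(probe_partition: usize) -> usize {
    probe_partition
}

/// Returns true if a probe row with `key` survives the filter it was routed to.
fn may_match(filters: &[PartitionFilter], routed: usize, key: &str) -> bool {
    filters[routed].contains(key)
}
```

In this model, with build partitions `{A}`, `{B}`, `{C}` read straight from files, index routing sends a probe row in partition 0 with key `A` to the filter that actually contains `A`. Hash routing sends it to partition `toy_hash("A") % 3 == 2`, whose filter only holds `C`, so the row would be pruned even though the join would match it. Hash-based routing is only sound here when the data really was hash-partitioned with the same function.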
Comment on lines +1000 to +1004
Contributor
This assumption is not specific to dynamic filters. If partition indices are not aligned, the data returned by the join would be wrong whether or not dynamic filters are present. Since this advice isn't specific to dynamic filters, I'd try to keep this doc comment more minimal. Note that this is supposed to be rendered not only as docs, but also as part of a …
/// Misaligned Partitioned Hash Join Example:
Contributor
Is this something DataFusion users have to think about? I.e., is this something a user can mess up, or would it only happen if there were a bug in DataFusion?
Contributor
Author
Ya, this can be user-triggered, not only a DataFusion bug. If `preserve_file_partitions` is enabled and the two join sides are not partition-aligned by value/index, partition-index routing is unsafe. A user can declare that their file partitioning should be preserved even when their data is not actually partitioned that way, which would be a bug on the user's part. … in EnforceDistribution for incompatible schemes instead of silently allowing incorrect results.
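A data-level sketch of the alignment condition discussed in this thread: every join key must sit at the same partition index on both sides (and in only one partition per side). `check_partition_alignment` is a hypothetical helper, not part of DataFusion. A planner rule such as EnforceDistribution works on declared partitioning properties rather than scanning rows, so this only illustrates the property such a check would need to guarantee.

```rust
use std::collections::HashMap;

/// Hypothetical helper: returns Err if any join key appears at different
/// partition indices, either within one side or across the two sides.
fn check_partition_alignment<'a>(
    build: &[Vec<&'a str>],
    probe: &[Vec<&'a str>],
) -> Result<(), String> {
    if build.len() != probe.len() {
        return Err(format!(
            "partition count mismatch: build has {}, probe has {}",
            build.len(),
            probe.len()
        ));
    }
    // First partition index at which each key was seen (build side first).
    let mut home: HashMap<&'a str, usize> = HashMap::new();
    for (side, partitions) in [("build", build), ("probe", probe)] {
        for (i, keys) in partitions.iter().enumerate() {
            for &key in keys {
                let j = *home.entry(key).or_insert(i);
                if j != i {
                    return Err(format!(
                        "key {key:?} is already in partition {j}, but {side} partition {i} also holds it"
                    ));
                }
            }
        }
    }
    Ok(())
}
```

With the layout from the diagram (build files `A, B, C`, probe files `A, C, B`), this check rejects key `C`, which the build side places in partition 2 (0-based) and the probe side places in partition 1.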
/// ```text
///                  ┌───────────────────────────┐
///                  │       HashJoinExec        │
///                  │     mode=Partitioned      │
///                  │┌───────┐┌───────┐┌───────┐│
///                  ││ Hash  ││ Hash  ││ Hash  ││
///                  ││Table 1││Table 2││Table 3││
///                  ││       ││       ││       ││
///                  ││ key=A ││ key=B ││ key=C ││
///                  │└───▲───┘└───▲───┘└───▲───┘│
///                  └────┴────────┼────────┼────┘
///                      ... Misaligned! Misaligned!
///                                │        │
///                        ┌───────┼────────┴───────────────┐
///               ┌────────┼───────┴───────────────┐        │
///     ...       │        │             ...       │        │
/// ┌────┴────────┴────────┴────┐    ┌────┴────────┴────────┴────┐
/// │      DataSourceExec       │    │      DataSourceExec       │
/// │┌───────┐┌───────┐┌───────┐│    │┌───────┐┌───────┐┌───────┐│
/// ││ File  ││ File  ││ File  ││    ││ File  ││ File  ││ File  ││
/// ││Group 1││Group 2││Group 3││    ││Group 1││Group 2││Group 3││
/// ││       ││       ││       ││    ││       ││       ││       ││
/// ││ key=A ││ key=B ││ key=C ││    ││ key=A ││ key=C ││ key=B ││
/// │└───────┘└───────┘└───────┘│    │└───────┘└───────┘└───────┘│
/// └───────────────────────────┘    └───────────────────────────┘
gene-bordegaray marked this conversation as resolved.
/// ```
pub preserve_file_partitions: usize, default = 0
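The diagram can be reduced to a few lines of Rust to show the reviewer's point that misaligned indices break the join itself, with or without dynamic filters. `partitioned_join_matches` is a hypothetical helper that models a `mode=Partitioned` hash join as pairing build partition i with probe partition i and counting matching probe rows; real execution is far more involved.

```rust
use std::collections::HashSet;

/// Hypothetical model of a partitioned hash join: build partition i is paired
/// only with probe partition i, and the result counts matching probe rows.
fn partitioned_join_matches(build: &[Vec<&str>], probe: &[Vec<&str>]) -> usize {
    build
        .iter()
        .zip(probe)
        .map(|(build_keys, probe_keys)| {
            // The "hash table" for partition i holds only build partition i's keys.
            let table: HashSet<&str> = build_keys.iter().copied().collect();
            // Count probe rows in partition i whose key is in that table.
            probe_keys.iter().filter(|k| table.contains(**k)).count()
        })
        .sum()
}
```

With the diagram's key layout, only the first partition (`key=A`) finds its match, so one of the three expected matches is produced; with the probe files in the same order as the build files, all three are found.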
/// Should DataFusion repartition data using the partition keys to execute window