
Metrics points and tags/partition index runtime binding #20578

Draft
gene-bordegaray wants to merge 19 commits into apache:branch-52 from
DataDog:metrics-points-and-tags/partition-index-runtime-binding

Conversation

@gene-bordegaray
Contributor

Dynamic filter changes for metrics points and tags.

LiaCastaneda and others added 11 commits January 30, 2026 09:34
(cherry picked from commit f6450d6)

Co-authored-by: Gabriel Musat Mestre <gabriel.musatmestre@datadoghq.com>
* downgrade substrait

(cherry picked from commit 40242b4)

* downgrade prost

(cherry picked from commit 3ae6613)

* downgrade prost for ffi

(cherry picked from commit 42c8585)

* Fix clippy warning

---------

Co-authored-by: Ahmed Mezghani <ahmed.mezghani@datadoghq.com>
* Fix dynamic filter is_used function (apache#19734)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes apache#19715.

## Rationale for this change

The `is_used()` API incorrectly returned false for custom `DataSource`
implementations that didn't call `reassign_expr_columns()` ->
`with_new_children()`. This caused `HashJoinExec` to skip computing
dynamic filters even when they were actually being used.

## What changes are included in this PR?

Updated `is_used()` to check both the outer and inner `Arc` strong counts.
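
A minimal sketch of the idea, using hypothetical stand-in types (`Inner`, `DynamicFilter`, `rebuilt`) rather than DataFusion's actual definitions: a consumer may hold either a clone of the outer `Arc` or a rebuilt outer struct that shares the same inner `Arc`, so both strong counts are checked.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the shared filter state (assumption, not DataFusion code).
struct Inner {
    _generation: u64,
}

// Hypothetical stand-in for the outer expression struct.
struct DynamicFilter {
    // Shared state; it survives when the outer struct is rebuilt.
    inner: Arc<Inner>,
}

impl DynamicFilter {
    fn new() -> Arc<Self> {
        Arc::new(Self {
            inner: Arc::new(Inner { _generation: 0 }),
        })
    }

    // Builds a new outer struct that shares the same inner state,
    // the way an expression rewrite might.
    fn rebuilt(&self) -> Arc<Self> {
        Arc::new(Self {
            inner: Arc::clone(&self.inner),
        })
    }

    // A consumer exists if anything else holds the outer Arc or the inner Arc.
    fn is_used(self: &Arc<Self>) -> bool {
        Arc::strong_count(self) > 1 || Arc::strong_count(&self.inner) > 1
    }
}
```

Checking only the outer count would miss the `rebuilt` case, which is the situation described above for custom `DataSource` implementations.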

## Are these changes tested?

Functionality is covered by the existing test
`test_hashjoin_dynamic_filter_pushdown_is_used`. I was not sure whether to
add a repro, since it would require adding a custom `DataSource`; the
current tests in
datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs use
`FileScanConfig`.

## Are there any user-facing changes?

no

(cherry picked from commit 278950a)

* Simplify wait_complete function (apache#19937)

## Which issue does this PR close?

## Rationale for this change

The current v52 signature `pub async fn wait_complete(self: &Arc<Self>)`
(introduced in apache#19546) is a bit unergonomic. The method requires
`&Arc<DynamicFilterPhysicalExpr>`, but when working with `Arc<dyn
PhysicalExpr>`, downcasting only gives you `&DynamicFilterPhysicalExpr`.
Since you can't convert `&DynamicFilterPhysicalExpr` to
`Arc<DynamicFilterPhysicalExpr>`, the method becomes impossible to call.

The `&Arc<Self>` param was used to check `is_used()` via Arc strong
count, but this was overly defensive.

## What changes are included in this PR?

- Changed `DynamicFilterPhysicalExpr::wait_complete` signature from `pub
async fn wait_complete(self: &Arc<Self>)` to `pub async fn
wait_complete(&self)`.

- Removed the `is_used()` check from `wait_complete()` - this method,
like `wait_update()`, should only be called on filters that have
consumers. If the caller doesn't know whether the filter has consumers,
they should call `is_used()` first to avoid waiting indefinitely. This
approach avoids complex signatures and dependencies between the API
methods.
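
The receiver problem can be sketched without DataFusion. The `Expr` trait, `DynamicFilter` struct, and both `check` methods below are hypothetical, synchronous stand-ins (the real method is async): downcasting through `as_any()` yields only a plain reference, so a `&self` method is callable while a `self: &Arc<Self>` method is not.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical stand-in for a trait-object expression type.
trait Expr: Any + Send + Sync {
    fn as_any(&self) -> &dyn Any;
}

// Hypothetical stand-in for the dynamic filter expression.
struct DynamicFilter {
    complete: bool,
}

impl Expr for DynamicFilter {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl DynamicFilter {
    // Old-style receiver: only callable when the concrete Arc is at hand.
    fn check_via_arc(self: &Arc<Self>) -> bool {
        self.complete
    }

    // New-style receiver: a plain reference is enough.
    fn check(&self) -> bool {
        self.complete
    }
}

// Returns None when the expression is not a DynamicFilter.
fn completion_of(expr: &Arc<dyn Expr>) -> Option<bool> {
    // downcast_ref yields &DynamicFilter, never &Arc<DynamicFilter>.
    let filter: &DynamicFilter = expr.as_any().downcast_ref::<DynamicFilter>()?;
    // Calling filter.check_via_arc() here would not compile.
    Some(filter.check())
}
```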

## Are these changes tested?

Yes, existing tests cover this functionality. I removed the "mock"
consumer from `test_hash_join_marks_filter_complete_empty_build_side`
and `test_hash_join_marks_filter_complete` since the fix in
apache#19734 makes `is_used()` check the
outer struct `strong_count` as well.

## Are there any user-facing changes?

The signature of `wait_complete` changed.

(cherry picked from commit bef1368)
fix: datatype_is_logically_equal for Dictionaries
Includes both lines of work needed for metrics-points-and-tags:

- Generic PhysicalExpr runtime binding API (bind_runtime + traversal helpers)

- Partition-aware dynamic filter routing for file-partitioned joins (Hash/PartitionIndex/Global)

- Datasource wiring for partition-specific DynamicFilter evaluation

- Associated tests and sqllogictest updates
@gene-bordegaray gene-bordegaray force-pushed the metrics-points-and-tags/partition-index-runtime-binding branch from cd9ace7 to baa3f1b Compare February 26, 2026 16:39
@github-actions github-actions bot added documentation Improvements or additions to documentation development-process Related to development process of DataFusion logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) common Related to common crate functions Changes to functions implementation datasource Changes to the datasource crate physical-plan Changes to the physical-plan crate labels Feb 26, 2026
timsaucer and others added 5 commits February 26, 2026 14:29
…ization and deserialization processes (apache#19437)

- Closes apache#18477

This PR adds a new trait for converting to and from Protobuf objects and
Physical expressions and plans.

- Add `PhysicalExtensionProtoCodec` and default implementation.
- Update all methods in the physical encoding/decoding methods to use
this trait.
- Added two examples
- Added unit test

Two examples and round trip unit test are added.

If users are going through the recommended interfaces in the
documentation, `logical_plan_to_bytes` and `logical_plan_from_bytes`
they will have no user facing change. If they are instead calling into
the inner methods `PhysicalPlanNode::try_from_physical_plan` and so on,
then they will need to provide a proto converter. A default
implementation is provided.

---------

Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
… unique identifiers (apache#20037)

Replaces apache#18192 using the APIs in apache#19437.

Similar to apache#18192 the end goal here is specifically to enable
deduplication of `DynamicFilterPhysicalExpr` so that distributed query
engines can get one step closer to using dynamic filters.

Because it's actually simpler we apply this deduplication to all
`PhysicalExpr`s with the added benefit that we more faithfully preserve
the original expression tree (instead of adding new duplicate branches)
which will have the immediate impact of e.g. not duplicating large
`InListExpr`s.
Informs: datafusion-contrib/datafusion-distributed#180
Closes: apache#20418

Consider this scenario
1. You have a plan with a `HashJoinExec` and `DataSourceExec`
2. You run the physical optimizer and the `DataSourceExec` accepts `DynamicFilterPhysicalExpr` pushdown from the `HashJoinExec`
3. You serialize the plan, deserialize it, and execute it

What should happen is that the dynamic filter should "work", meaning
1. When you deserialize the plan, both the `HashJoinExec` and `DataSourceExec` should have pointers to the same `DynamicFilterPhysicalExpr`
2. The `DynamicFilterPhysicalExpr` should be updated during execution by the `HashJoinExec` and the `DataSourceExec` should filter out rows

This does not happen today for a few reasons, a couple of which this PR aims to address
1. `DynamicFilterPhysicalExpr` does not survive round-tripping. The internal exprs get inlined (e.g. it may be serialized as `Literal`)
2. Even if `DynamicFilterPhysicalExpr` survives round-tripping, during pushdown, it's often the case that the `DynamicFilterPhysicalExpr` is rewritten. In this case, you have two `DynamicFilterPhysicalExpr`s which are different `Arc`s but share the same `Inner` dynamic filter state. The current `DeduplicatingProtoConverter` does not handle this specific form of deduping.

This PR aims to fix those problems by adding serde for `DynamicFilterPhysicalExpr` and deduping logic for the inner state of dynamic filters.
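
One way to picture deduping on inner state is to key on the address of the shared inner `Arc` rather than on the outer expression. The sketch below is an assumption for illustration only: `Inner`, `DynamicFilter`, `InnerIds`, and `id_for` are hypothetical names, and this is not the converter's actual implementation.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-in for the shared dynamic filter state.
struct Inner {
    _generation: u64,
}

// Hypothetical stand-in for the outer expression; rewrites create new
// outer values that keep pointing at the same inner state.
struct DynamicFilter {
    inner: Arc<Inner>,
}

// Assigns one id per distinct inner state, keyed by pointer address.
// The addresses are only meaningful while the Arcs are still alive.
#[derive(Default)]
struct InnerIds {
    ids: HashMap<usize, u64>,
}

impl InnerIds {
    fn id_for(&mut self, filter: &DynamicFilter) -> u64 {
        let key = Arc::as_ptr(&filter.inner) as usize;
        // The next fresh id is the number of inner states seen so far.
        let next = self.ids.len() as u64;
        *self.ids.entry(key).or_insert(next)
    }
}
```

With ids like these, two rewritten filters that share state would map to the same id, so a deserializer could reconnect them to one shared state instead of creating two.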

It does not yet add a test for the `HashJoinExec` and `DataSourceExec` filter pushdown case, but this is relevant follow up work. I tried to keep the PR small for reviewers.

Yes, via unit tests.

`DynamicFilterPhysicalExpr`s are now serialized by the default codec.
Fixups for the cherry-picked commits from PRs apache#19437, apache#20037, apache#20416,
and jayshrivastava#2 to work with branch-52's partition-index APIs:

- Update remap_children callers to use instance method signature
- Adapt DynamicFilterUpdate::Global enum for new code paths
- Add missing partitioned_exprs/runtime_partition fields to new constructors
- Remove null_aware field (not on branch-52)
- Replace FilterExecBuilder with FilterExec::try_new
- Remove non-compiling tests that depend on upstream-only APIs
- Fix duplicate imports in roundtrip test file

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added the proto Related to proto crate label Feb 26, 2026
- Regenerate pbjson.rs from proto definition to fix stale artifacts
- Fix cargo fmt issues
- Remove unfulfilled #[expect(deprecated)] on CoalesceBatchesExec
- Add missing #[test] attribute on partition test
- Remove unused EmptyExec import in aggregates tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…mic filter deduping to work in custom plan nodes
@github-actions github-actions bot added the ffi Changes to the ffi crate label Feb 27, 2026

Labels

common Related to common crate core Core DataFusion crate datasource Changes to the datasource crate development-process Related to development process of DataFusion documentation Improvements or additions to documentation ffi Changes to the ffi crate functions Changes to functions implementation logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Changes to the physical-expr crates physical-plan Changes to the physical-plan crate proto Related to proto crate sqllogictest SQL Logic Tests (.slt)

7 participants