Gene.bordegaray/2026/02/partition index dynamic filters #20331

Open
gene-bordegaray wants to merge 17 commits into apache:main from
gene-bordegaray:gene.bordegaray/2026/02/partition_index_dynamic_filters

Conversation

@gene-bordegaray
Copy link
Contributor

@gene-bordegaray gene-bordegaray commented Feb 12, 2026

Which issue does this PR close?

Closes #20195

Rationale for this change

Dynamic filter pushdown was completely disabled when preserve_file_partitions was on, due to a correctness bug.

The Problem

When preserve_file_partitions is enabled, DataFusion treats file groups as pre-partitioned data. The existing dynamic filtering used hash-based routing, which is incompatible with the value-based partitioning in which file groups are kept:

Example:

Table partitioned by col_a:
- Partition 0: col_a = 'A'
- Partition 1: col_a = 'B'
- Partition 2: col_a = 'C'

Dimension Table: values = ['A', 'B']

SELECT * FROM large_table
JOIN small_table ON large_table.col_a = small_table.col_a

Hash routing doesn't work:
- Hash routing: hash('A') % 3 might map to partition 1 (not partition 0)
- File partitioning: 'A' data is in partition 0 (value-based)
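The mismatch can be demonstrated with a std-only sketch. Here `DefaultHasher` stands in for DataFusion's join hash, and `value_based_partition` hard-codes the example file layout above; both names are illustrative, not DataFusion APIs:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Value-based placement from the file layout: 'A' -> 0, 'B' -> 1, 'C' -> 2.
fn value_based_partition(v: &str) -> u64 {
    match v {
        "A" => 0,
        "B" => 1,
        "C" => 2,
        _ => unreachable!(),
    }
}

// Hash-based placement, as a hash repartition would compute: hash(v) % 3.
// There is no reason this should agree with the value-based layout.
fn hash_based_partition(v: &str) -> u64 {
    let mut h = DefaultHasher::new();
    v.hash(&mut h);
    h.finish() % 3
}

fn main() {
    for v in ["A", "B", "C"] {
        println!(
            "{v}: file partition = {}, hash partition = {}",
            value_based_partition(v),
            hash_based_partition(v)
        );
    }
}
```

Whenever the two functions disagree for some value, hash-routed filters land on the wrong file-group partition, which is exactly the correctness bug described above.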

For this reason it was disabled; this PR re-enables it via PartitionIndex routing for dynamic filters.

What changes are included in this PR?

Partition-Indexed dynamic filtering

New routing mode that uses direct partition-to-partition mapping:

Build partition 0 → filters Probe partition 0
Build partition 1 → filters Probe partition 1
Build partition 2 → filters Probe partition 2

Example:

HashJoinExec: mode=Partitioned, routing=PartitionIndex, on=[col_a = col_b]
    DataSourceExec: table_large, file_groups={3: [col_a=A], [col_a=B], [col_a=C]} predicate=DynamicFilter[
        {0: col_a IN ['A','B']},  -- Partition 0 filtered
        {1: col_a IN ['A','B']},  -- Partition 1 filtered
        {2: col_a IN ['A','B']}   -- Partition 2 filtered (no matches, pruned)
    ]
    DataSourceExec: table_small, values: [col_b='A', col_b='B']

How it works:
- Build partition 0 (col_b='A') creates filter for probe partition 0 (col_a='A')
- Build partition 1 (col_b='B') creates filter for probe partition 1 (col_a='B')
- Probe partition 2 (col_a='C') gets pruned (no matching build partition)
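The direct `i -> i` mapping can be sketched with a per-partition filter vector (a hedged, std-only sketch; `ProbeAction`, `route`, and string filters are stand-ins for the PR's actual expression types):

```rust
// PartitionIndex routing: filters are stored in a Vec indexed by partition
// number, so build partition i only ever filters probe partition i.
#[derive(Debug, PartialEq)]
enum ProbeAction {
    Filter(&'static str), // apply this partition's filter expression
    Prune,                // no matching build partition: skip the scan entirely
}

fn route(filters: &[Option<&'static str>], probe_partition: usize) -> ProbeAction {
    match filters.get(probe_partition).copied().flatten() {
        Some(expr) => ProbeAction::Filter(expr),
        None => ProbeAction::Prune,
    }
}

fn main() {
    // The build side produced filters for partitions 0 and 1 only,
    // mirroring the small_table example above.
    let filters = [Some("col_a = 'A'"), Some("col_a = 'B'"), None];
    for p in 0..3 {
        println!("probe partition {p}: {:?}", route(&filters, p));
    }
}
```

No hashing is involved: the probe partition index alone selects the filter, which is why build/probe alignment is a hard requirement.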

Alignment Detection
Detects compatible partitioning to enable safe optimization:

  • Both sides file-grouped (value-based partitioning) -> PartitionIndex
  • Both sides hash-repartitioned (hash-based partitioning) -> CaseHash
  • The sides have different partitioning -> Error; this shouldn't happen and could cause incorrect results
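The alignment check can be sketched as a complete match on the two `repartitioned` flags (a hedged, std-only sketch; `RoutingMode`, `detect_routing`, and the boolean flags are stand-ins for the PR's actual types):

```rust
#[derive(Debug, PartialEq)]
enum RoutingMode {
    PartitionIndex, // both sides kept their value-based file-group layout
    CaseHash,       // both sides were hash-repartitioned
}

fn detect_routing(
    left_repartitioned: bool,
    right_repartitioned: bool,
) -> Result<RoutingMode, String> {
    match (left_repartitioned, right_repartitioned) {
        (false, false) => Ok(RoutingMode::PartitionIndex),
        (true, true) => Ok(RoutingMode::CaseHash),
        // Mixed schemes would silently produce wrong join results, so error.
        _ => Err("misaligned partitioning on partitioned hash join inputs".to_string()),
    }
}

fn main() {
    assert_eq!(detect_routing(false, false), Ok(RoutingMode::PartitionIndex));
    assert_eq!(detect_routing(true, true), Ok(RoutingMode::CaseHash));
    assert!(detect_routing(true, false).is_err());
    println!("alignment detection ok");
}
```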

If there is a RepartitionExec in the path from the DataSourceExec to either the build or probe side of a partitioned hash join, we fall back to CaseHash.

The reason is that RepartitionExec uses hash(value) % N to distribute rows, breaking the value-based partition alignment. When hash-partitioned, partition 0 no longer contains 'A' exclusively, breaking the partition-index assumptions.

With hash partitioning, use:

  CASE hash(col_a) % 4
    WHEN 0 THEN filter_partition_0
    WHEN 1 THEN filter_partition_1
    ...
  END

Are these changes tested?

sqlogictests: test_files/preserve_file_partitioning.slt
Integration tests: datafusion/core/tests/physical_optimizer/filter_pushdown.rs
Unit tests: in affected files

Are there any user-facing changes?

Yes. A new error message can appear if partitioned hash joins are not aligned properly, and the dynamic filtering display for partition-index routing is slightly different from CASE routing.

cc: @NGA-TRAN @LiaCastaneda @adriangb @gabotechs

@github-actions github-actions bot added documentation Improvements or additions to documentation physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) common Related to common crate proto Related to proto crate datasource Changes to the datasource crate physical-plan Changes to the physical-plan crate labels Feb 12, 2026
- HashJoinExec: mode=Partitioned, join_type=Inner, on=[(b@0, a@0)]
- RepartitionExec: partitioning=Hash([b@0], 1), input_partitions=1
- HashJoinExec: mode=Partitioned, join_type=Inner, on=[(c@1, d@0)]
- DataSourceExec: file_groups={1 group: [[test.parquet]]}, projection=[b, c, y], file_type=test, pushdown_supported=true
Contributor Author

no dynamic filter because it's the build side of a build side...took me a second 😂

// its own filter.
predicate = predicate
.map(|p| snapshot_physical_expr_for_partition(p, partition_index))
.transpose()?;
Contributor Author

@gene-bordegaray gene-bordegaray Feb 13, 2026
decided to only do this in the parquet opener; if we did it for all files (by default) it would just do nothing, since predicates aren't passed to other openers. This does mean that users will have to implement this for their own data sources.

Given this is a large PR, I didn't want to include logic for a fallback, and doing nothing seemed out of place; could still reconsider if others have an opinion.

Contributor

@LiaCastaneda LiaCastaneda left a comment
This makes sense to me and will be very helpful for use cases where we want to avoid repartitioning data. My only concern is that API users would need to align the probe and build side partitions, but this seems like a reasonable tradeoff. Let’s see what other contributors think. (this is a partial review I will finish later today or early next week) but until now it's looking good to me :)

Comment on lines +864 to +868
// One side starts with multiple partitions while target is 1. EnforceDistribution inserts a
// hash repartition on the left child. The partitioning schemes are now misaligned:
// - Left: hash-repartitioned (repartitioned=true)
// - Right: file-grouped (repartitioned=false)
// This is a correctness bug, so we expect an error.
Contributor
Can we have the other way around as well? Having a Join of type Partitioned with the left preserving file partitioning and the right having a RepartitionExec.

let optimized = ensure_distribution_helper_transform_up(join, 1)?;
assert_plan!(optimized, @r"
HashJoinExec: mode=Partitioned, join_type=Inner, on=[(a@0, a@1)]
DataSourceExec: file_groups={1 group: [[x]]}, projection=[a, b, c, d, e], file_type=parquet
Contributor
Would it make sense to display whether DataSourceExec is preserving partitioning? Something like preserve_partitioning=[bool]? This may be useful for users to know why there is no RepartitionExec in the plan even if the mode is Partitioned.

Contributor Author
I think this can be a separate PR, but I can create an issue 😄

@gene-bordegaray
Contributor Author

gene-bordegaray commented Feb 13, 2026

This makes sense to me and will be very helpful for use cases where we want to avoid repartitioning data. My only concern is that API users would need to align the probe and build side partitions, but this seems like a reasonable tradeoff. Let’s see what other contributors think. (this is a partial review I will finish later today or early next week) but until now it's looking good to me :)

💯 thank you for the reviews
💯 thank you for the reviews
I know we have discussed this but want to document it here: for API users, the partitioning structure is admittedly a bit vague. I would like to start an effort to make partitioning a trait that more clearly defines how data is partitioned, to eliminate the overload on Hash partitioning.

Contributor

@LiaCastaneda LiaCastaneda left a comment
👍 I think I'm done with my review, overall looks good, just some minor comments.

@LiaCastaneda
Contributor

I would like to start an effort to make partitioning a trait that will more clearly define how data is partitioned to eliminate the overload on Hash partitioning.

We need to make sure that in the future it’s easy to revert or migrate users away from index-based routing to their custom Partitioning implementation. Since this does not introduce a new API, I don’t think it should be a problem. This was previously a bug, and with this PR dynamic filtering works, but it’s something to keep in mind.

Contributor

@NGA-TRAN NGA-TRAN left a comment
The approach looks great, Gene. Nice work!

I do have some suggestions on comments and test data to make things clearer for reviewers and future maintenance.

config.optimizer.enable_round_robin_repartition = false;
config.optimizer.repartition_file_scans = false;
config.optimizer.repartition_file_min_size = 1024;
config.optimizer.prefer_existing_sort = false;
Contributor
It will be clearer if you add comments explaining why you need these settings and for which tests.

@NGA-TRAN
Contributor

@LiaCastaneda

My only concern is that API users would need to align the probe and build side partitions, but this seems like a reasonable tradeoff.

Right, usually when users decide to use these custom partitions, they must have a mechanism to enforce them. Thus, I do not think we need to worry about this at the dynamic filtering stage. We only need to provide a way to use dynamic filtering correctly, which is the purpose of this PR.

Comment on lines +50 to +51
/// Per-partition filter expressions indexed by partition number.
type PartitionedFilters = Vec<Option<Arc<dyn PhysicalExpr>>>;
Contributor
Just one more question -- how would evaluation be done for PartitionedFilters?
My understanding is that each partition would need to first access its corresponding PhysicalExpr and then call evaluate(), right? However, evaluate() on the PhysicalExpr trait has no partition number in its args, so evaluate() can't directly integrate PartitionedFilters.
The current evaluate() function remains the same and evaluates inner.expr, which, when we preserve file partitioning, holds nothing (just a lit(true) placeholder).

Contributor Author

@gene-bordegaray gene-bordegaray Feb 13, 2026
Snapshotting happens before evaluation

The full path in chronological order is:

  1. ParquetOpener::open() is called
  2. snapshot_physical_expr_for_partition(predicate, partition_index) is called -> important to note that we pass the index
  3. snapshot_physical_expr_for_partition replaces the DynamicFilterPhysicalExpr with the filter for the partition on that index (this is a physical expr)
  4. evaluate() is called which uses the snapshot expr (not the DynamicFilterPhysicalExpr) and we don't need to know the partition parameter because it was already dealt with earlier
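The four steps above can be mocked with a std-only sketch: a dynamic filter holds per-partition expressions, and snapshotting for partition i resolves that partition's expression before evaluate() runs, falling back to `true` when none is present. All names here are illustrative, not DataFusion's actual API:

```rust
struct DynamicFilter {
    // One optional filter expression per partition, indexed by partition number.
    partitioned_exprs: Vec<Option<String>>,
}

impl DynamicFilter {
    fn has_partitioned_filters(&self) -> bool {
        self.partitioned_exprs.iter().any(|e| e.is_some())
    }

    // Steps 2-3: called from the opener with the partition index, before
    // evaluate(), so evaluation itself never needs a partition argument.
    fn snapshot_for_partition(&self, partition: usize) -> String {
        self.partitioned_exprs
            .get(partition)
            .cloned()
            .flatten()
            .unwrap_or_else(|| "true".to_string()) // lit(true) fallback
    }
}

fn main() {
    let filter = DynamicFilter {
        partitioned_exprs: vec![
            Some("col_a = 'A'".to_string()),
            Some("col_a = 'B'".to_string()),
            None,
        ],
    };
    assert!(filter.has_partitioned_filters());
    println!("partition 0 sees: {}", filter.snapshot_for_partition(0));
    println!("partition 2 sees: {}", filter.snapshot_for_partition(2));
}
```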

For the lit(true) concern: if has_partitioned_filters() returns false during snapshotting, we fall back to lit(true), in which case we won't filter anything. But this is acceptable behavior; falling back is better than erroring.

This shouldn't happen, though, because we wait until the build side is complete before we snapshot, so has_partitioned_filters() should always return true by then.

Maybe to be safe it would be good to add a debug statement for when has_partitioned_filters() returns false.

Lmk if this makes sense 🙂

Contributor

@LiaCastaneda LiaCastaneda Feb 14, 2026
ah got it, I missed the fact that this was already integrated into the parquet-source opener, and thought evaluation was different. It's clear now and makes sense to me, thanks for explaining!

Contributor

@LiaCastaneda LiaCastaneda left a comment

This lgtm 👍 it’s just missing a committer’s review.

/// CASE-hash routing, but this assumes build/probe partition indices stay aligned for
/// partition hash join / dynamic filter consumers.
///
/// Misaligned Partitioned Hash Join Example:
Contributor
Is this something DataFusion users have to think about? I.e. is this something a user can mess up or would it only happen if there was a bug in DataFusion?

Contributor Author

@gene-bordegaray gene-bordegaray Feb 17, 2026

Ya, this can be user-triggered, not only a DF bug. If preserve_file_partitions is enabled and the two join sides are not partition-aligned by value/index, partition-index routing is unsafe.

A user can declare that their file partitioning should be preserved even when their data is not actually partitioned that way; that would be a bug on the user's behalf.

We now error in EnforceDistribution for incompatible schemes instead of silently allowing incorrect results.

fn should_use_partition_index(&self) -> bool {
matches!(self.mode, PartitionMode::Partitioned)
&& self.dynamic_filter_routing_mode
== DynamicFilterRoutingMode::PartitionIndex
Contributor
Would these two ever not match?

Contributor Author

Yes, if there is a partitioned hash join and both sides contain a repartition -> it would use CaseHash.

Comment on lines +534 to +540
/// Snapshot a `PhysicalExpr` tree, replacing any [`DynamicFilterPhysicalExpr`] that
/// has per-partition data with its partition-specific filter expression.
/// If a `DynamicFilterPhysicalExpr` does not have partitioned data, it is left unchanged.
pub fn snapshot_physical_expr_for_partition(
expr: Arc<dyn PhysicalExpr>,
partition: usize,
) -> Result<Arc<dyn PhysicalExpr>> {
Contributor
This is an interesting way to go about it. It does kind of feel to me like we are trying to shoehorn the concept of partitions into PhysicalExpr through the snapshot() API which was never intended for this use case. I feel there is probably a much cleaner way to introduce the concept of a partition to PhysicalExpr

Contributor Author

This is an interesting way to go about it. It does kind of feel to me like we are trying to shoehorn the concept of partitions into PhysicalExpr through the snapshot() API which was never intended for this use case. I feel there is probably a much cleaner way to introduce the concept of a partition to PhysicalExpr
I agree that overloading snapshot() is a bit odd. Here are two ways this could be improved upon:

1. Transmit Partition Info in the Data:
This is the suggestion @adriangb provided. I think it is an interesting approach but has large implications. Injecting a partition-id or something similar would mean changing projections, filter handling, etc. With the objective of this PR being to enable dynamic filtering for file-preserved partitioning, this seems like too large of a change.

2. Runtime Partition Bind:
Introduce a runtime wrapper PartitionBoundDynamicFilterPhysicalExpr. The parquet opener binds the dynamic filters to partition i by inserting this new wrapper rather than a static snapshot. The wrapper is then resolved at evaluation time. This largely avoids shoehorning partitioning into PhysicalExpr through snapshot().

With this said, I am aware this approach is not the cleanest solution for this issue, but it does set us up for follow-up work. This can become a generic runtime-bind API that hooks into PhysicalExpr; by default it will be a no-op. Then for DynamicFilterPhysicalExpr we will implement the hook and eliminate our wrapper. This will provide support for any future runtime-dependent expressions without the ad-hoc plumbing we did before and with no schema / data changes.

I decided to leave the full generic solution out of this PR to keep these changes as local as I can, but tried to lay a path forward that is maintainable. Let me know thoughts on this approach.

cc: @adriangb @LiaCastaneda

Contributor

@gabotechs gabotechs left a comment

I'm not familiar with this code, so I'm ramping up with it, but you can expect me to take some time until I start contributing more useful comments.

For now, the big comment I left is mostly intended for triggering some discussions rather than actually jumping into code.

One way of getting your PRs in faster is to slice them into smaller chunks; for example, maybe shipping the failing tests for the issue this is intended to solve first. I have to say that, even if that's generally good advice, it might not be good for some PRs. It might not be for this one, for example.

Comment on lines +1000 to +1004
/// Note for partitioned hash join dynamic filtering:
/// preserving file partitions can allow partition-index routing (`i -> i`) instead of
/// CASE-hash routing, but this assumes build/probe partition indices stay aligned for
/// partition hash join / dynamic filter consumers.
///
Contributor
This assumption is not specific to dynamic filters. If partition indices are not aligned, data returned in the join would be wrong whether dynamic filters are there or not.

As I don't think this advice is specific to dynamic filters, I'd try to keep this doc comment more minimal. Note that this is supposed to be rendered not only as docs, but also as part of a SHOW ALL query, which might not render nicely, so ASCII graphics are probably not the most render-friendly choice for this specific case.

// - File-grouped: partition 0 = all rows where column="A" (value-based)
// - Hash-repartitioned: partition 0 = all rows where hash(column) % N == 0 (hash-based)
// These are incompatible, so the join will miss matching rows.
plan = if let Some(hash_join) = plan.as_any().downcast_ref::<HashJoinExec>()
Contributor

@gabotechs gabotechs Feb 25, 2026
Pretty smart and clean way of propagating whether data is getting repartitioned across steps 👍

Comment on lines +1593 to +1597
!matches!(
repartition.partitioning(),
Partitioning::UnknownPartitioning(_)
)
} else if child_context.plan.children().is_empty() {
Contributor
It might make sense to not use a matches! macro here, and instead manually match each possibility. The reason is that, if a new enum variant is added to Partitioning in the future, the compiler will force the dev to choose the right value here manually, forcing them to think about it.
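The suggestion can be sketched std-only by mirroring the enum locally (the real DataFusion Partitioning carries hash expressions; this local copy is simplified for illustration):

```rust
// Local stand-in for DataFusion's Partitioning enum, simplified.
enum Partitioning {
    RoundRobinBatch(usize),
    Hash(usize), // hash exprs elided for the sketch
    UnknownPartitioning(usize),
}

// Exhaustive match with no `_` arm: adding a new Partitioning variant
// becomes a compile error at this decision point, forcing a deliberate
// choice instead of silently falling into a matches! default.
fn is_known_repartitioning(p: &Partitioning) -> bool {
    match p {
        Partitioning::RoundRobinBatch(_) => true,
        Partitioning::Hash(_) => true,
        Partitioning::UnknownPartitioning(_) => false,
    }
}

fn main() {
    assert!(is_known_repartitioning(&Partitioning::Hash(4)));
    assert!(!is_known_repartitioning(&Partitioning::UnknownPartitioning(1)));
    println!("exhaustive match ok");
}
```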

Comment on lines +111 to 116
/// Per-partition filter expressions for partition-index routing.
/// When both sides of a hash join preserve their file partitioning (no RepartitionExec(Hash)),
/// build-partition i corresponds to probe-partition i. This allows storing per-partition
/// filters so that each partition only sees its own bounds, giving tighter filtering.
partitioned_exprs: PartitionedFilters,
}
Contributor

I wonder if there is a better alternative than leaking the concept of plan partitioning into dynamic filters.

I do agree with @adriangb about leaking the concept of partitioning to expressions. I'd say:

  • If we want expressions to be aware of partitions, let's do a proper API design that is as clean and future-proof as possible, and use it here.
  • If we want expressions to be agnostic of partitions, let's try to find a way to implement this so that DynamicFilterExpressions are still agnostic of partitions.

It would be nice to avoid middle-ground solutions that exist just for the sake of shipping PRs faster.

Some ideas that come to mind for keeping dynamic filters agnostic from plan partitioning in its structs:

Introduce a new __partition_index REE column or something similar

In the record batch, right before calling evaluate() on the dynamic filter expression, we could introduce a new column with the partition index. I think you already explored this without success, but I would like to understand better which blockers specifically you faced; I do see this approach yielding a potentially good result.

Use Arrow's Schema metadata for threading the partition index

Arrow's schemas have the concept of metadata that can be used for threading arbitrary information. I think this should also be considered "leaking partitioning details", but I wonder if this brings an opportunity of making it cleaner?

Letting the hash join hold a Vec instead of a single HashJoinExecDynamicFilter

That way, there are as many HashJoinExecDynamicFilters as partitions, and each behaves exactly as before, but as if there were only one partition.
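This third idea can be sketched std-only: the join holds one independent filter handle per partition, each of which is entirely unaware of partitioning. `HashJoinDynamicFilter`, `HashJoin`, and `on_build_complete` are hypothetical names for the sketch:

```rust
#[derive(Default)]
struct HashJoinDynamicFilter {
    // Set once the build side for this partition finishes; the filter itself
    // never needs to know which partition it belongs to.
    expr: Option<String>,
}

struct HashJoin {
    filters: Vec<HashJoinDynamicFilter>, // one per partition
}

impl HashJoin {
    fn new(partitions: usize) -> Self {
        Self {
            filters: (0..partitions)
                .map(|_| HashJoinDynamicFilter::default())
                .collect(),
        }
    }

    fn on_build_complete(&mut self, partition: usize, expr: String) {
        self.filters[partition].expr = Some(expr);
    }
}

fn main() {
    let mut join = HashJoin::new(3);
    join.on_build_complete(0, "col_a = 'A'".to_string());
    join.on_build_complete(1, "col_a = 'B'".to_string());
    // Partition 2's build side produced nothing, so its filter stays unset.
    let set = join.filters.iter().filter(|f| f.expr.is_some()).count();
    println!("filters set for {set} of {} partitions", join.filters.len());
}
```

The partition index lives only in the `Vec` position held by the join, which is what keeps the filter type itself partition-agnostic.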

Contributor

In the record batch, right before calling evaluate() on the dynamic filter expression, we could introduce a new column with the partition index. I think you already explored this without success, but I would like to understand better which blockers specifically you faced; I do see this approach yielding a potentially good result.

I also was thinking this approach should work.

Just throwing darts at the wall in terms of helping w/ implementation: you could create a marker expression e.g. impl PhysicalExpr for PartitionLiteral and then modify the machinery that replaces hive partition values in scans w/ the actual literals to also replace that with the DataFusion partition integer.

/// Replace column references in the given physical expression with literal values.
///
/// Some use cases for this include:
/// - Partition column pruning: When scanning partitioned data, partition column references
/// can be replaced with their literal values for the specific partition being scanned.
/// - Constant folding: In some cases, columns that can be proven to be constant
/// from statistical analysis may be replaced with their literal values to optimize expression evaluation.
/// - Filling in non-null default values: in a custom [`PhysicalExprAdapter`] implementation,
/// column references can be replaced with default literal values instead of nulls.
///
/// # Arguments
/// - `expr`: The physical expression in which to replace column references.
/// - `replacements`: A mapping from column names to their corresponding literal `ScalarValue`s.
/// Accepts various HashMap types including `HashMap<&str, &ScalarValue>`,
/// `HashMap<String, ScalarValue>`, `HashMap<String, &ScalarValue>`, etc.
///
/// # Returns
/// - `Result<Arc<dyn PhysicalExpr>>`: The rewritten physical expression with columns replaced by literals.
pub fn replace_columns_with_literals<K, V>(
expr: Arc<dyn PhysicalExpr>,
replacements: &HashMap<K, V>,
) -> Result<Arc<dyn PhysicalExpr>>
where
K: Borrow<str> + Eq + Hash,
V: Borrow<ScalarValue>,
{
expr.transform_down(|expr| {
if let Some(column) = expr.as_any().downcast_ref::<Column>()
&& let Some(replacement_value) = replacements.get(column.name())
{
return Ok(Transformed::yes(expressions::lit(
replacement_value.borrow().clone(),
)));
}
Ok(Transformed::no(expr))
})
.data()
}

// Build a combined map for replacing column references with literal values.
// This includes:
// 1. Partition column values from the file path (e.g., region=us-west-2)
// 2. Constant columns detected from file statistics (where min == max)
//
// Although partition columns *are* constant columns, we don't want to rely on
// statistics for them being populated if we can use the partition values
// (which are guaranteed to be present).
//
// For example, given a partition column `region` and predicate
// `region IN ('us-east-1', 'eu-central-1')` with file path
// `/data/region=us-west-2/...`, the predicate is rewritten to
// `'us-west-2' IN ('us-east-1', 'eu-central-1')` which simplifies to FALSE.
//
// While partition column optimization is done during logical planning,
// there are cases where partition columns may appear in more complex
// predicates that cannot be simplified until we open the file (such as
// dynamic predicates).
let mut literal_columns: HashMap<String, ScalarValue> = self
.table_schema
.table_partition_cols()
.iter()
.zip(partitioned_file.partition_values.iter())
.map(|(field, value)| (field.name().clone(), value.clone()))
.collect();
// Add constant columns from file statistics.
// Note that if there are statistics for partition columns there will be overlap,
// but since we use a HashMap, we'll just overwrite the partition values with the
// constant values from statistics (which should be the same).
literal_columns.extend(constant_columns_from_stats(
partitioned_file.statistics.as_deref(),
&logical_file_schema,
));
// Apply literal replacements to projection and predicate
let mut projection = self.projection.clone();
let mut predicate = self.predicate.clone();
if !literal_columns.is_empty() {
projection = projection.try_map_exprs(|expr| {
replace_columns_with_literals(Arc::clone(&expr), &literal_columns)
})?;
predicate = predicate
.map(|p| replace_columns_with_literals(p, &literal_columns))
.transpose()?;
}

You'd have to contend with "what if this expression is evaluated without replacement". I'm not sure when / why that would happen, or if it's okay that the expression only works in the context of a scan. Maybe just returning null by default is reasonable behavior.

Contributor Author

Is the generic runtime binding for physical expressions not a good path forward? I have changed this since @adriangb's last review and think it will provide a clean API once the follow-up work is done.

It will eliminate the wrapper and create an API for any physical expression to have runtime properties. This feels like a good approach since it's easy to comprehend, requires no changes to the schema or injected columns, and introduces what I think will be a useful property for physical expressions as cost-based and adaptive query execution is adopted.

Cc: @gabotechs

Contributor Author

@gene-bordegaray gene-bordegaray Feb 25, 2026

I don't see the approach leaking the idea of partitioning into dynamic filters; rather, its goal is to make physical expressions aware of partitioning, which I think is feasible.

Maybe I am misunderstanding what is good practice in exposing properties to parts of the plan, but it seems like a good approach to have expressions aware of runtime features.

Contributor Author

Here is a fleshed-out PR showing my idea for the generic runtime binder: #20549

this may make my vision more clear 👍

cc: @adriangb @gabotechs

Contributor

Would you be fine with investigating how other engines handle this? Looking at successful implementations in other engines might produce good arguments for a path forward.



Development

Successfully merging this pull request may close these issues.

support dynamic filtering on partitioned data from file source
