Hide non-primary OneOf branches from logical tree rewrites#43
Draft
zhuqi-lucas wants to merge 1 commit intobranch-52from
Draft
Hide non-primary OneOf branches from logical tree rewrites#43zhuqi-lucas wants to merge 1 commit intobranch-52from
zhuqi-lucas wants to merge 1 commit intobranch-52from
Conversation
ViewMatcher creates a OneOf node holding the original plan (branches[0]) plus N MV-bound candidates. Each post-ViewMatcher logical optimizer pass walks every branch independently via rewrite_extension_inputs — N× redundant work that dominates LogicalPlanning time when N is large (e.g., the tickers endpoint with 5 candidates × 32 hash bins). Mirror the OneOfExec.children() trick at the logical layer: - OneOf::inputs() now returns only the primary branch. Subsequent DF tree rewriters walk a single subtree per OneOf, not all candidates. - Non-primary branches are kept in storage and exposed via a new public branches() accessor for atlas-side optimizers and the extension planner. - with_exprs_and_inputs preserves non-primary branches when the primary is rewritten. - ViewExploitationPlanner converts non-primary branches itself via PhysicalPlanner::create_physical_plan (DF only converted the primary via the hidden inputs() path). - If a post-ViewMatcher optimizer transforms the primary's schema in a way that breaks equivalence with non-primary branches, plan_extension falls back to the primary physical plan instead of erroring — query remains correct, foregoing MV acceleration for that one query. Benchmark (benches/oneof_logical_walk_benchmark.rs, no-op tree rewrite over Filter(OneOf(N branches))): branches before after delta 1 945ns 925ns ~0% 3 2146ns 1060ns -50% 5 3454ns 1180ns -66% 10 6594ns 1500ns -77% Linear N× growth eliminated; residual slope is plan.clone() copying the stored branches Vec, not the rewrite walk itself.
There was a problem hiding this comment.
Pull request overview
This PR reduces redundant logical optimizer work after view-matching by hiding non-primary OneOf branches from DataFusion’s logical tree rewrites, while still preserving and planning those branches for MV exploitation.
Changes:
OneOf::inputs()now exposes only the primary branch; addedOneOf::branches()to access all stored candidates.ViewExploitationPlanner::plan_extensionnow physically plans non-primary branches itself and falls back to the primary plan if branch schemas diverge after optimization.- Added tests for the new logical behavior and a Criterion benchmark to validate elimination of N× logical-rewrite scaling.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
src/rewrite/exploitation.rs |
Hides non-primary logical inputs, adds branches() accessor, updates extension planning to convert hidden branches + fallback logic, adds unit tests. |
Cargo.toml |
Registers the new benchmark target. |
benches/oneof_logical_walk_benchmark.rs |
Adds benchmark measuring logical rewrite walk cost vs OneOf branch count. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
428
to
433
| fn inputs(&self) -> Vec<&LogicalPlan> { | ||
| self.branches | ||
| .iter() | ||
| .sorted_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal)) | ||
| .collect_vec() | ||
| // SAFETY: `OneOf` is constructed with a non-empty `branches` vector; | ||
| // `OneOfExec::try_new` enforces this and our constructors take owned | ||
| // input that callers already validated. | ||
| vec![&self.branches[0]] | ||
| } |
Comment on lines
+345
to
+348
| // others. | ||
| let mut all_physical = Vec::with_capacity(branches.len()); | ||
| all_physical.push(Arc::clone(&physical_inputs[0])); | ||
| for branch in &branches[1..] { |
Comment on lines
461
to
+465
| ) -> Result<Self> { | ||
| let mut new_branches = self.branches.clone(); | ||
| if let Some(updated_primary) = inputs.into_iter().next() { | ||
| new_branches[0] = updated_primary; | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ViewMatcher creates a
OneOflogical node holding the original plan plus N MV-bound candidates. Each post-ViewMatcher logical optimizer pass walks every branch independently via DataFusion'srewrite_extension_inputs— N× redundant work that dominatesLogicalPlanningtime for queries with many candidates (e.g., the atlas tickers endpoint with 5 candidates × 32 hash bins).This change mirrors the
OneOfExec.children()trick at the logical layer:OneOf::inputs()now returns only the primary branch. Subsequent DF tree rewriters walk a single subtree perOneOf, not all candidates.branches()accessor for atlas-side optimizers and the extension planner.with_exprs_and_inputspreserves non-primary branches when the primary is rewritten.ViewExploitationPlannerconverts non-primary branches itself viaPhysicalPlanner::create_physical_plan(DF only converted the primary via the hiddeninputs()path).plan_extensionfalls back to the primary physical plan instead of erroring — query stays correct, foregoing MV acceleration for that one query.Benchmark
benches/oneof_logical_walk_benchmark.rs— no-op tree rewrite overFilter(OneOf(N branches)):Linear N× growth eliminated; residual slope is
plan.clone()copying the stored branchesVec, not the rewrite walk itself.Production motivation
atlas reference-server
LogicalPlanningslow_poll spikes to 100+ ms during storms on tickers queries; executor idle hits 60 s, queue depth 250. View-matched candidate count (5+ MV variants × 32 hash bins per MV) makesrewrite_extension_inputsthe dominant cost per logical optimizer pass.Correctness
For atlas's typical API queries (
SELECT ... FROM tickers WHERE ticker = X), no joins, no subqueries — schema-changing optimizations (OptimizeProjections) complete before ViewMatcher runs, so the schema-divergence path is unreachable in practice. The fallback inplan_extensionis defense in depth for future complex query patterns.Test plan
cargo testpasses (30 unit + integration tests)inputs(),branches()accessor, primary preservation, no-input rebuild, and observable schema divergence