Skip to content

Fuse multiple scalar-aggregate subqueries over the same source into a single scan #23213

Description

@nathanb9

Problem

When a query computes several uncorrelated scalar-aggregate subqueries over the same table, DataFusion scans that table once per subquery.

Proposed rewrite

When 2 or more such subqueries share one source, fuse them into a single scan + aggregate, pushing each subquery's predicate into a FILTER (WHERE ...) clause:

-- Before: two scans of t
SELECT (SELECT count(*) FROM t WHERE a < 10),
       (SELECT avg(x)   FROM t WHERE a >= 10);

-- After: one scan of t
SELECT count(*) FILTER (WHERE a < 10),
       avg(x)   FILTER (WHERE a >= 10)
FROM t;

The source filter becomes the OR of the branch predicates, and each scalar subquery is replaced by a reference to the merged aggregate.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions