Skip to content

Perf: Extend TopK aggregation optimization to general aggregates#20568

Closed
Dandandan wants to merge 14 commits intoapache:mainfrom
Dandandan:topk_aggr
Closed

Perf: Extend TopK aggregation optimization to general aggregates#20568
Dandandan wants to merge 14 commits intoapache:mainfrom
Dandandan:topk_aggr

Conversation

@Dandandan
Copy link
Contributor

@Dandandan Dandandan commented Feb 26, 2026

Push limit into AggregateExec for queries like GROUP BY ... ORDER BY COUNT(*) DESC LIMIT K, not just MIN/MAX and DISTINCT. After aggregation completes, applies a partial sort + take to emit only the top-K rows, avoiding full materialization downstream.

Which issue does this PR close?

Rationale for this change

Performance 🚀

Details

What changes are included in this PR?

TopK-Aggregate fusion for general aggregates — Extends the existing TopK optimization (which only worked for MIN/MAX and DISTINCT) to general aggregates like COUNT, SUM, AVG, etc.

For queries like SELECT col, COUNT(*) AS c FROM t GROUP BY col ORDER BY c DESC LIMIT 10, instead of materializing all groups from the hash aggregation and then sorting, the optimizer pushes the limit into
AggregateExec. At emit time, it uses a partial sort (lexsort_to_indices) to extract only the top-K rows, avoiding a full sort.

Are these changes tested?

Yes, additional + existing tests

Are there any user-facing changes?

Small changes to

  • LimitOptions
  • AggregateExec::limit_options

Push limit into AggregateExec for queries like GROUP BY ... ORDER BY COUNT(*) DESC LIMIT K,
not just MIN/MAX and DISTINCT. After aggregation completes, applies a partial sort + take
to emit only the top-K rows, avoiding full materialization downstream.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) physical-plan Changes to the physical-plan crate labels Feb 26, 2026
@Dandandan
Copy link
Contributor Author

run benchmarks

@alamb-ghbot
Copy link

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing topk_aggr (fe78e22) to bfc012e diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

Dandandan and others added 2 commits February 26, 2026 09:42
Serialize and deserialize the new sort_column_index field in LimitOptions
so that general aggregate TopK plans survive protobuf roundtrip.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added core Core DataFusion crate proto Related to proto crate labels Feb 26, 2026
@alamb-ghbot
Copy link

🤖: Benchmark completed

Details

Comparing HEAD and topk_aggr
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃   topk_aggr ┃    Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 0 │  2173.72 ms │  2148.71 ms │ no change │
│ QQuery 1 │   915.02 ms │   934.92 ms │ no change │
│ QQuery 2 │  1750.45 ms │  1734.06 ms │ no change │
│ QQuery 3 │  1053.93 ms │  1051.80 ms │ no change │
│ QQuery 4 │  2174.17 ms │  2169.07 ms │ no change │
│ QQuery 5 │ 28099.27 ms │ 27850.73 ms │ no change │
│ QQuery 6 │  3825.59 ms │  3663.31 ms │ no change │
│ QQuery 7 │  2524.30 ms │  2470.65 ms │ no change │
└──────────┴─────────────┴─────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)        │ 42516.45ms │
│ Total Time (topk_aggr)   │ 42023.25ms │
│ Average Time (HEAD)      │  5314.56ms │
│ Average Time (topk_aggr) │  5252.91ms │
│ Queries Faster           │          0 │
│ Queries Slower           │          0 │
│ Queries with No Change   │          8 │
│ Queries with Failure     │          0 │
└──────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃   topk_aggr ┃          Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ QQuery 0  │     2.54 ms │     2.56 ms │       no change │
│ QQuery 1  │    49.52 ms │    49.40 ms │       no change │
│ QQuery 2  │   134.48 ms │   132.22 ms │       no change │
│ QQuery 3  │   153.38 ms │   155.02 ms │       no change │
│ QQuery 4  │  1009.69 ms │  1022.37 ms │       no change │
│ QQuery 5  │  1242.52 ms │  1257.44 ms │       no change │
│ QQuery 6  │     6.12 ms │     6.05 ms │       no change │
│ QQuery 7  │    55.33 ms │    55.66 ms │       no change │
│ QQuery 8  │  1392.98 ms │  1366.87 ms │       no change │
│ QQuery 9  │  1756.80 ms │  1679.95 ms │       no change │
│ QQuery 10 │   351.80 ms │   348.34 ms │       no change │
│ QQuery 11 │   398.54 ms │   401.69 ms │       no change │
│ QQuery 12 │  1194.19 ms │   484.94 ms │   +2.46x faster │
│ QQuery 13 │  1839.62 ms │  1473.76 ms │   +1.25x faster │
│ QQuery 14 │  1224.15 ms │   537.49 ms │   +2.28x faster │
│ QQuery 15 │  1157.17 ms │   476.27 ms │   +2.43x faster │
│ QQuery 16 │  2422.76 ms │  1091.09 ms │   +2.22x faster │
│ QQuery 17 │  2406.16 ms │  2400.97 ms │       no change │
│ QQuery 18 │  4656.23 ms │  2111.88 ms │   +2.20x faster │
│ QQuery 19 │   118.91 ms │   120.10 ms │       no change │
│ QQuery 20 │  1862.06 ms │  1749.13 ms │   +1.06x faster │
│ QQuery 21 │  2121.06 ms │  2037.18 ms │       no change │
│ QQuery 22 │  3647.87 ms │  3450.82 ms │   +1.06x faster │
│ QQuery 23 │ 11912.28 ms │ 11553.36 ms │       no change │
│ QQuery 24 │   202.94 ms │   196.99 ms │       no change │
│ QQuery 25 │   458.84 ms │   437.61 ms │       no change │
│ QQuery 26 │   209.50 ms │   209.52 ms │       no change │
│ QQuery 27 │  2576.40 ms │  2464.48 ms │       no change │
│ QQuery 28 │ 23432.50 ms │ 22691.85 ms │       no change │
│ QQuery 29 │   961.34 ms │   982.83 ms │       no change │
│ QQuery 30 │  1268.75 ms │    64.23 ms │  +19.75x faster │
│ QQuery 31 │  1309.00 ms │    67.05 ms │  +19.52x faster │
│ QQuery 32 │  4132.87 ms │    24.04 ms │ +171.89x faster │
│ QQuery 33 │  5343.35 ms │  3001.15 ms │   +1.78x faster │
│ QQuery 34 │  5324.19 ms │  3078.61 ms │   +1.73x faster │
│ QQuery 35 │  1807.24 ms │   938.91 ms │   +1.92x faster │
│ QQuery 36 │   191.83 ms │   160.89 ms │   +1.19x faster │
│ QQuery 37 │    71.12 ms │    76.07 ms │    1.07x slower │
│ QQuery 38 │   115.31 ms │   110.79 ms │       no change │
│ QQuery 39 │   344.79 ms │   118.33 ms │   +2.91x faster │
│ QQuery 40 │    38.36 ms │    38.35 ms │       no change │
│ QQuery 41 │    33.51 ms │    33.15 ms │       no change │
│ QQuery 42 │    31.60 ms │    29.99 ms │   +1.05x faster │
└───────────┴─────────────┴─────────────┴─────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)        │ 88969.58ms │
│ Total Time (topk_aggr)   │ 68689.40ms │
│ Average Time (HEAD)      │  2069.06ms │
│ Average Time (topk_aggr) │  1597.43ms │
│ Queries Faster           │         17 │
│ Queries Slower           │          1 │
│ Queries with No Change   │         25 │
│ Queries with Failure     │          0 │
└──────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ topk_aggr ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 101.34 ms │ 100.91 ms │     no change │
│ QQuery 2  │  33.32 ms │  32.86 ms │     no change │
│ QQuery 3  │  41.98 ms │  39.97 ms │     no change │
│ QQuery 4  │  32.14 ms │  30.80 ms │     no change │
│ QQuery 5  │  92.74 ms │  90.55 ms │     no change │
│ QQuery 6  │  21.38 ms │  20.90 ms │     no change │
│ QQuery 7  │ 161.07 ms │ 153.48 ms │     no change │
│ QQuery 8  │  42.67 ms │  37.71 ms │ +1.13x faster │
│ QQuery 9  │  95.65 ms │  97.79 ms │     no change │
│ QQuery 10 │  66.11 ms │  66.35 ms │     no change │
│ QQuery 11 │  19.30 ms │  18.54 ms │     no change │
│ QQuery 12 │  51.01 ms │  51.87 ms │     no change │
│ QQuery 13 │  48.14 ms │  49.40 ms │     no change │
│ QQuery 14 │  14.92 ms │  14.82 ms │     no change │
│ QQuery 15 │  29.78 ms │  30.17 ms │     no change │
│ QQuery 16 │  28.30 ms │  28.69 ms │     no change │
│ QQuery 17 │ 143.74 ms │ 141.24 ms │     no change │
│ QQuery 18 │ 280.94 ms │ 280.87 ms │     no change │
│ QQuery 19 │  39.25 ms │  40.23 ms │     no change │
│ QQuery 20 │  56.92 ms │  54.50 ms │     no change │
│ QQuery 21 │ 198.35 ms │ 192.53 ms │     no change │
│ QQuery 22 │  22.52 ms │  22.56 ms │     no change │
└───────────┴───────────┴───────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary        ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)        │ 1621.58ms │
│ Total Time (topk_aggr)   │ 1596.73ms │
│ Average Time (HEAD)      │   73.71ms │
│ Average Time (topk_aggr) │   72.58ms │
│ Queries Faster           │         1 │
│ Queries Slower           │         0 │
│ Queries with No Change   │        21 │
│ Queries with Failure     │         0 │
└──────────────────────────┴───────────┘

Dandandan and others added 2 commits February 26, 2026 10:05
Extends the TopK-aggregate fusion optimization to support multi-column
ORDER BY clauses (e.g. ORDER BY COUNT(*) DESC, name ASC LIMIT K).
Previously only single-column sort was supported. Uses lexsort_to_indices
for efficient multi-column partial sort during the aggregate emit phase.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Dandandan
Copy link
Contributor Author

run benchmarks

@alamb-ghbot
Copy link

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing topk_aggr (895d362) to bfc012e diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb-ghbot
Copy link

🤖: Benchmark completed

Details

Comparing HEAD and topk_aggr
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃   topk_aggr ┃          Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ QQuery 0 │  2347.12 ms │  2278.08 ms │       no change │
│ QQuery 1 │   891.45 ms │   867.87 ms │       no change │
│ QQuery 2 │  1759.35 ms │  1731.22 ms │       no change │
│ QQuery 3 │  1054.13 ms │  1064.51 ms │       no change │
│ QQuery 4 │  2201.16 ms │  2184.40 ms │       no change │
│ QQuery 5 │ 28072.71 ms │ 28467.29 ms │       no change │
│ QQuery 6 │  3809.09 ms │  3824.70 ms │       no change │
│ QQuery 7 │  2624.91 ms │    17.23 ms │ +152.37x faster │
└──────────┴─────────────┴─────────────┴─────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)        │ 42759.92ms │
│ Total Time (topk_aggr)   │ 40435.30ms │
│ Average Time (HEAD)      │  5344.99ms │
│ Average Time (topk_aggr) │  5054.41ms │
│ Queries Faster           │          1 │
│ Queries Slower           │          0 │
│ Queries with No Change   │          7 │
│ Queries with Failure     │          0 │
└──────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃   topk_aggr ┃          Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ QQuery 0  │     2.53 ms │     2.61 ms │       no change │
│ QQuery 1  │    48.94 ms │    49.82 ms │       no change │
│ QQuery 2  │   132.03 ms │   133.43 ms │       no change │
│ QQuery 3  │   154.62 ms │   153.15 ms │       no change │
│ QQuery 4  │   969.83 ms │  1034.96 ms │    1.07x slower │
│ QQuery 5  │  1239.11 ms │  1263.62 ms │       no change │
│ QQuery 6  │     6.33 ms │     6.18 ms │       no change │
│ QQuery 7  │    54.79 ms │    53.86 ms │       no change │
│ QQuery 8  │  1336.06 ms │  1398.68 ms │       no change │
│ QQuery 9  │  1693.06 ms │  1689.13 ms │       no change │
│ QQuery 10 │   344.21 ms │   350.53 ms │       no change │
│ QQuery 11 │   397.23 ms │   400.18 ms │       no change │
│ QQuery 12 │  1177.25 ms │   480.11 ms │   +2.45x faster │
│ QQuery 13 │  1865.04 ms │  1491.60 ms │   +1.25x faster │
│ QQuery 14 │  1181.23 ms │   542.00 ms │   +2.18x faster │
│ QQuery 15 │  1119.69 ms │   477.54 ms │   +2.34x faster │
│ QQuery 16 │  2419.61 ms │  1073.54 ms │   +2.25x faster │
│ QQuery 17 │  2402.67 ms │  2444.78 ms │       no change │
│ QQuery 18 │  5461.84 ms │  2207.37 ms │   +2.47x faster │
│ QQuery 19 │   116.40 ms │   121.57 ms │       no change │
│ QQuery 20 │  1864.78 ms │  1796.78 ms │       no change │
│ QQuery 21 │  2114.19 ms │  2132.82 ms │       no change │
│ QQuery 22 │  3651.91 ms │  3668.16 ms │       no change │
│ QQuery 23 │ 17634.66 ms │ 11987.92 ms │   +1.47x faster │
│ QQuery 24 │   204.31 ms │   205.02 ms │       no change │
│ QQuery 25 │   444.88 ms │   456.13 ms │       no change │
│ QQuery 26 │   220.67 ms │   199.52 ms │   +1.11x faster │
│ QQuery 27 │  2595.35 ms │  2634.19 ms │       no change │
│ QQuery 28 │ 23590.09 ms │ 21404.84 ms │   +1.10x faster │
│ QQuery 29 │   985.13 ms │   945.61 ms │       no change │
│ QQuery 30 │  1243.66 ms │    69.32 ms │  +17.94x faster │
│ QQuery 31 │  1322.86 ms │    70.38 ms │  +18.79x faster │
│ QQuery 32 │  4836.73 ms │    25.07 ms │ +192.96x faster │
│ QQuery 33 │  5450.24 ms │  3255.83 ms │   +1.67x faster │
│ QQuery 34 │  5672.91 ms │  3143.61 ms │   +1.80x faster │
│ QQuery 35 │  1878.51 ms │   985.87 ms │   +1.91x faster │
│ QQuery 36 │   186.70 ms │   170.24 ms │   +1.10x faster │
│ QQuery 37 │    73.98 ms │    74.42 ms │       no change │
│ QQuery 38 │   114.88 ms │   115.18 ms │       no change │
│ QQuery 39 │   343.33 ms │   115.76 ms │   +2.97x faster │
│ QQuery 40 │    38.44 ms │    38.29 ms │       no change │
│ QQuery 41 │    33.37 ms │    35.82 ms │    1.07x slower │
│ QQuery 42 │    31.66 ms │    31.24 ms │       no change │
└───────────┴─────────────┴─────────────┴─────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)        │ 96655.73ms │
│ Total Time (topk_aggr)   │ 68936.68ms │
│ Average Time (HEAD)      │  2247.81ms │
│ Average Time (topk_aggr) │  1603.18ms │
│ Queries Faster           │         17 │
│ Queries Slower           │          2 │
│ Queries with No Change   │         24 │
│ Queries with Failure     │          0 │
└──────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ topk_aggr ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 101.72 ms │ 101.20 ms │     no change │
│ QQuery 2  │  33.71 ms │  33.08 ms │     no change │
│ QQuery 3  │  42.10 ms │  36.21 ms │ +1.16x faster │
│ QQuery 4  │  31.72 ms │  30.34 ms │     no change │
│ QQuery 5  │  90.38 ms │  89.69 ms │     no change │
│ QQuery 6  │  21.09 ms │  20.77 ms │     no change │
│ QQuery 7  │ 153.80 ms │ 151.75 ms │     no change │
│ QQuery 8  │  41.76 ms │  38.08 ms │ +1.10x faster │
│ QQuery 9  │ 102.39 ms │  99.37 ms │     no change │
│ QQuery 10 │  69.15 ms │  68.63 ms │     no change │
│ QQuery 11 │  18.79 ms │  18.03 ms │     no change │
│ QQuery 12 │  51.27 ms │  51.85 ms │     no change │
│ QQuery 13 │  47.54 ms │  47.78 ms │     no change │
│ QQuery 14 │  14.99 ms │  15.62 ms │     no change │
│ QQuery 15 │  29.85 ms │  30.09 ms │     no change │
│ QQuery 16 │  28.46 ms │  29.14 ms │     no change │
│ QQuery 17 │ 141.11 ms │ 142.89 ms │     no change │
│ QQuery 18 │ 276.40 ms │ 281.32 ms │     no change │
│ QQuery 19 │  39.16 ms │  39.21 ms │     no change │
│ QQuery 20 │  53.65 ms │  54.61 ms │     no change │
│ QQuery 21 │ 196.67 ms │ 189.69 ms │     no change │
│ QQuery 22 │  22.96 ms │  22.79 ms │     no change │
└───────────┴───────────┴───────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary        ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)        │ 1608.66ms │
│ Total Time (topk_aggr)   │ 1592.15ms │
│ Average Time (HEAD)      │   73.12ms │
│ Average Time (topk_aggr) │   72.37ms │
│ Queries Faster           │         2 │
│ Queries Slower           │         0 │
│ Queries with No Change   │        20 │
│ Queries with Failure     │         0 │
└──────────────────────────┴───────────┘

@Dandandan Dandandan marked this pull request as ready for review February 26, 2026 10:17
@Dandandan Dandandan added the api change Changes the API exposed to users of the crate label Feb 26, 2026
@Dandandan Dandandan changed the title perf: Extend TopK aggregation optimization to general aggregates Perf: Extend TopK aggregation optimization to general aggregates Feb 26, 2026
@Dandandan Dandandan added the performance Make DataFusion faster label Feb 26, 2026
@Dandandan
Copy link
Contributor Author

run benchmarks

@alamb-ghbot
Copy link

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing topk_aggr (2bf1d95) to e684994 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb-ghbot
Copy link

🤖: Benchmark completed

Details

Comparing HEAD and topk_aggr
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃   topk_aggr ┃          Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ QQuery 0 │  2332.51 ms │  2284.08 ms │       no change │
│ QQuery 1 │   920.10 ms │   865.29 ms │   +1.06x faster │
│ QQuery 2 │  1765.41 ms │  1755.42 ms │       no change │
│ QQuery 3 │  1031.93 ms │  1047.29 ms │       no change │
│ QQuery 4 │  2194.59 ms │  2221.00 ms │       no change │
│ QQuery 5 │ 28266.02 ms │ 28176.23 ms │       no change │
│ QQuery 6 │  3840.06 ms │  3862.57 ms │       no change │
│ QQuery 7 │  2826.65 ms │    18.15 ms │ +155.72x faster │
└──────────┴─────────────┴─────────────┴─────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)        │ 43177.28ms │
│ Total Time (topk_aggr)   │ 40230.02ms │
│ Average Time (HEAD)      │  5397.16ms │
│ Average Time (topk_aggr) │  5028.75ms │
│ Queries Faster           │          2 │
│ Queries Slower           │          0 │
│ Queries with No Change   │          6 │
│ Queries with Failure     │          0 │
└──────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃   topk_aggr ┃          Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ QQuery 0  │     2.56 ms │     2.60 ms │       no change │
│ QQuery 1  │    48.91 ms │    48.09 ms │       no change │
│ QQuery 2  │   133.40 ms │   133.18 ms │       no change │
│ QQuery 3  │   156.19 ms │   152.51 ms │       no change │
│ QQuery 4  │   977.89 ms │  1024.06 ms │       no change │
│ QQuery 5  │  1244.13 ms │  1285.68 ms │       no change │
│ QQuery 6  │     7.23 ms │     6.22 ms │   +1.16x faster │
│ QQuery 7  │    54.71 ms │    53.87 ms │       no change │
│ QQuery 8  │  1364.40 ms │  1412.10 ms │       no change │
│ QQuery 9  │  1713.00 ms │  1770.07 ms │       no change │
│ QQuery 10 │   342.50 ms │   353.14 ms │       no change │
│ QQuery 11 │   400.03 ms │   410.12 ms │       no change │
│ QQuery 12 │  1178.94 ms │   498.42 ms │   +2.37x faster │
│ QQuery 13 │  1884.62 ms │  1507.75 ms │   +1.25x faster │
│ QQuery 14 │  1219.71 ms │   538.83 ms │   +2.26x faster │
│ QQuery 15 │  1118.96 ms │   481.99 ms │   +2.32x faster │
│ QQuery 16 │  2429.55 ms │  1087.18 ms │   +2.23x faster │
│ QQuery 17 │  2409.90 ms │  2548.80 ms │    1.06x slower │
│ QQuery 18 │  5047.43 ms │  2187.08 ms │   +2.31x faster │
│ QQuery 19 │   120.47 ms │   121.48 ms │       no change │
│ QQuery 20 │  1887.27 ms │  1864.36 ms │       no change │
│ QQuery 21 │  2167.30 ms │  2143.47 ms │       no change │
│ QQuery 22 │  3752.95 ms │  3656.37 ms │       no change │
│ QQuery 23 │ 17815.06 ms │ 11995.66 ms │   +1.49x faster │
│ QQuery 24 │   216.31 ms │   210.71 ms │       no change │
│ QQuery 25 │   453.91 ms │   456.42 ms │       no change │
│ QQuery 26 │   212.84 ms │   210.30 ms │       no change │
│ QQuery 27 │  2711.99 ms │  2558.03 ms │   +1.06x faster │
│ QQuery 28 │ 24433.91 ms │ 24249.40 ms │       no change │
│ QQuery 29 │   985.19 ms │   940.36 ms │       no change │
│ QQuery 30 │  1251.95 ms │    66.89 ms │  +18.72x faster │
│ QQuery 31 │  1285.82 ms │    70.42 ms │  +18.26x faster │
│ QQuery 32 │  4171.86 ms │    24.54 ms │ +170.01x faster │
│ QQuery 33 │  5458.19 ms │  3072.01 ms │   +1.78x faster │
│ QQuery 34 │  5640.56 ms │  3188.55 ms │   +1.77x faster │
│ QQuery 35 │  1843.17 ms │   966.38 ms │   +1.91x faster │
│ QQuery 36 │   187.36 ms │   168.13 ms │   +1.11x faster │
│ QQuery 37 │    73.86 ms │    76.93 ms │       no change │
│ QQuery 38 │   113.29 ms │   113.08 ms │       no change │
│ QQuery 39 │   344.45 ms │   109.43 ms │   +3.15x faster │
│ QQuery 40 │    41.74 ms │    40.83 ms │       no change │
│ QQuery 41 │    32.86 ms │    33.05 ms │       no change │
│ QQuery 42 │    31.91 ms │    30.00 ms │   +1.06x faster │
└───────────┴─────────────┴─────────────┴─────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)        │ 96968.28ms │
│ Total Time (topk_aggr)   │ 71868.49ms │
│ Average Time (HEAD)      │  2255.08ms │
│ Average Time (topk_aggr) │  1671.36ms │
│ Queries Faster           │         18 │
│ Queries Slower           │          1 │
│ Queries with No Change   │         24 │
│ Queries with Failure     │          0 │
└──────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ topk_aggr ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │ 102.49 ms │ 100.67 ms │    no change │
│ QQuery 2  │  32.72 ms │  31.92 ms │    no change │
│ QQuery 3  │  35.39 ms │  34.21 ms │    no change │
│ QQuery 4  │  30.22 ms │  30.30 ms │    no change │
│ QQuery 5  │  87.75 ms │  88.76 ms │    no change │
│ QQuery 6  │  20.57 ms │  20.68 ms │    no change │
│ QQuery 7  │ 151.14 ms │ 154.36 ms │    no change │
│ QQuery 8  │  36.73 ms │  40.56 ms │ 1.10x slower │
│ QQuery 9  │ 100.21 ms │ 105.40 ms │ 1.05x slower │
│ QQuery 10 │  66.73 ms │  67.35 ms │    no change │
│ QQuery 11 │  17.72 ms │  18.43 ms │    no change │
│ QQuery 12 │  50.58 ms │  51.69 ms │    no change │
│ QQuery 13 │  48.10 ms │  47.01 ms │    no change │
│ QQuery 14 │  14.92 ms │  14.72 ms │    no change │
│ QQuery 15 │  29.53 ms │  30.01 ms │    no change │
│ QQuery 16 │  28.25 ms │  28.67 ms │    no change │
│ QQuery 17 │ 140.83 ms │ 141.70 ms │    no change │
│ QQuery 18 │ 278.88 ms │ 278.90 ms │    no change │
│ QQuery 19 │  39.72 ms │  39.10 ms │    no change │
│ QQuery 20 │  56.63 ms │  55.69 ms │    no change │
│ QQuery 21 │ 187.66 ms │ 185.05 ms │    no change │
│ QQuery 22 │  22.99 ms │  23.03 ms │    no change │
└───────────┴───────────┴───────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary        ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)        │ 1579.75ms │
│ Total Time (topk_aggr)   │ 1588.21ms │
│ Average Time (HEAD)      │   71.81ms │
│ Average Time (topk_aggr) │   72.19ms │
│ Queries Faster           │         0 │
│ Queries Slower           │         2 │
│ Queries with No Change   │        20 │
│ Queries with Failure     │         0 │
└──────────────────────────┴───────────┘

@Dandandan
Copy link
Contributor Author

run benchmarks

@alamb-ghbot
Copy link

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing topk_aggr (dcf0198) to e684994 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

Two fixes:
1. Don't set group_values_soft_limit when topk_sort_columns is set -
   flushing early resets accumulator state, corrupting results (TPC-H Q13).
2. Don't declare output ordering for FinalPartitioned/SinglePartitioned -
   EnforceSorting replaces SortPreservingMergeExec with
   CoalescePartitionsExec, losing merge-sort (DISTINCT ON).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request extends the TopK aggregation optimization to general aggregates (COUNT, SUM, AVG, etc.), beyond the existing support for MIN/MAX and DISTINCT. The optimization pushes limit information into AggregateExec for queries like GROUP BY ... ORDER BY COUNT(*) DESC LIMIT K. Instead of materializing all aggregation groups and then sorting, the optimizer configures AggregateExec to apply a partial sort + take operation after aggregation completes, emitting only the top-K rows.

Changes:

  • Extended LimitOptions to support multi-column sort specifications via new topk_sort_columns field
  • Added TopKEmit configuration to GroupedHashAggregateStream that applies lexicographic partial sort after aggregation
  • Updated TopKAggregation optimizer to recognize general aggregate patterns and set appropriate limit options
  • Added protobuf support for serializing/deserializing topk_sort_columns
  • Updated test expectations for TPC-H queries to show the new topk optimization in action

Reviewed changes

Copilot reviewed 14 out of 16 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
datafusion/physical-plan/src/aggregates/mod.rs Extended LimitOptions with topk_sort_columns field, added output ordering declaration logic, updated display formatting
datafusion/physical-plan/src/aggregates/row_hash.rs Added TopKEmit struct and apply_topk method, integrated topk selection into aggregation emit flow, prevented early flush for general topk
datafusion/physical-optimizer/src/topk_aggregation.rs Refactored to support multi-column sort, added general aggregate topk path, implemented SortExec elimination when ordering is satisfied
datafusion/proto/proto/datafusion.proto Added AggLimitSortColumn message and topk_sort_columns field to AggLimit
datafusion/proto/src/physical_plan/mod.rs Added serialization/deserialization logic for topk_sort_columns
datafusion/proto/src/generated/prost.rs Generated protobuf Rust code for new messages
datafusion/proto/src/generated/pbjson.rs Generated JSON serialization code for new messages
datafusion/physical-optimizer/src/combine_partial_final_agg.rs Updated to properly clone limit_options
datafusion/sqllogictest/test_files/*.slt.part Updated plan expectations showing new topk optimization
datafusion/sqllogictest/test_files/aggregate.slt Added tests for general aggregate topk optimization
datafusion/core/tests/sql/explain_analyze.rs Updated test expectation with topk information

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@alamb-ghbot
Copy link

🤖: Benchmark completed

Details

Comparing HEAD and topk_aggr
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃   topk_aggr ┃        Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │  2314.20 ms │  2321.87 ms │     no change │
│ QQuery 1 │   959.47 ms │   857.68 ms │ +1.12x faster │
│ QQuery 2 │  1770.40 ms │  1747.59 ms │     no change │
│ QQuery 3 │  1039.39 ms │  1062.09 ms │     no change │
│ QQuery 4 │  2181.78 ms │  2216.96 ms │     no change │
│ QQuery 5 │ 28107.84 ms │ 28024.88 ms │     no change │
│ QQuery 6 │  3819.11 ms │  3838.15 ms │     no change │
│ QQuery 7 │  2625.61 ms │  3419.11 ms │  1.30x slower │
└──────────┴─────────────┴─────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)        │ 42817.79ms │
│ Total Time (topk_aggr)   │ 43488.33ms │
│ Average Time (HEAD)      │  5352.22ms │
│ Average Time (topk_aggr) │  5436.04ms │
│ Queries Faster           │          1 │
│ Queries Slower           │          1 │
│ Queries with No Change   │          6 │
│ Queries with Failure     │          0 │
└──────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃   topk_aggr ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     2.58 ms │     2.64 ms │     no change │
│ QQuery 1  │    48.25 ms │    48.24 ms │     no change │
│ QQuery 2  │   139.74 ms │   135.19 ms │     no change │
│ QQuery 3  │   155.84 ms │   157.47 ms │     no change │
│ QQuery 4  │   982.55 ms │  1048.45 ms │  1.07x slower │
│ QQuery 5  │  1253.72 ms │  1279.43 ms │     no change │
│ QQuery 6  │     6.08 ms │     6.30 ms │     no change │
│ QQuery 7  │    53.15 ms │    54.90 ms │     no change │
│ QQuery 8  │  1332.81 ms │  1411.68 ms │  1.06x slower │
│ QQuery 9  │  1758.65 ms │  1766.35 ms │     no change │
│ QQuery 10 │   346.36 ms │   357.24 ms │     no change │
│ QQuery 11 │   393.31 ms │   405.44 ms │     no change │
│ QQuery 12 │  1184.54 ms │  1248.19 ms │  1.05x slower │
│ QQuery 13 │  1903.86 ms │  1942.48 ms │     no change │
│ QQuery 14 │  1197.86 ms │  1226.83 ms │     no change │
│ QQuery 15 │  1128.77 ms │  1222.62 ms │  1.08x slower │
│ QQuery 16 │  2472.71 ms │  2553.59 ms │     no change │
│ QQuery 17 │  2408.43 ms │  2502.40 ms │     no change │
│ QQuery 18 │  5431.10 ms │  5157.88 ms │ +1.05x faster │
│ QQuery 19 │   116.93 ms │   122.95 ms │  1.05x slower │
│ QQuery 20 │  1887.87 ms │  1885.39 ms │     no change │
│ QQuery 21 │  2149.95 ms │  2147.31 ms │     no change │
│ QQuery 22 │  3721.84 ms │  3696.28 ms │     no change │
│ QQuery 23 │ 18511.67 ms │ 12052.49 ms │ +1.54x faster │
│ QQuery 24 │   224.48 ms │   201.62 ms │ +1.11x faster │
│ QQuery 25 │   462.60 ms │   457.76 ms │     no change │
│ QQuery 26 │   213.75 ms │   217.09 ms │     no change │
│ QQuery 27 │  2780.51 ms │  2636.43 ms │ +1.05x faster │
│ QQuery 28 │ 24854.29 ms │ 23078.37 ms │ +1.08x faster │
│ QQuery 29 │   956.56 ms │   992.08 ms │     no change │
│ QQuery 30 │  1245.24 ms │  1265.20 ms │     no change │
│ QQuery 31 │  1334.25 ms │  1326.17 ms │     no change │
│ QQuery 32 │  4473.10 ms │  4161.09 ms │ +1.07x faster │
│ QQuery 33 │  5724.05 ms │  5219.55 ms │ +1.10x faster │
│ QQuery 34 │  5741.95 ms │  5829.69 ms │     no change │
│ QQuery 35 │  1884.24 ms │  1875.66 ms │     no change │
│ QQuery 36 │   188.47 ms │   188.35 ms │     no change │
│ QQuery 37 │    74.01 ms │    73.21 ms │     no change │
│ QQuery 38 │   115.80 ms │   112.16 ms │     no change │
│ QQuery 39 │   359.29 ms │   343.68 ms │     no change │
│ QQuery 40 │    40.32 ms │    37.47 ms │ +1.08x faster │
│ QQuery 41 │    34.19 ms │    35.86 ms │     no change │
│ QQuery 42 │    31.18 ms │    31.40 ms │     no change │
└───────────┴─────────────┴─────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)        │ 99326.84ms │
│ Total Time (topk_aggr)   │ 90514.59ms │
│ Average Time (HEAD)      │  2309.93ms │
│ Average Time (topk_aggr) │  2104.99ms │
│ Queries Faster           │          8 │
│ Queries Slower           │          5 │
│ Queries with No Change   │         30 │
│ Queries with Failure     │          0 │
└──────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ topk_aggr ┃    Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1  │ 103.59 ms │ 101.03 ms │ no change │
│ QQuery 2  │  32.73 ms │  31.69 ms │ no change │
│ QQuery 3  │  40.15 ms │  40.74 ms │ no change │
│ QQuery 4  │  29.75 ms │  30.20 ms │ no change │
│ QQuery 5  │  90.97 ms │  90.51 ms │ no change │
│ QQuery 6  │  20.88 ms │  20.79 ms │ no change │
│ QQuery 7  │ 153.98 ms │ 156.70 ms │ no change │
│ QQuery 8  │  38.81 ms │  37.00 ms │ no change │
│ QQuery 9  │ 105.55 ms │ 100.77 ms │ no change │
│ QQuery 10 │  65.77 ms │  67.45 ms │ no change │
│ QQuery 11 │  19.07 ms │  18.98 ms │ no change │
│ QQuery 12 │  51.50 ms │  51.46 ms │ no change │
│ QQuery 13 │  48.67 ms │  50.82 ms │ no change │
│ QQuery 14 │  14.78 ms │  14.84 ms │ no change │
│ QQuery 15 │  30.04 ms │  30.54 ms │ no change │
│ QQuery 16 │  28.08 ms │  28.08 ms │ no change │
│ QQuery 17 │ 141.07 ms │ 141.91 ms │ no change │
│ QQuery 18 │ 282.56 ms │ 277.51 ms │ no change │
│ QQuery 19 │  39.02 ms │  39.28 ms │ no change │
│ QQuery 20 │  56.70 ms │  55.88 ms │ no change │
│ QQuery 21 │ 189.15 ms │ 188.07 ms │ no change │
│ QQuery 22 │  22.93 ms │  22.83 ms │ no change │
└───────────┴───────────┴───────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary        ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)        │ 1605.74ms │
│ Total Time (topk_aggr)   │ 1597.08ms │
│ Average Time (HEAD)      │   72.99ms │
│ Average Time (topk_aggr) │   72.59ms │
│ Queries Faster           │         0 │
│ Queries Slower           │         0 │
│ Queries with No Change   │        22 │
│ Queries with Failure     │         0 │
└──────────────────────────┴───────────┘

@Dandandan Dandandan marked this pull request as draft February 26, 2026 20:03
@Dandandan
Copy link
Contributor Author

run benchmarks

@alamb-ghbot
Copy link

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing topk_aggr (9780ceb) to e684994 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb-ghbot
Copy link

🤖: Benchmark completed

Details

Comparing HEAD and topk_aggr
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃   topk_aggr ┃       Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0 │  2320.62 ms │  2256.55 ms │    no change │
│ QQuery 1 │   918.12 ms │   875.89 ms │    no change │
│ QQuery 2 │  1762.96 ms │  1757.51 ms │    no change │
│ QQuery 3 │  1044.05 ms │  1069.52 ms │    no change │
│ QQuery 4 │  2161.21 ms │  2201.77 ms │    no change │
│ QQuery 5 │ 28170.13 ms │ 28312.34 ms │    no change │
│ QQuery 6 │  3873.56 ms │  3715.91 ms │    no change │
│ QQuery 7 │  2559.49 ms │  2821.21 ms │ 1.10x slower │
└──────────┴─────────────┴─────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)        │ 42810.14ms │
│ Total Time (topk_aggr)   │ 43010.71ms │
│ Average Time (HEAD)      │  5351.27ms │
│ Average Time (topk_aggr) │  5376.34ms │
│ Queries Faster           │          0 │
│ Queries Slower           │          1 │
│ Queries with No Change   │          7 │
│ Queries with Failure     │          0 │
└──────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃   topk_aggr ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     2.58 ms │     2.62 ms │     no change │
│ QQuery 1  │    48.80 ms │    47.97 ms │     no change │
│ QQuery 2  │   129.68 ms │   133.01 ms │     no change │
│ QQuery 3  │   156.70 ms │   155.91 ms │     no change │
│ QQuery 4  │   979.14 ms │  1072.91 ms │  1.10x slower │
│ QQuery 5  │  1259.62 ms │  1294.78 ms │     no change │
│ QQuery 6  │     6.35 ms │     6.50 ms │     no change │
│ QQuery 7  │    54.69 ms │    54.74 ms │     no change │
│ QQuery 8  │  1325.15 ms │  1449.18 ms │  1.09x slower │
│ QQuery 9  │  1700.74 ms │  1831.94 ms │  1.08x slower │
│ QQuery 10 │   344.93 ms │   353.33 ms │     no change │
│ QQuery 11 │   395.62 ms │   400.92 ms │     no change │
│ QQuery 12 │  1179.69 ms │  1262.45 ms │  1.07x slower │
│ QQuery 13 │  1857.41 ms │  1957.69 ms │  1.05x slower │
│ QQuery 14 │  1203.11 ms │  1267.35 ms │  1.05x slower │
│ QQuery 15 │  1154.53 ms │  1242.56 ms │  1.08x slower │
│ QQuery 16 │  2420.00 ms │  2463.10 ms │     no change │
│ QQuery 17 │  2387.87 ms │  2458.58 ms │     no change │
│ QQuery 18 │  5417.87 ms │  4640.53 ms │ +1.17x faster │
│ QQuery 19 │   119.69 ms │   120.83 ms │     no change │
│ QQuery 20 │  1875.99 ms │  1737.85 ms │ +1.08x faster │
│ QQuery 21 │  2154.02 ms │  2030.50 ms │ +1.06x faster │
│ QQuery 22 │  4056.69 ms │  3468.93 ms │ +1.17x faster │
│ QQuery 23 │ 17948.06 ms │ 11827.16 ms │ +1.52x faster │
│ QQuery 24 │   216.32 ms │   213.32 ms │     no change │
│ QQuery 25 │   456.11 ms │   436.68 ms │     no change │
│ QQuery 26 │   215.37 ms │   197.29 ms │ +1.09x faster │
│ QQuery 27 │  2663.71 ms │  2537.86 ms │     no change │
│ QQuery 28 │ 24688.88 ms │ 23411.26 ms │ +1.05x faster │
│ QQuery 29 │   959.94 ms │   976.45 ms │     no change │
│ QQuery 30 │  1209.53 ms │  1268.72 ms │     no change │
│ QQuery 31 │  1321.52 ms │  1299.32 ms │     no change │
│ QQuery 32 │  4768.07 ms │  4414.00 ms │ +1.08x faster │
│ QQuery 33 │  5521.86 ms │  5054.23 ms │ +1.09x faster │
│ QQuery 34 │  5609.07 ms │  5490.30 ms │     no change │
│ QQuery 35 │  1844.68 ms │  1896.96 ms │     no change │
│ QQuery 36 │   191.87 ms │   187.18 ms │     no change │
│ QQuery 37 │    72.54 ms │    73.02 ms │     no change │
│ QQuery 38 │   114.20 ms │   114.41 ms │     no change │
│ QQuery 39 │   354.60 ms │   329.93 ms │ +1.07x faster │
│ QQuery 40 │    39.47 ms │    39.96 ms │     no change │
│ QQuery 41 │    34.11 ms │    35.55 ms │     no change │
│ QQuery 42 │    30.85 ms │    31.39 ms │     no change │
└───────────┴─────────────┴─────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)        │ 98491.65ms │
│ Total Time (topk_aggr)   │ 89289.18ms │
│ Average Time (HEAD)      │  2290.50ms │
│ Average Time (topk_aggr) │  2076.49ms │
│ Queries Faster           │         10 │
│ Queries Slower           │          7 │
│ Queries with No Change   │         26 │
│ Queries with Failure     │          0 │
└──────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ topk_aggr ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │ 102.11 ms │ 102.93 ms │    no change │
│ QQuery 2  │  34.03 ms │  33.93 ms │    no change │
│ QQuery 3  │  38.74 ms │  37.48 ms │    no change │
│ QQuery 4  │  30.06 ms │  32.11 ms │ 1.07x slower │
│ QQuery 5  │  89.54 ms │  93.16 ms │    no change │
│ QQuery 6  │  20.67 ms │  21.03 ms │    no change │
│ QQuery 7  │ 153.57 ms │ 161.11 ms │    no change │
│ QQuery 8  │  40.09 ms │  41.45 ms │    no change │
│ QQuery 9  │ 101.18 ms │ 100.99 ms │    no change │
│ QQuery 10 │  70.10 ms │  67.72 ms │    no change │
│ QQuery 11 │  19.13 ms │  18.80 ms │    no change │
│ QQuery 12 │  52.32 ms │  52.15 ms │    no change │
│ QQuery 13 │  49.36 ms │  49.83 ms │    no change │
│ QQuery 14 │  14.80 ms │  14.99 ms │    no change │
│ QQuery 15 │  30.42 ms │  30.05 ms │    no change │
│ QQuery 16 │  29.04 ms │  29.22 ms │    no change │
│ QQuery 17 │ 138.38 ms │ 141.68 ms │    no change │
│ QQuery 18 │ 277.38 ms │ 281.63 ms │    no change │
│ QQuery 19 │  40.43 ms │  40.00 ms │    no change │
│ QQuery 20 │  56.26 ms │  55.76 ms │    no change │
│ QQuery 21 │ 190.63 ms │ 196.01 ms │    no change │
│ QQuery 22 │  22.61 ms │  23.05 ms │    no change │
└───────────┴───────────┴───────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary        ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)        │ 1600.85ms │
│ Total Time (topk_aggr)   │ 1625.08ms │
│ Average Time (HEAD)      │   72.77ms │
│ Average Time (topk_aggr) │   73.87ms │
│ Queries Faster           │         0 │
│ Queries Slower           │         1 │
│ Queries with No Change   │        21 │
│ Queries with Failure     │         0 │
└──────────────────────────┴───────────┘

@Dandandan Dandandan closed this Feb 27, 2026
@Dandandan
Copy link
Contributor Author

Let's not do it for now, seems not that impactful with the soft limit removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api change Changes the API exposed to users of the crate core Core DataFusion crate optimizer Optimizer rules performance Make DataFusion faster physical-plan Changes to the physical-plan crate proto Related to proto crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve Aggregate with Limit

3 participants