refactor: make file-statistics cache keys schema-aware by Phoenix500526 · Pull Request #23201 · apache/datafusion

Phoenix500526 · 2026-06-26T07:56:00Z

Which issue does this PR close?

Closes #23072 (Make file-statistics cache keys schema-aware).

Rationale for this change

File statistics are computed against a specific file_schema (their
column_statistics are positional, one per column), but the file-statistics
cache was keyed only by table and path. Reading the same path under a different
schema could therefore reuse statistics whose columns no longer line up,
panicking during statistics projection.

#22950 worked around this by bypassing the file-statistics cache entirely for
anonymous explicit-schema reads — correct, but it gave up cache reuse for them
(every such read recomputes statistics). #23072 asks to make the cache itself
schema-aware so those reads can reuse the cache safely instead of skipping it.
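
To illustrate the failure mode described above, here is a minimal sketch in plain Rust. FileStats, project, and the path-only HashMap are hypothetical stand-ins, not DataFusion types. The real code path panics; this sketch returns None instead so that it stays runnable.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for per-file statistics: one entry per column of
// the file_schema the statistics were computed against (positional).
#[derive(Clone, Debug, PartialEq)]
struct FileStats {
    column_null_counts: Vec<usize>,
}

// Hypothetical projection step: pick statistics by column index.
// Returns None where direct indexing would panic on a missing column.
fn project(stats: &FileStats, column_indices: &[usize]) -> Option<Vec<usize>> {
    column_indices
        .iter()
        .map(|&i| stats.column_null_counts.get(i).copied())
        .collect()
}

fn main() {
    // Simplified cache keyed by path only (the table part is omitted here).
    let mut cache: HashMap<String, FileStats> = HashMap::new();

    // First read: a two-column schema populates the entry.
    cache.insert(
        "data/file.parquet".to_string(),
        FileStats { column_null_counts: vec![0, 3] },
    );

    // Second read of the same path under a wider, three-column schema hits
    // the same entry, whose statistics only cover two columns.
    let stale = cache.get("data/file.parquet").unwrap();
    assert_eq!(project(stale, &[0, 1]), Some(vec![0, 3]));
    // Column index 2 has no statistics behind it.
    assert_eq!(project(stale, &[0, 1, 2]), None);
}
```

The point of the sketch is only that a path-keyed entry carries no record of which schema its positions refer to.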

What changes are included in this PR?

Add SchemaFingerprint — the per-column (name, data_type, nullable) of a
file_schema, in order — and FileStatisticsCacheKey { table, path, schema },
and key the file-statistics cache on it (FileStatisticsCache is now
dyn Cache<FileStatisticsCacheKey, CachedFileMetadata>).
ListingTable::do_collect_statistics_and_ordering builds the key with the
file_schema fingerprint and uses the shared cache directly. The #22950
bypass (statistics_cache helper / schema_source-based skip) is removed:
different schemas now land in distinct entries (no stale cross-schema reuse),
while a repeated read of the same schema reuses its entry.
The fingerprint deliberately excludes field/schema metadata (it cannot
affect statistics, and including it would needlessly fragment the cache) and
partition columns (partition statistics are computed separately, outside this
cache).
Table-drop invalidation is unchanged: drop_table_entries matches on
CacheKey::table_ref(), which still returns the table, so all schema variants
for a dropped table are removed together.
The list-files cache continues to key on TableScopedPath.
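
A minimal sketch of the key shape described above, under simplifying assumptions: data types are plain strings rather than Arrow's DataType, the table is an Option<String>, the cached value is a usize placeholder for CachedFileMetadata, and the free functions fingerprint, key, and drop_table_entries are illustrative helpers, not the PR's actual API.

```rust
use std::collections::HashMap;

// Hypothetical simplified fingerprint: (name, data_type, nullable) per
// column, in schema order. Field and schema metadata are not represented.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct SchemaFingerprint(Vec<(String, String, bool)>);

// Hypothetical simplified cache key: table, path, and schema fingerprint.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct FileStatisticsCacheKey {
    table: Option<String>,
    path: String,
    schema: SchemaFingerprint,
}

fn fingerprint(fields: &[(&str, &str, bool)]) -> SchemaFingerprint {
    SchemaFingerprint(
        fields
            .iter()
            .map(|(name, data_type, nullable)| {
                (name.to_string(), data_type.to_string(), *nullable)
            })
            .collect(),
    )
}

fn key(table: &str, path: &str, schema: &SchemaFingerprint) -> FileStatisticsCacheKey {
    FileStatisticsCacheKey {
        table: Some(table.to_string()),
        path: path.to_string(),
        schema: schema.clone(),
    }
}

// Table-drop invalidation matches on the table only, so every schema
// variant stored for that table is removed together.
fn drop_table_entries<V>(cache: &mut HashMap<FileStatisticsCacheKey, V>, table: &str) {
    cache.retain(|k, _| k.table.as_deref() != Some(table));
}

fn main() {
    let narrow = fingerprint(&[("a", "Int64", false)]);
    let wide = fingerprint(&[("a", "Int64", false), ("b", "Utf8", true)]);

    let mut cache: HashMap<FileStatisticsCacheKey, usize> = HashMap::new();
    cache.insert(key("t", "f.parquet", &narrow), 1);
    cache.insert(key("t", "f.parquet", &wide), 2);
    // Same table and path, different schemas: two distinct entries.
    assert_eq!(cache.len(), 2);

    // A repeated read under the wide schema finds its own entry.
    assert_eq!(cache.get(&key("t", "f.parquet", &wide)), Some(&2));
    assert_eq!(cache.len(), 2);

    // Dropping the table removes both schema variants.
    drop_table_entries(&mut cache, "t");
    assert!(cache.is_empty());
}
```

Because the fingerprint is part of the key's Hash and Eq, no extra lookup logic is needed to keep schemas apart; the map does it.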

Are these changes tested?

Yes.

Updated the #22950 regression test
(anonymous_parquet_stats_cache_with_explicit_wider_schema): the wider
explicit-schema read now lands in its own cache entry (2 entries, was 1 under
the bypass) with correct statistics and no panic, and a repeated read of that
schema is served from the cache (a cache hit, no new entry).
Added unit tests for SchemaFingerprint: it distinguishes nullability and
field order, and ignores field/schema metadata.
cargo test for the file_statistics integration module and the
datafusion-execution cache tests (including drop_table_entries) pass, along
with cargo fmt --all and cargo clippy --all-targets --all-features -- -D warnings for the touched crates.
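
The properties checked by the SchemaFingerprint unit tests can be sketched as follows. This is not the PR's test code: Field here is a hypothetical simplified struct (real code would use Arrow's Field), and fingerprint and field are illustrative helpers.

```rust
use std::collections::BTreeMap;

// Hypothetical simplified field with per-field metadata.
#[derive(Clone, Debug)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
    metadata: BTreeMap<String, String>,
}

#[derive(Debug, PartialEq, Eq, Hash)]
struct SchemaFingerprint(Vec<(String, String, bool)>);

// Keeps (name, data_type, nullable) in order and drops metadata.
fn fingerprint(fields: &[Field]) -> SchemaFingerprint {
    SchemaFingerprint(
        fields
            .iter()
            .map(|f| (f.name.clone(), f.data_type.clone(), f.nullable))
            .collect(),
    )
}

fn field(name: &str, data_type: &str, nullable: bool) -> Field {
    Field {
        name: name.to_string(),
        data_type: data_type.to_string(),
        nullable,
        metadata: BTreeMap::new(),
    }
}

fn main() {
    let a = field("a", "Int64", false);
    let b = field("b", "Utf8", true);

    // Nullability is part of the fingerprint.
    let mut a_nullable = a.clone();
    a_nullable.nullable = true;
    assert_ne!(fingerprint(&[a.clone()]), fingerprint(&[a_nullable]));

    // Field order is part of the fingerprint.
    assert_ne!(
        fingerprint(&[a.clone(), b.clone()]),
        fingerprint(&[b.clone(), a.clone()])
    );

    // Metadata is ignored.
    let mut a_meta = a.clone();
    a_meta.metadata.insert("k".to_string(), "v".to_string());
    assert_eq!(a_meta.metadata.len(), 1);
    assert_eq!(fingerprint(&[a.clone()]), fingerprint(&[a_meta]));
}
```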

Are there any user-facing changes?

No change to query results, physical plans, or the serialized (proto) wire
format; file statistics are computed exactly as before.

One public API change (please add the api change label): the
FileStatisticsCache type alias now uses FileStatisticsCacheKey instead of
TableScopedPath as its key. Code that constructed keys for this cache directly
must switch to FileStatisticsCacheKey. SchemaFingerprint and
FileStatisticsCacheKey are newly public; TableScopedPath remains (still used
by the list-files cache). cargo-semver-checks will flag the key-type change,
which is expected.

mkleen

Thank you for working on this. I left a few comments.

adriangb · 2026-06-26T09:54:50Z

I am going to run the wide_schema benchmarks here. I am afraid that any change touching schemas is liable to introduce O(num_columns^X) operations.

adriangb · 2026-06-26T09:56:52Z

run bechmark wide_schema

env:
  DATAFUSION_RUNTIME_FILE_STATISTICS_CACHE_LIMIT: 0

adriangb · 2026-06-26T09:57:00Z

run bechmark wide_schema

adriangb · 2026-06-26T10:01:04Z

run bechmark wide_schema

baseline:
  env:
    DATAFUSION_RUNTIME_FILE_STATISTICS_CACHE_LIMIT: 0

mkleen · 2026-06-26T10:05:07Z

run bechmark wide_schema

baseline:
  env:
    DATAFUSION_RUNTIME_FILE_STATISTICS_CACHE_LIMIT: 0

This is cool, I did not know this.

adriangb · 2026-06-26T10:22:08Z

run benchmark wide_schema

env:
  DATAFUSION_RUNTIME_FILE_STATISTICS_CACHE_LIMIT: 0

adriangb · 2026-06-26T10:23:52Z

run benchmark wide_schema

baseline:
  env:
    DATAFUSION_RUNTIME_FILE_STATISTICS_CACHE_LIMIT: 0

adriangb · 2026-06-26T10:24:09Z

run benchmark wide_schema

adriangb · 2026-06-26T10:24:47Z

run bechmark wide_schema
baseline:
  env:
    DATAFUSION_RUNTIME_FILE_STATISTICS_CACHE_LIMIT: 0
This is cool, I did not know this.

Yep, the idea here is to run the baseline w/o cache and this branch w/ cache. Orthogonal to this PR, but I want to see what it looks like.

Unfortunately I had a typo in benchmark, sorry for the noise.

adriangbot · 2026-06-26T10:25:03Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4808656966-702-b4kn8 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing issue/23072 (a13e269) to ff677c4 (merge-base) diff using: wide_schema
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-06-26T10:27:31Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4808667999-703-92qlv 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

Comparing issue/23072 (a13e269) to ff677c4 (merge-base) diff using: wide_schema
Results will be posted here when complete

adriangbot · 2026-06-26T10:27:38Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4808669794-704-whlqh 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

Comparing issue/23072 (a13e269) to ff677c4 (merge-base) diff using: wide_schema
Results will be posted here when complete

adriangbot · 2026-06-26T11:12:29Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                     HEAD                                   issue_23072
-----                     ----                                   -----------
wide_schema/Q01_narrow    1.00     79.8±0.27ms        ? ?/sec    1.00     79.8±0.18ms        ? ?/sec
wide_schema/Q01_wide      1.00   1025.8±5.54ms        ? ?/sec    1.06   1087.0±3.82ms        ? ?/sec
wide_schema/Q02_narrow    1.00      5.9±0.09ms        ? ?/sec    1.05      6.2±0.05ms        ? ?/sec
wide_schema/Q02_wide      1.00    899.4±3.46ms        ? ?/sec    1.08    975.5±2.01ms        ? ?/sec
wide_schema/Q03_narrow    1.00     14.8±0.23ms        ? ?/sec    1.02     15.0±0.24ms        ? ?/sec
wide_schema/Q03_wide      1.00    912.9±6.88ms        ? ?/sec    1.07    977.0±3.51ms        ? ?/sec
wide_schema/Q04_narrow    1.00     37.2±0.24ms        ? ?/sec    1.01     37.6±0.20ms        ? ?/sec
wide_schema/Q04_wide      1.00    990.8±6.47ms        ? ?/sec    1.08   1068.2±6.72ms        ? ?/sec

Resource Usage

wide_schema — base (merge-base)

Metric	Value
Wall time	980.2s
Peak memory	1.2 GiB
Avg memory	98.4 MiB
CPU user	399.6s
CPU sys	58.7s
Peak spill	0 B

wide_schema — branch

Metric	Value
Wall time	975.2s
Peak memory	1.2 GiB
Avg memory	108.3 MiB
CPU user	386.8s
CPU sys	50.4s
Peak spill	0 B

adriangbot · 2026-06-26T11:12:34Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                     HEAD                                   issue_23072
-----                     ----                                   -----------
wide_schema/Q01_narrow    1.00     80.3±0.44ms        ? ?/sec    1.00     80.5±0.49ms        ? ?/sec
wide_schema/Q01_wide      1.00   1016.6±5.73ms        ? ?/sec    1.07   1085.9±8.10ms        ? ?/sec
wide_schema/Q02_narrow    1.00      5.8±0.05ms        ? ?/sec    1.05      6.1±0.06ms        ? ?/sec
wide_schema/Q02_wide      1.00    893.2±5.93ms        ? ?/sec    1.10    984.1±7.66ms        ? ?/sec
wide_schema/Q03_narrow    1.00     14.5±0.26ms        ? ?/sec    1.02     14.8±0.23ms        ? ?/sec
wide_schema/Q03_wide      1.00    900.3±5.99ms        ? ?/sec    1.11   997.2±11.39ms        ? ?/sec
wide_schema/Q04_narrow    1.00     37.0±0.20ms        ? ?/sec    1.02     37.8±0.23ms        ? ?/sec
wide_schema/Q04_wide      1.00    983.9±3.82ms        ? ?/sec    1.08   1060.0±7.98ms        ? ?/sec

Resource Usage

wide_schema — base (merge-base)

Metric	Value
Wall time	980.2s
Peak memory	1.1 GiB
Avg memory	107.6 MiB
CPU user	398.4s
CPU sys	58.8s
Peak spill	0 B

wide_schema — branch

Metric	Value
Wall time	980.2s
Peak memory	1.2 GiB
Avg memory	104.2 MiB
CPU user	383.6s
CPU sys	51.3s
Peak spill	0 B

adriangbot · 2026-06-26T11:13:00Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                     HEAD                                   issue_23072
-----                     ----                                   -----------
wide_schema/Q01_narrow    1.03     84.0±1.81ms        ? ?/sec    1.00     81.3±1.46ms        ? ?/sec
wide_schema/Q01_wide      1.00  1041.9±15.17ms        ? ?/sec    1.07  1112.7±14.96ms        ? ?/sec
wide_schema/Q02_narrow    1.00      6.2±0.11ms        ? ?/sec    1.03      6.4±0.14ms        ? ?/sec
wide_schema/Q02_wide      1.00   921.0±21.09ms        ? ?/sec    1.09  1007.2±20.47ms        ? ?/sec
wide_schema/Q03_narrow    1.00     14.8±0.35ms        ? ?/sec    1.05     15.6±0.17ms        ? ?/sec
wide_schema/Q03_wide      1.00   938.0±23.35ms        ? ?/sec    1.17  1097.5±26.47ms        ? ?/sec
wide_schema/Q04_narrow    1.00     37.8±0.62ms        ? ?/sec    1.04     39.1±0.90ms        ? ?/sec
wide_schema/Q04_wide      1.00    994.7±8.18ms        ? ?/sec    1.09  1080.9±23.55ms        ? ?/sec

Resource Usage

wide_schema — base (merge-base)

Metric	Value
Wall time	970.2s
Peak memory	1.1 GiB
Avg memory	97.1 MiB
CPU user	381.4s
CPU sys	52.1s
Peak spill	0 B

wide_schema — branch

Metric	Value
Wall time	995.2s
Peak memory	1.1 GiB
Avg memory	109.4 MiB
CPU user	384.8s
CPU sys	51.6s
Peak spill	0 B

mkleen · 2026-06-26T15:21:15Z

Unfortunately we have regressions in the benchmarks:

group                     HEAD                                   issue_23072
-----                     ----                                   -----------
wide_schema/Q01_narrow    1.00     80.3±0.44ms        ? ?/sec    1.00     80.5±0.49ms        ? ?/sec
wide_schema/Q01_wide      1.00   1016.6±5.73ms        ? ?/sec    1.07   1085.9±8.10ms        ? ?/sec
wide_schema/Q02_narrow    1.00      5.8±0.05ms        ? ?/sec    1.05      6.1±0.06ms        ? ?/sec
wide_schema/Q02_wide      1.00    893.2±5.93ms        ? ?/sec    1.10    984.1±7.66ms        ? ?/sec
wide_schema/Q03_narrow    1.00     14.5±0.26ms        ? ?/sec    1.02     14.8±0.23ms        ? ?/sec
wide_schema/Q03_wide      1.00    900.3±5.99ms        ? ?/sec    1.11   997.2±11.39ms        ? ?/sec
wide_schema/Q04_narrow    1.00     37.0±0.20ms        ? ?/sec    1.02     37.8±0.23ms        ? ?/sec
wide_schema/Q04_wide      1.00    983.9±3.82ms        ? ?/sec    1.08   1060.0±7.98ms        ? ?/sec

mkleen · 2026-06-26T15:25:39Z

run benchmark wide_schema

baseline:
  env:
    DATAFUSION_RUNTIME_FILE_STATISTICS_CACHE_LIMIT: 0

mkleen · 2026-06-26T15:26:02Z

run benchmark wide_schema

adriangbot · 2026-06-26T15:28:37Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4810951539-710-d6x86 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

Comparing issue/23072 (9618f88) to ff677c4 (merge-base) diff using: wide_schema
Results will be posted here when complete

adriangbot · 2026-06-26T15:28:47Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4810954479-711-k7mss 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

Comparing issue/23072 (9618f88) to ff677c4 (merge-base) diff using: wide_schema
Results will be posted here when complete

adriangbot · 2026-06-26T16:11:26Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                     HEAD                                   issue_23072
-----                     ----                                   -----------
wide_schema/Q01_narrow    1.00     79.8±0.36ms        ? ?/sec    1.02     81.4±0.53ms        ? ?/sec
wide_schema/Q01_wide      1.00   1017.6±3.63ms        ? ?/sec    1.02   1042.3±4.84ms        ? ?/sec
wide_schema/Q02_narrow    1.00      6.2±0.10ms        ? ?/sec    1.00      6.2±0.08ms        ? ?/sec
wide_schema/Q02_wide      1.00    901.5±3.36ms        ? ?/sec    1.02    916.6±6.25ms        ? ?/sec
wide_schema/Q03_narrow    1.00     15.5±0.24ms        ? ?/sec    1.00     15.5±0.26ms        ? ?/sec
wide_schema/Q03_wide      1.00    912.1±5.79ms        ? ?/sec    1.02    933.7±4.82ms        ? ?/sec
wide_schema/Q04_narrow    1.00     37.1±0.12ms        ? ?/sec    1.00     37.1±0.13ms        ? ?/sec
wide_schema/Q04_wide      1.00   1008.7±5.74ms        ? ?/sec    1.02   1029.2±9.65ms        ? ?/sec

Resource Usage

wide_schema — base (merge-base)

Metric	Value
Wall time	640.1s
Peak memory	1.2 GiB
Avg memory	152.6 MiB
CPU user	382.9s
CPU sys	52.9s
Peak spill	0 B

wide_schema — branch

Metric	Value
Wall time	975.2s
Peak memory	1.1 GiB
Avg memory	104.0 MiB
CPU user	381.7s
CPU sys	52.0s
Peak spill	0 B

adriangbot · 2026-06-26T16:11:42Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                     HEAD                                   issue_23072
-----                     ----                                   -----------
wide_schema/Q01_narrow    1.01     81.3±0.44ms        ? ?/sec    1.00     80.1±0.38ms        ? ?/sec
wide_schema/Q01_wide      1.00   1033.7±7.03ms        ? ?/sec    1.02   1054.8±6.46ms        ? ?/sec
wide_schema/Q02_narrow    1.01      6.0±0.07ms        ? ?/sec    1.00      6.0±0.08ms        ? ?/sec
wide_schema/Q02_wide      1.00    911.3±4.97ms        ? ?/sec    1.02    933.8±4.85ms        ? ?/sec
wide_schema/Q03_narrow    1.00     15.0±0.30ms        ? ?/sec    1.00     15.1±0.09ms        ? ?/sec
wide_schema/Q03_wide      1.00    923.9±5.68ms        ? ?/sec    1.02    942.4±5.85ms        ? ?/sec
wide_schema/Q04_narrow    1.01     38.0±0.23ms        ? ?/sec    1.00     37.5±0.28ms        ? ?/sec
wide_schema/Q04_wide      1.00   1004.1±5.02ms        ? ?/sec    1.01   1018.3±6.65ms        ? ?/sec

Resource Usage

wide_schema — base (merge-base)

Metric	Value
Wall time	645.1s
Peak memory	1.1 GiB
Avg memory	147.8 MiB
CPU user	379.2s
CPU sys	53.5s
Peak spill	0 B

wide_schema — branch

Metric	Value
Wall time	960.2s
Peak memory	1.1 GiB
Avg memory	100.0 MiB
CPU user	384.2s
CPU sys	52.7s
Peak spill	0 B

File statistics are computed against a specific `file_schema`, but the file-statistics cache was keyed only by table and path. Reading the same path under a different schema could reuse statistics whose `column_statistics` no longer line up, panicking during statistics projection. apache#22950 worked around this by bypassing the cache entirely for anonymous explicit-schema reads, at the cost of losing cache reuse for them.

Introduce a `SchemaFingerprint` (per-column name, type and nullability, derived from `file_schema`) and a `FileStatisticsCacheKey { table, path, schema }`, and key the file-statistics cache on it. Different schemas now get distinct entries (no stale cross-schema reuse) while a repeated read of the same schema reuses its entry, so the apache#22950 bypass is removed and anonymous explicit-schema reads cache safely again.

- The fingerprint excludes field/schema metadata (cannot affect statistics) and partition columns (their statistics are computed separately).
- Table-drop invalidation is unchanged: `drop_table_entries` matches on `CacheKey::table_ref()`, which still returns the table, so all schema variants for a table are removed together.
- The list-files cache continues to key on `TableScopedPath`.

Closes apache#23072.

Signed-off-by: Jiawei Zhao <Phoenix500526@163.com>
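The keying scheme described in this commit can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: the `DataType` enum, the `Option<String>` table field, the `String` path, the `NullCounts` value type, the `StatsCache` alias and the `anonymous_key` helper are all assumptions standing in for the real arrow and DataFusion types.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for arrow's `DataType`; just enough variants for the sketch.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum DataType {
    Int64,
    Utf8,
}

/// Per-column (name, data_type, nullable) of a file schema, in column order.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct SchemaFingerprint(pub Vec<(String, DataType, bool)>);

/// Cache key: the same table and path under a different schema is a different key.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct FileStatisticsCacheKey {
    pub table: Option<String>, // assumption: the real key holds a table reference type
    pub path: String,          // assumption: the real key holds an object-store path type
    pub schema: SchemaFingerprint,
}

/// Stand-in for cached statistics: one null count per column (positional).
pub type NullCounts = Vec<usize>;

/// Stand-in for the file-statistics cache.
pub type StatsCache = HashMap<FileStatisticsCacheKey, NullCounts>;

/// Hypothetical helper that builds a key for an anonymous (table-less) read.
pub fn anonymous_key(path: &str, columns: &[(&str, DataType, bool)]) -> FileStatisticsCacheKey {
    let fingerprint = columns
        .iter()
        .map(|(name, data_type, nullable)| (name.to_string(), data_type.clone(), *nullable))
        .collect();
    FileStatisticsCacheKey {
        table: None,
        path: path.to_string(),
        schema: SchemaFingerprint(fingerprint),
    }
}
```

With keys built this way, reading `data/f.parquet` under a one-column schema and under a two-column schema produces two separate map entries, while a repeated read under either schema finds the entry stored for it.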

…apSize

Add a `DFHeapSize` impl for 3-tuples (mirroring the existing 2-tuple one) so `Vec<(String, DataType, bool)>` accounts for its heap automatically, letting `SchemaFingerprint::heap_size` delegate to it instead of computing the size by hand.

Also update the `test_statistics_cache` unit test to key on `FileStatisticsCacheKey` so it matches the real file-statistics cache.

Signed-off-by: Jiawei Zhao <Phoenix500526@163.com>
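A rough sketch of how a 3-tuple impl lets a vector of columns report its own heap usage. The `HeapSize` trait below is a local stand-in modeled on the idea of `DFHeapSize`; its single-method shape, the `DataType` enum and the per-type accounting rules are assumptions for illustration, not the DataFusion definitions.

```rust
/// Local stand-in trait (assumption): bytes owned on the heap, excluding the
/// inline size of the value itself.
pub trait HeapSize {
    fn heap_size(&self) -> usize;
}

// Hypothetical stand-in for arrow's `DataType`; these variants own no heap memory.
#[derive(Clone, Debug, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
}

impl HeapSize for DataType {
    fn heap_size(&self) -> usize {
        0
    }
}

impl HeapSize for bool {
    fn heap_size(&self) -> usize {
        0
    }
}

impl HeapSize for String {
    fn heap_size(&self) -> usize {
        // A String owns a buffer of `capacity` bytes.
        self.capacity()
    }
}

/// The 3-tuple impl: the sum of the heap owned by each element.
impl<A: HeapSize, B: HeapSize, C: HeapSize> HeapSize for (A, B, C) {
    fn heap_size(&self) -> usize {
        self.0.heap_size() + self.1.heap_size() + self.2.heap_size()
    }
}

impl<T: HeapSize> HeapSize for Vec<T> {
    fn heap_size(&self) -> usize {
        // The vector's own buffer plus whatever each element owns.
        self.capacity() * std::mem::size_of::<T>()
            + self.iter().map(|item| item.heap_size()).sum::<usize>()
    }
}
```

Once the tuple impl exists, a fingerprint that wraps `Vec<(String, DataType, bool)>` can return `self.columns.heap_size()` rather than walking the columns by hand.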

`do_collect_statistics_and_ordering` rebuilt the `SchemaFingerprint` for every file, deep-cloning all column names and types: O(files x schema width) of redundant work, since `file_schema` is constant for a table.

Compute the fingerprint once in `ListingTable::try_new` and store it as `Arc<SchemaFingerprint>`; `FileStatisticsCacheKey.schema` now holds the `Arc`, so building a key per file is an O(1) refcount bump instead of a deep clone. `Arc`'s `Eq`/`Hash` compare the inner value, so cache keying remains by schema contents.

Signed-off-by: Jiawei Zhao <Phoenix500526@163.com>
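The two properties this commit relies on (cheap sharing and comparison by contents) can be checked with a small standalone sketch. The types below are simplified stand-ins: the type name is held as a `String`, the key has no table field, and `keys_for` and `hash_of` are hypothetical helpers.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Simplified fingerprint: (name, type name, nullable) per column.
#[derive(Debug, PartialEq, Eq, Hash)]
pub struct SchemaFingerprint(pub Vec<(String, String, bool)>);

/// Simplified key: the table field is omitted to keep the sketch short.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct FileStatisticsCacheKey {
    pub path: String,
    pub schema: Arc<SchemaFingerprint>,
}

/// Build one key per file; every key shares the table-level fingerprint.
pub fn keys_for(
    paths: &[&str],
    fingerprint: &Arc<SchemaFingerprint>,
) -> Vec<FileStatisticsCacheKey> {
    paths
        .iter()
        .map(|path| FileStatisticsCacheKey {
            path: path.to_string(),
            // Refcount bump only; the column names and types are not copied.
            schema: Arc::clone(fingerprint),
        })
        .collect()
}

/// Hash any value with the standard library's default hasher.
pub fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

Two `Arc<SchemaFingerprint>` values allocated separately but holding equal columns compare equal and hash identically, because `Arc<T>` forwards `PartialEq` and `Hash` to `T`. Keys built by different tables over the same schema therefore still land on the same cache entry.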

mkleen · 2026-06-27T06:28:41Z

run benchmark clickbench_partitioned

adriangbot · 2026-06-27T06:31:38Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4815639024-719-hqn46 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

Comparing issue/23072 (9618f88) to ff677c4 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

mkleen · 2026-06-27T06:34:42Z

Since it's an API change, I think it makes sense to add an entry to the upgrade guide in https://github.com/apache/datafusion/blob/main/docs/source/library-user-guide/upgrading/55.0.0.md.

adriangbot · 2026-06-27T07:18:43Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and issue_23072
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                       HEAD ┃                                issue_23072 ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │               1.27 / 4.21 ±5.73 / 15.66 ms │               2.09 / 5.10 ±5.82 / 16.74 ms │  1.21x slower │
│ QQuery 1  │             12.83 / 13.19 ±0.20 / 13.46 ms │             13.60 / 14.08 ±0.29 / 14.37 ms │  1.07x slower │
│ QQuery 2  │             37.39 / 38.20 ±0.62 / 39.26 ms │             37.28 / 37.65 ±0.24 / 37.96 ms │     no change │
│ QQuery 3  │             31.26 / 32.27 ±0.89 / 33.32 ms │             32.03 / 32.76 ±0.53 / 33.65 ms │     no change │
│ QQuery 4  │      2396.78 / 2446.78 ±34.33 / 2501.06 ms │      2511.52 / 2557.86 ±45.96 / 2613.87 ms │     no change │
│ QQuery 5  │     2579.67 / 2716.83 ±108.26 / 2879.29 ms │      2677.81 / 2756.10 ±51.52 / 2816.00 ms │     no change │
│ QQuery 6  │                1.27 / 1.42 ±0.23 / 1.87 ms │                2.26 / 2.38 ±0.18 / 2.74 ms │  1.68x slower │
│ QQuery 7  │             14.33 / 14.66 ±0.31 / 15.23 ms │             15.15 / 15.35 ±0.18 / 15.58 ms │     no change │
│ QQuery 8  │      2868.43 / 2918.46 ±42.03 / 2984.91 ms │      3040.29 / 3134.83 ±73.56 / 3254.73 ms │  1.07x slower │
│ QQuery 9  │         524.57 / 553.89 ±26.30 / 588.86 ms │         552.10 / 578.26 ±26.04 / 622.81 ms │     no change │
│ QQuery 10 │           83.93 / 90.82 ±11.14 / 113.00 ms │            86.24 / 92.32 ±8.48 / 108.84 ms │     no change │
│ QQuery 11 │            97.95 / 99.32 ±1.22 / 101.54 ms │           98.04 / 100.04 ±1.34 / 102.25 ms │     no change │
│ QQuery 12 │      2594.65 / 2682.75 ±75.99 / 2793.64 ms │     2683.30 / 2973.24 ±244.86 / 3365.90 ms │  1.11x slower │
│ QQuery 13 │      1757.08 / 1893.89 ±81.54 / 1986.39 ms │     1819.91 / 1980.12 ±160.08 / 2273.46 ms │     no change │
│ QQuery 14 │         735.09 / 766.41 ±27.14 / 801.26 ms │         763.91 / 814.80 ±42.49 / 870.49 ms │  1.06x slower │
│ QQuery 15 │      2740.04 / 2823.13 ±57.52 / 2916.29 ms │      2859.97 / 2943.72 ±55.42 / 3025.79 ms │     no change │
│ QQuery 16 │     7103.71 / 7252.21 ±110.77 / 7385.29 ms │     7302.71 / 7488.49 ±134.84 / 7673.51 ms │     no change │
│ QQuery 17 │     4175.88 / 4439.28 ±199.03 / 4760.40 ms │     4379.11 / 4564.36 ±172.58 / 4791.38 ms │     no change │
│ QQuery 18 │  32949.27 / 33510.29 ±449.55 / 34078.78 ms │  33627.51 / 34463.96 ±507.99 / 35168.97 ms │     no change │
│ QQuery 19 │             29.03 / 30.37 ±1.51 / 33.25 ms │             30.17 / 33.41 ±3.71 / 40.27 ms │  1.10x slower │
│ QQuery 20 │         519.38 / 530.58 ±12.84 / 552.12 ms │          519.54 / 530.94 ±9.64 / 543.39 ms │     no change │
│ QQuery 21 │          520.74 / 526.77 ±5.19 / 536.11 ms │          531.18 / 540.01 ±6.56 / 550.65 ms │     no change │
│ QQuery 22 │         996.40 / 999.70 ±2.58 / 1003.89 ms │       1014.53 / 1020.94 ±3.94 / 1025.94 ms │     no change │
│ QQuery 23 │      3129.71 / 3170.77 ±44.12 / 3254.80 ms │      3196.79 / 3245.63 ±32.98 / 3299.78 ms │     no change │
│ QQuery 24 │             41.74 / 42.28 ±0.62 / 43.46 ms │           43.57 / 59.05 ±24.64 / 108.08 ms │  1.40x slower │
│ QQuery 25 │          112.16 / 113.18 ±0.93 / 114.77 ms │          115.25 / 116.85 ±2.04 / 120.56 ms │     no change │
│ QQuery 26 │             43.05 / 47.61 ±6.14 / 59.49 ms │             44.09 / 45.34 ±1.89 / 49.09 ms │     no change │
│ QQuery 27 │          674.40 / 680.94 ±3.65 / 685.43 ms │          685.56 / 692.80 ±6.96 / 704.64 ms │     no change │
│ QQuery 28 │      3796.72 / 3870.50 ±91.72 / 4049.17 ms │     3820.93 / 4001.39 ±191.44 / 4343.94 ms │     no change │
│ QQuery 29 │           41.42 / 74.51 ±44.29 / 159.69 ms │           42.34 / 62.78 ±40.28 / 143.35 ms │ +1.19x faster │
│ QQuery 30 │         732.09 / 743.54 ±10.57 / 760.43 ms │         749.68 / 772.33 ±21.47 / 806.91 ms │     no change │
│ QQuery 31 │      1066.94 / 1085.74 ±12.65 / 1097.73 ms │       1115.07 / 1123.55 ±7.19 / 1135.75 ms │     no change │
│ QQuery 32 │  42863.32 / 42973.05 ±129.72 / 43223.45 ms │  44836.73 / 45143.94 ±220.10 / 45462.45 ms │  1.05x slower │
│ QQuery 33 │ 39609.76 / 42185.00 ±1931.20 / 45570.71 ms │ 45208.73 / 46353.21 ±1851.26 / 50039.97 ms │  1.10x slower │
│ QQuery 34 │ 42296.12 / 45147.79 ±2975.64 / 50712.18 ms │ 43949.49 / 45575.42 ±1672.03 / 48402.63 ms │     no change │
│ QQuery 35 │      1277.61 / 1334.39 ±39.40 / 1393.17 ms │      1263.36 / 1295.74 ±26.56 / 1340.65 ms │     no change │
│ QQuery 36 │          189.99 / 195.15 ±4.15 / 202.72 ms │          178.12 / 189.18 ±6.57 / 198.10 ms │     no change │
│ QQuery 37 │             40.76 / 46.15 ±3.81 / 51.19 ms │            38.57 / 47.86 ±12.04 / 70.19 ms │     no change │
│ QQuery 38 │             42.87 / 45.93 ±2.22 / 48.54 ms │             45.61 / 47.78 ±3.53 / 54.83 ms │     no change │
│ QQuery 39 │          193.66 / 211.45 ±8.98 / 217.28 ms │          184.67 / 197.79 ±7.99 / 209.55 ms │ +1.07x faster │
│ QQuery 40 │             15.71 / 18.37 ±4.80 / 27.96 ms │             15.54 / 15.67 ±0.10 / 15.82 ms │ +1.17x faster │
│ QQuery 41 │             14.72 / 15.02 ±0.40 / 15.81 ms │             15.35 / 17.17 ±3.53 / 24.23 ms │  1.14x slower │
│ QQuery 42 │             14.33 / 14.56 ±0.18 / 14.83 ms │             14.57 / 14.87 ±0.41 / 15.68 ms │     no change │
└───────────┴────────────────────────────────────────────┴────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary          ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)          │ 206401.36ms │
│ Total Time (issue_23072)   │ 215699.09ms │
│ Average Time (HEAD)        │   4800.03ms │
│ Average Time (issue_23072) │   5016.26ms │
│ Queries Faster             │           3 │
│ Queries Slower             │          11 │
│ Queries with No Change     │          29 │
│ Queries with Failure       │           0 │
└────────────────────────────┴─────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	1035.2s
Peak memory	12.1 GiB
Avg memory	6.6 GiB
CPU user	10526.3s
CPU sys	497.5s
Peak spill	0 B

clickbench_partitioned — branch

Metric	Value
Wall time	1080.2s
Peak memory	12.5 GiB
Avg memory	6.5 GiB
CPU user	10925.6s
CPU sys	523.7s
Peak spill	0 B

mkleen · 2026-06-27T07:22:38Z

run benchmark clickbench_partitioned

adriangbot · 2026-06-27T07:25:32Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4815807773-720-pdt48 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

Comparing issue/23072 (9618f88) to ff677c4 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

adriangbot · 2026-06-27T08:10:07Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and issue_23072
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                       HEAD ┃                                issue_23072 ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │               1.23 / 3.91 ±5.32 / 14.55 ms │               2.06 / 4.84 ±5.53 / 15.91 ms │  1.24x slower │
│ QQuery 1  │             12.46 / 12.70 ±0.20 / 13.05 ms │             13.84 / 13.97 ±0.13 / 14.21 ms │  1.10x slower │
│ QQuery 2  │             35.39 / 35.77 ±0.28 / 36.09 ms │             36.73 / 36.95 ±0.26 / 37.43 ms │     no change │
│ QQuery 3  │             30.25 / 31.28 ±1.09 / 32.98 ms │             31.15 / 31.41 ±0.19 / 31.68 ms │     no change │
│ QQuery 4  │      2321.80 / 2382.35 ±45.86 / 2443.98 ms │      2375.47 / 2416.65 ±25.83 / 2445.94 ms │     no change │
│ QQuery 5  │      2405.37 / 2510.16 ±88.33 / 2667.59 ms │      2456.23 / 2571.81 ±92.71 / 2706.94 ms │     no change │
│ QQuery 6  │                1.28 / 1.44 ±0.24 / 1.91 ms │                2.10 / 2.28 ±0.21 / 2.68 ms │  1.59x slower │
│ QQuery 7  │             13.55 / 13.73 ±0.12 / 13.89 ms │             15.04 / 15.13 ±0.07 / 15.22 ms │  1.10x slower │
│ QQuery 8  │      2898.72 / 2945.38 ±29.51 / 2984.70 ms │      2872.18 / 2924.64 ±47.76 / 2990.30 ms │     no change │
│ QQuery 9  │          477.33 / 486.63 ±6.30 / 494.91 ms │         469.56 / 505.59 ±33.43 / 564.13 ms │     no change │
│ QQuery 10 │           79.58 / 85.09 ±10.00 / 105.08 ms │             82.33 / 84.21 ±1.81 / 87.50 ms │     no change │
│ QQuery 11 │             92.91 / 93.99 ±0.97 / 95.35 ms │          94.36 / 108.86 ±24.16 / 157.09 ms │  1.16x slower │
│ QQuery 12 │      2484.33 / 2584.11 ±67.78 / 2691.02 ms │      2533.80 / 2592.58 ±48.37 / 2664.00 ms │     no change │
│ QQuery 13 │     1682.99 / 1896.59 ±145.53 / 2062.33 ms │     1810.25 / 1915.01 ±101.64 / 2106.05 ms │     no change │
│ QQuery 14 │         711.17 / 733.92 ±25.67 / 783.57 ms │         718.86 / 746.27 ±21.38 / 781.23 ms │     no change │
│ QQuery 15 │      2698.82 / 2774.11 ±72.62 / 2893.63 ms │      2638.15 / 2708.72 ±52.77 / 2771.11 ms │     no change │
│ QQuery 16 │     7005.19 / 7175.60 ±145.30 / 7431.25 ms │     6991.35 / 7154.71 ±134.41 / 7341.34 ms │     no change │
│ QQuery 17 │     4231.24 / 4362.66 ±105.91 / 4539.88 ms │     4189.45 / 4420.81 ±175.20 / 4648.30 ms │     no change │
│ QQuery 18 │  31716.39 / 32375.43 ±457.46 / 32814.30 ms │  32176.69 / 32822.00 ±354.19 / 33195.45 ms │     no change │
│ QQuery 19 │             27.93 / 29.35 ±1.80 / 32.86 ms │             29.05 / 30.86 ±1.69 / 33.72 ms │  1.05x slower │
│ QQuery 20 │         510.40 / 522.16 ±12.73 / 546.82 ms │          513.23 / 519.47 ±6.54 / 530.47 ms │     no change │
│ QQuery 21 │         506.71 / 523.01 ±10.04 / 532.79 ms │          512.52 / 522.52 ±5.09 / 526.44 ms │     no change │
│ QQuery 22 │          978.83 / 987.24 ±7.43 / 996.13 ms │          971.58 / 983.45 ±9.30 / 997.95 ms │     no change │
│ QQuery 23 │      3026.83 / 3054.41 ±16.09 / 3073.14 ms │      2999.82 / 3020.83 ±20.49 / 3045.84 ms │     no change │
│ QQuery 24 │             41.05 / 41.20 ±0.19 / 41.57 ms │             41.77 / 42.10 ±0.22 / 42.47 ms │     no change │
│ QQuery 25 │         110.49 / 120.39 ±13.50 / 146.29 ms │          110.49 / 114.47 ±4.24 / 120.72 ms │     no change │
│ QQuery 26 │             41.62 / 42.53 ±0.84 / 43.68 ms │             42.76 / 43.74 ±0.87 / 45.29 ms │     no change │
│ QQuery 27 │          673.33 / 682.88 ±9.75 / 700.33 ms │          660.70 / 671.60 ±6.16 / 679.32 ms │     no change │
│ QQuery 28 │     3744.53 / 3961.80 ±146.97 / 4088.09 ms │     3652.60 / 3791.29 ±168.07 / 4117.72 ms │     no change │
│ QQuery 29 │             41.27 / 47.53 ±6.84 / 57.42 ms │             40.67 / 41.27 ±0.49 / 42.10 ms │ +1.15x faster │
│ QQuery 30 │          706.57 / 716.27 ±8.06 / 730.86 ms │         703.71 / 719.35 ±18.33 / 745.39 ms │     no change │
│ QQuery 31 │      1028.11 / 1066.45 ±25.97 / 1107.54 ms │      1039.03 / 1059.61 ±16.72 / 1089.60 ms │     no change │
│ QQuery 32 │  41749.98 / 41923.06 ±137.19 / 42115.99 ms │  41933.86 / 42059.36 ±125.78 / 42284.54 ms │     no change │
│ QQuery 33 │  39816.33 / 41118.42 ±878.48 / 42505.38 ms │ 39357.48 / 42890.32 ±2127.56 / 45949.01 ms │     no change │
│ QQuery 34 │ 39436.50 / 41579.73 ±1727.79 / 44503.61 ms │ 39569.80 / 41278.39 ±1576.87 / 43739.73 ms │     no change │
│ QQuery 35 │      1231.58 / 1249.08 ±18.53 / 1284.93 ms │      1213.77 / 1242.10 ±20.94 / 1267.65 ms │     no change │
│ QQuery 36 │         163.03 / 176.19 ±14.58 / 203.20 ms │         150.77 / 181.22 ±23.77 / 208.57 ms │     no change │
│ QQuery 37 │             37.59 / 44.05 ±9.64 / 63.04 ms │             37.68 / 42.33 ±4.95 / 51.74 ms │     no change │
│ QQuery 38 │             42.01 / 43.00 ±0.94 / 44.14 ms │             44.02 / 47.80 ±4.18 / 53.92 ms │  1.11x slower │
│ QQuery 39 │          179.30 / 185.76 ±4.10 / 190.28 ms │          183.11 / 193.03 ±9.14 / 208.18 ms │     no change │
│ QQuery 40 │             14.11 / 17.81 ±5.76 / 29.27 ms │             15.19 / 15.73 ±0.46 / 16.54 ms │ +1.13x faster │
│ QQuery 41 │             13.36 / 16.74 ±6.37 / 29.47 ms │             14.32 / 15.79 ±2.06 / 19.82 ms │ +1.06x faster │
│ QQuery 42 │             13.17 / 13.44 ±0.22 / 13.83 ms │             13.89 / 14.94 ±1.76 / 18.44 ms │  1.11x slower │
└───────────┴────────────────────────────────────────────┴────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary          ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)          │ 198667.33ms │
│ Total Time (issue_23072)   │ 200618.03ms │
│ Average Time (HEAD)        │   4620.17ms │
│ Average Time (issue_23072) │   4665.54ms │
│ Queries Faster             │           3 │
│ Queries Slower             │           8 │
│ Queries with No Change     │          32 │
│ Queries with Failure       │           0 │
└────────────────────────────┴─────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	995.2s
Peak memory	12.2 GiB
Avg memory	6.5 GiB
CPU user	10131.9s
CPU sys	464.0s
Peak spill	0 B

clickbench_partitioned — branch

Metric	Value
Wall time	1005.2s
Peak memory	12.5 GiB
Avg memory	6.6 GiB
CPU user	10156.9s
CPU sys	463.8s
Peak spill	0 B

File an issue against this benchmark runner

mkleen · 2026-06-27T08:33:17Z

It looks like there are regressions in the clickbench_partitioned benchmark. In particular, Query 6 is reproducibly slower.

│ QQuery 6  │                1.28 / 1.44 ±0.24 / 1.91 ms │                2.10 / 2.28 ±0.21 / 2.68 ms │  1.59x slower │

mkleen · 2026-06-27T10:05:21Z

+            // fingerprint: reads of the same path under a different schema get
+            // their own entry rather than reusing incompatible column statistics.
+            // The fingerprint is precomputed once per table (see `try_new`).
+            schema: Arc::clone(&self.file_schema_fingerprint),


Precalculating the fingerprint's hash could fix the regression. Right now we compute the fingerprint hash for each entry, which is expensive.

Thanks for the comment. I've precalculated the hash for the fingerprint. It seems that only whitelisted users can trigger benchmarks. Could you help trigger one?

Hashing a FileStatisticsCacheKey on every cache lookup previously digested the entire file schema (O(schema width)). Store a fixed-seed hash of the fingerprint columns, computed once in from_schema, and feed only that u64 into the map hasher. PartialEq still compares the columns exactly, so a hash collision can never make two different schemas share a cache entry. Signed-off-by: Jiawei Zhao <Phoenix500526@163.com>
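The idea in this commit message can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the code from the PR: `ColumnFingerprint`, `StatsCacheKey`, `from_columns`, `col`, `key_for`, and `cache_demo` are hypothetical names, the data type is a plain string rather than an Arrow type, and `DefaultHasher::new()` stands in for whatever fixed-seed hasher the PR actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Hypothetical stand-in for one fingerprint column: (name, data_type, nullable).
/// The data type is a plain string here to keep the sketch dependency-free.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct ColumnFingerprint {
    name: String,
    data_type: String,
    nullable: bool,
}

/// Ordered columns plus a hash computed once at construction time.
#[derive(Debug)]
struct SchemaFingerprint {
    columns: Vec<ColumnFingerprint>,
    precomputed_hash: u64,
}

impl SchemaFingerprint {
    fn from_columns(columns: Vec<ColumnFingerprint>) -> Self {
        // `DefaultHasher::new()` always starts from the same state, so equal
        // column lists produce equal hashes. This is the O(schema width) step.
        let mut hasher = DefaultHasher::new();
        columns.hash(&mut hasher);
        let precomputed_hash = hasher.finish();
        Self { columns, precomputed_hash }
    }
}

impl Hash for SchemaFingerprint {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Feed only the cached u64 into the map hasher: O(1) per lookup.
        state.write_u64(self.precomputed_hash);
    }
}

impl PartialEq for SchemaFingerprint {
    fn eq(&self, other: &Self) -> bool {
        // Compare the columns exactly, so a hash collision cannot make two
        // different schemas equal.
        self.columns == other.columns
    }
}

impl Eq for SchemaFingerprint {}

/// Hypothetical key shape: table, path, and a shared schema fingerprint.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct StatsCacheKey {
    table: Option<String>,
    path: String,
    schema: Arc<SchemaFingerprint>,
}

fn col(name: &str, data_type: &str, nullable: bool) -> ColumnFingerprint {
    ColumnFingerprint { name: name.to_string(), data_type: data_type.to_string(), nullable }
}

fn key_for(path: &str, schema: &Arc<SchemaFingerprint>) -> StatsCacheKey {
    StatsCacheKey { table: None, path: path.to_string(), schema: Arc::clone(schema) }
}

/// Returns (number of cache entries, lookup result for a rebuilt narrow schema).
fn cache_demo() -> (usize, Option<&'static str>) {
    let narrow = Arc::new(SchemaFingerprint::from_columns(vec![col("id", "Int64", false)]));
    let wide = Arc::new(SchemaFingerprint::from_columns(vec![
        col("id", "Int64", false),
        col("name", "Utf8", true),
    ]));

    // Same path, two schemas: each schema gets its own entry.
    let mut cache: HashMap<StatsCacheKey, &'static str> = HashMap::new();
    cache.insert(key_for("data/part-0.parquet", &narrow), "stats for 1 column");
    cache.insert(key_for("data/part-0.parquet", &wide), "stats for 2 columns");

    // A fingerprint rebuilt from the same columns hits the existing entry.
    let narrow_again = Arc::new(SchemaFingerprint::from_columns(vec![col("id", "Int64", false)]));
    let hit = cache.get(&key_for("data/part-0.parquet", &narrow_again)).copied();
    (cache.len(), hit)
}
```

Calling `cache_demo()` exercises both properties: the two schemas occupy separate entries for the same path, and a fingerprint rebuilt from identical columns still finds the first entry. Keeping equality exact while hashing only the cached `u64` preserves the `Hash`/`Eq` contract, since equal columns always yield the same precomputed hash.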

Phoenix500526 · 2026-06-27T13:26:46Z

run benchmark wide_schema

baseline:
  env:
    DATAFUSION_RUNTIME_FILE_STATISTICS_CACHE_LIMIT: 0

adriangbot · 2026-06-27T13:26:47Z

Hi @Phoenix500526, thanks for the request (#23201 (comment)). Only whitelisted users can trigger benchmarks. Allowed users: Dandandan, Fokko, Jefffrey, Omega359, Rachelint, adriangb, alamb, asubiotto, brunal, buraksenn, cetra3, codephage2020, coderfender, comphead, erenavsarogullari, etseidl, friendlymatthew, gabotechs, geoffreyclaude, grtlr, haohuaijin, jonathanc-n, kevinjqliu, klion26, kosiew, kumarUjjawal, kunalsinghdadhwal, liamzwbao, mbutrovich, mkleen, mzabaluev, neilconway, rluvaton, sdf-jkl, timsaucer, xudong963, zhuqi-lucas.

mkleen · 2026-06-27T13:39:51Z

run benchmark clickbench_partitioned

adriangbot · 2026-06-27T13:41:14Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4817989936-722-ldxrq 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing issue/23072 (a767586) to d58e0c6 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

adriangbot · 2026-06-27T14:15:24Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and issue_23072
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                       HEAD ┃                                issue_23072 ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │               1.21 / 4.02 ±5.51 / 15.04 ms │               1.24 / 3.99 ±5.44 / 14.87 ms │     no change │
│ QQuery 1  │             13.26 / 13.44 ±0.13 / 13.60 ms │             13.16 / 13.41 ±0.15 / 13.59 ms │     no change │
│ QQuery 2  │             36.14 / 36.40 ±0.19 / 36.65 ms │             36.05 / 36.30 ±0.19 / 36.64 ms │     no change │
│ QQuery 3  │             30.79 / 31.45 ±0.70 / 32.77 ms │             30.69 / 31.25 ±0.65 / 32.51 ms │     no change │
│ QQuery 4  │      1713.06 / 1748.11 ±37.79 / 1799.01 ms │      1669.74 / 1749.10 ±55.76 / 1820.04 ms │     no change │
│ QQuery 5  │      1655.38 / 1752.29 ±81.29 / 1893.62 ms │     1635.63 / 1771.16 ±108.55 / 1954.22 ms │     no change │
│ QQuery 6  │                1.31 / 1.47 ±0.24 / 1.95 ms │                1.29 / 1.44 ±0.24 / 1.92 ms │     no change │
│ QQuery 7  │             14.83 / 14.96 ±0.12 / 15.17 ms │             14.58 / 14.69 ±0.09 / 14.79 ms │     no change │
│ QQuery 8  │      1952.39 / 2106.57 ±86.64 / 2204.21 ms │      1954.97 / 2090.42 ±92.39 / 2239.01 ms │     no change │
│ QQuery 9  │         474.26 / 490.31 ±15.09 / 513.09 ms │         480.49 / 508.52 ±16.79 / 531.28 ms │     no change │
│ QQuery 10 │             76.69 / 77.74 ±0.59 / 78.51 ms │             77.98 / 78.66 ±0.50 / 79.20 ms │     no change │
│ QQuery 11 │             87.87 / 90.90 ±3.85 / 98.50 ms │          89.37 / 111.46 ±37.72 / 186.77 ms │  1.23x slower │
│ QQuery 12 │      1651.47 / 1793.69 ±84.26 / 1876.07 ms │     1648.31 / 1835.06 ±130.28 / 1959.30 ms │     no change │
│ QQuery 13 │        475.60 / 630.23 ±142.13 / 875.61 ms │        461.54 / 664.70 ±167.50 / 849.37 ms │  1.05x slower │
│ QQuery 14 │         538.59 / 558.95 ±16.20 / 579.95 ms │         547.71 / 559.45 ±12.99 / 582.96 ms │     no change │
│ QQuery 15 │      1938.10 / 2015.97 ±46.42 / 2063.88 ms │      1917.02 / 1971.30 ±30.30 / 2004.56 ms │     no change │
│ QQuery 16 │     4265.04 / 4373.62 ±105.08 / 4502.63 ms │     3993.78 / 4259.63 ±178.39 / 4536.64 ms │     no change │
│ QQuery 17 │     4227.55 / 4465.40 ±192.18 / 4788.48 ms │     4194.17 / 4380.55 ±127.20 / 4541.17 ms │     no change │
│ QQuery 18 │  18037.75 / 18449.77 ±423.09 / 19035.43 ms │  17687.91 / 18461.00 ±482.94 / 19170.43 ms │     no change │
│ QQuery 19 │             28.78 / 35.39 ±9.26 / 53.72 ms │             28.03 / 28.88 ±0.82 / 30.28 ms │ +1.23x faster │
│ QQuery 20 │          510.09 / 519.72 ±8.85 / 535.95 ms │          514.94 / 520.13 ±4.33 / 527.17 ms │     no change │
│ QQuery 21 │          518.16 / 524.26 ±3.84 / 528.50 ms │         515.89 / 526.89 ±13.45 / 552.89 ms │     no change │
│ QQuery 22 │       985.52 / 1009.99 ±19.19 / 1039.36 ms │       975.21 / 1015.01 ±21.20 / 1037.17 ms │     no change │
│ QQuery 23 │      3051.84 / 3106.93 ±43.61 / 3163.82 ms │      3062.57 / 3107.56 ±46.75 / 3196.77 ms │     no change │
│ QQuery 24 │           41.30 / 60.38 ±22.73 / 103.63 ms │             41.30 / 44.34 ±5.10 / 54.47 ms │ +1.36x faster │
│ QQuery 25 │          110.88 / 111.83 ±0.82 / 113.24 ms │          112.05 / 114.59 ±2.42 / 119.05 ms │     no change │
│ QQuery 26 │             41.76 / 43.21 ±1.99 / 47.14 ms │             41.75 / 43.78 ±2.57 / 48.01 ms │     no change │
│ QQuery 27 │          663.17 / 675.06 ±6.50 / 682.76 ms │          671.21 / 675.06 ±2.72 / 678.54 ms │     no change │
│ QQuery 28 │     3465.46 / 3683.42 ±218.83 / 3986.37 ms │     3539.82 / 3768.05 ±146.80 / 3958.10 ms │     no change │
│ QQuery 29 │            40.52 / 53.27 ±19.80 / 91.65 ms │           40.62 / 61.09 ±33.97 / 128.15 ms │  1.15x slower │
│ QQuery 30 │         560.36 / 580.74 ±23.01 / 622.42 ms │         576.63 / 613.92 ±25.44 / 648.50 ms │  1.06x slower │
│ QQuery 31 │          309.46 / 316.17 ±4.89 / 323.02 ms │          300.34 / 310.42 ±6.12 / 319.60 ms │     no change │
│ QQuery 32 │        931.78 / 981.57 ±35.52 / 1022.67 ms │       969.52 / 1025.37 ±42.11 / 1064.17 ms │     no change │
│ QQuery 33 │ 27057.73 / 29042.70 ±1373.79 / 30626.86 ms │ 26189.33 / 27788.74 ±1211.16 / 29856.48 ms │     no change │
│ QQuery 34 │ 27351.84 / 29752.94 ±1814.65 / 31905.04 ms │  27295.07 / 27883.56 ±349.03 / 28308.45 ms │ +1.07x faster │
│ QQuery 35 │      985.27 / 1104.48 ±137.47 / 1364.30 ms │     1103.69 / 1188.79 ±127.72 / 1439.75 ms │  1.08x slower │
│ QQuery 36 │          160.15 / 168.59 ±4.79 / 174.21 ms │          170.86 / 173.74 ±2.60 / 176.90 ms │     no change │
│ QQuery 37 │            36.71 / 49.70 ±24.94 / 99.58 ms │           37.51 / 57.33 ±32.57 / 122.14 ms │  1.15x slower │
│ QQuery 38 │             42.53 / 45.67 ±1.87 / 47.62 ms │             41.42 / 43.78 ±1.58 / 45.44 ms │     no change │
│ QQuery 39 │         183.22 / 200.33 ±20.75 / 241.11 ms │          173.70 / 181.10 ±5.49 / 188.61 ms │ +1.11x faster │
│ QQuery 40 │             14.60 / 15.42 ±0.72 / 16.48 ms │             14.74 / 15.78 ±0.92 / 17.13 ms │     no change │
│ QQuery 41 │             13.89 / 14.20 ±0.37 / 14.90 ms │             13.74 / 13.98 ±0.22 / 14.40 ms │     no change │
│ QQuery 42 │             13.19 / 13.44 ±0.14 / 13.60 ms │             13.17 / 13.43 ±0.19 / 13.64 ms │     no change │
└───────────┴────────────────────────────────────────────┴────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary          ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)          │ 110764.74ms │
│ Total Time (issue_23072)   │ 107757.45ms │
│ Average Time (HEAD)        │   2575.92ms │
│ Average Time (issue_23072) │   2505.99ms │
│ Queries Faster             │           4 │
│ Queries Slower             │           6 │
│ Queries with No Change     │          33 │
│ Queries with Failure       │           0 │
└────────────────────────────┴─────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	555.1s
Peak memory	11.8 GiB
Avg memory	6.5 GiB
CPU user	4868.3s
CPU sys	326.8s
Peak spill	0 B

clickbench_partitioned — branch

Metric	Value
Wall time	540.1s
Peak memory	11.9 GiB
Avg memory	6.5 GiB
CPU user	4904.6s
CPU sys	325.4s
Peak spill	0 B

mkleen · 2026-06-27T14:16:27Z

run benchmark clickbench_partitioned

mkleen · 2026-06-27T14:16:47Z

run benchmark wide_schema

baseline:
  env:
    DATAFUSION_RUNTIME_FILE_STATISTICS_CACHE_LIMIT: 0

adriangbot · 2026-06-27T14:17:35Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4818323559-723-tm7xw 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

Comparing issue/23072 (a767586) to d58e0c6 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

adriangbot · 2026-06-27T14:18:46Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4818326429-724-hr9kq 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

Comparing issue/23072 (a767586) to d58e0c6 (merge-base) diff using: wide_schema
Results will be posted here when complete

adriangbot · 2026-06-27T14:47:27Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and issue_23072
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                       HEAD ┃                               issue_23072 ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │               1.19 / 3.96 ±5.43 / 14.81 ms │              1.26 / 4.13 ±5.61 / 15.35 ms │     no change │
│ QQuery 1  │             12.70 / 13.07 ±0.19 / 13.23 ms │            12.60 / 12.84 ±0.15 / 13.03 ms │     no change │
│ QQuery 2  │             35.70 / 36.00 ±0.23 / 36.32 ms │            35.95 / 36.22 ±0.24 / 36.62 ms │     no change │
│ QQuery 3  │             30.64 / 31.21 ±0.66 / 32.49 ms │            30.62 / 30.97 ±0.40 / 31.74 ms │     no change │
│ QQuery 4  │      1723.05 / 1769.76 ±54.07 / 1874.25 ms │     1658.60 / 1718.31 ±48.48 / 1790.59 ms │     no change │
│ QQuery 5  │     1610.36 / 1784.19 ±130.80 / 2000.40 ms │     1725.56 / 1884.15 ±99.56 / 2025.24 ms │  1.06x slower │
│ QQuery 6  │                1.23 / 1.40 ±0.24 / 1.86 ms │               1.27 / 1.42 ±0.24 / 1.90 ms │     no change │
│ QQuery 7  │             13.73 / 14.07 ±0.26 / 14.50 ms │            13.93 / 13.96 ±0.02 / 13.99 ms │     no change │
│ QQuery 8  │      2094.74 / 2156.23 ±31.47 / 2182.63 ms │     1915.68 / 2035.57 ±65.72 / 2103.57 ms │ +1.06x faster │
│ QQuery 9  │         476.16 / 499.44 ±20.23 / 531.57 ms │        482.64 / 517.78 ±23.15 / 548.11 ms │     no change │
│ QQuery 10 │             75.45 / 79.13 ±3.09 / 83.89 ms │          79.04 / 99.08 ±34.89 / 168.79 ms │  1.25x slower │
│ QQuery 11 │             86.55 / 89.01 ±1.60 / 90.78 ms │            90.76 / 93.20 ±1.93 / 95.32 ms │     no change │
│ QQuery 12 │     1644.75 / 1863.60 ±160.91 / 2100.36 ms │    1643.40 / 1804.01 ±138.32 / 2052.44 ms │     no change │
│ QQuery 13 │        464.22 / 637.70 ±124.33 / 852.83 ms │        627.58 / 659.55 ±27.41 / 690.79 ms │     no change │
│ QQuery 14 │          533.73 / 550.37 ±9.57 / 559.76 ms │        531.28 / 558.96 ±18.00 / 587.15 ms │     no change │
│ QQuery 15 │      1883.19 / 1972.22 ±51.93 / 2044.40 ms │     1932.60 / 2002.46 ±74.92 / 2137.80 ms │     no change │
│ QQuery 16 │     4181.86 / 4375.49 ±110.96 / 4515.96 ms │    4173.89 / 4409.59 ±208.11 / 4750.03 ms │     no change │
│ QQuery 17 │     4196.87 / 4413.93 ±137.96 / 4596.62 ms │    4273.16 / 4385.26 ±119.00 / 4604.14 ms │     no change │
│ QQuery 18 │  17773.79 / 18331.92 ±364.43 / 18905.62 ms │ 17753.89 / 18348.25 ±444.69 / 19093.57 ms │     no change │
│ QQuery 19 │             28.14 / 30.48 ±2.17 / 34.11 ms │            28.77 / 29.16 ±0.61 / 30.38 ms │     no change │
│ QQuery 20 │         518.86 / 527.21 ±10.27 / 546.78 ms │        517.59 / 543.22 ±43.47 / 629.88 ms │     no change │
│ QQuery 21 │          514.19 / 521.49 ±4.24 / 525.06 ms │         515.01 / 522.55 ±5.10 / 529.03 ms │     no change │
│ QQuery 22 │      1000.68 / 1022.29 ±14.98 / 1046.99 ms │        982.28 / 993.22 ±7.23 / 1000.28 ms │     no change │
│ QQuery 23 │      3041.81 / 3087.82 ±44.25 / 3170.73 ms │     3058.36 / 3116.24 ±37.16 / 3151.32 ms │     no change │
│ QQuery 24 │            42.95 / 55.32 ±14.87 / 82.96 ms │           43.06 / 55.77 ±12.94 / 72.47 ms │     no change │
│ QQuery 25 │          114.17 / 115.36 ±1.03 / 116.68 ms │         113.96 / 115.20 ±0.86 / 116.20 ms │     no change │
│ QQuery 26 │             42.70 / 44.14 ±1.93 / 47.92 ms │            43.68 / 45.74 ±2.32 / 49.01 ms │     no change │
│ QQuery 27 │          669.62 / 677.41 ±5.40 / 685.56 ms │        685.30 / 698.02 ±12.50 / 720.70 ms │     no change │
│ QQuery 28 │     3382.32 / 3681.90 ±258.18 / 3998.63 ms │     3532.26 / 3662.50 ±91.86 / 3806.84 ms │     no change │
│ QQuery 29 │             40.31 / 41.35 ±1.58 / 44.49 ms │            40.26 / 42.20 ±2.84 / 47.78 ms │     no change │
│ QQuery 30 │         555.91 / 572.68 ±11.58 / 591.23 ms │        558.19 / 577.16 ±14.57 / 599.23 ms │     no change │
│ QQuery 31 │         303.28 / 317.56 ±10.24 / 326.83 ms │         293.28 / 305.77 ±8.84 / 317.74 ms │     no change │
│ QQuery 32 │       951.69 / 1026.17 ±46.81 / 1090.00 ms │     1012.26 / 1041.62 ±30.09 / 1094.34 ms │     no change │
│ QQuery 33 │ 26933.56 / 29239.94 ±2140.91 / 32883.42 ms │ 27092.37 / 28407.79 ±776.99 / 29338.29 ms │     no change │
│ QQuery 34 │  26890.68 / 28453.87 ±948.94 / 29728.19 ms │ 28303.80 / 29368.10 ±732.73 / 30341.98 ms │     no change │
│ QQuery 35 │      994.33 / 1188.84 ±178.54 / 1467.40 ms │      990.29 / 1060.95 ±36.97 / 1094.63 ms │ +1.12x faster │
│ QQuery 36 │          172.41 / 174.45 ±2.44 / 178.37 ms │        154.89 / 167.03 ±11.57 / 188.05 ms │     no change │
│ QQuery 37 │           38.74 / 60.94 ±38.56 / 137.91 ms │           37.55 / 48.54 ±17.72 / 83.75 ms │ +1.26x faster │
│ QQuery 38 │             44.30 / 46.62 ±1.34 / 47.98 ms │            40.48 / 43.64 ±1.85 / 45.80 ms │ +1.07x faster │
│ QQuery 39 │         198.52 / 213.46 ±10.88 / 225.19 ms │        182.72 / 197.23 ±18.06 / 232.81 ms │ +1.08x faster │
│ QQuery 40 │             15.10 / 15.78 ±0.44 / 16.44 ms │            14.59 / 16.19 ±2.84 / 21.86 ms │     no change │
│ QQuery 41 │             14.69 / 14.99 ±0.23 / 15.35 ms │           14.01 / 20.30 ±12.32 / 44.93 ms │  1.35x slower │
│ QQuery 42 │             14.23 / 14.38 ±0.11 / 14.55 ms │            13.39 / 13.69 ±0.23 / 14.05 ms │     no change │
└───────────┴────────────────────────────────────────────┴───────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary          ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)          │ 109767.14ms │
│ Total Time (issue_23072)   │ 109707.57ms │
│ Average Time (HEAD)        │   2552.72ms │
│ Average Time (issue_23072) │   2551.34ms │
│ Queries Faster             │           5 │
│ Queries Slower             │           3 │
│ Queries with No Change     │          35 │
│ Queries with Failure       │           0 │
└────────────────────────────┴─────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	550.1s
Peak memory	12.0 GiB
Avg memory	6.5 GiB
CPU user	4880.8s
CPU sys	337.1s
Peak spill	0 B

clickbench_partitioned — branch

Metric	Value
Wall time	550.1s
Peak memory	12.1 GiB
Avg memory	6.5 GiB
CPU user	4888.1s
CPU sys	331.1s
Peak spill	0 B

adriangbot · 2026-06-27T14:59:37Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                     HEAD                                   issue_23072
-----                     ----                                   -----------
wide_schema/Q01_narrow    1.00     79.8±0.51ms        ? ?/sec    1.00     79.9±0.26ms        ? ?/sec
wide_schema/Q01_wide      1.01   1016.5±7.03ms        ? ?/sec    1.00   1009.8±7.68ms        ? ?/sec
wide_schema/Q02_narrow    1.00      5.9±0.10ms        ? ?/sec    1.00      5.9±0.10ms        ? ?/sec
wide_schema/Q02_wide      1.03    905.4±2.97ms        ? ?/sec    1.00    881.9±2.95ms        ? ?/sec
wide_schema/Q03_narrow    1.00     14.9±0.18ms        ? ?/sec    1.01     15.0±0.29ms        ? ?/sec
wide_schema/Q03_wide      1.01    907.8±2.88ms        ? ?/sec    1.00    896.6±4.89ms        ? ?/sec
wide_schema/Q04_narrow    1.00     36.9±0.16ms        ? ?/sec    1.01     37.1±0.18ms        ? ?/sec
wide_schema/Q04_wide      1.02    998.9±7.60ms        ? ?/sec    1.00    979.0±5.84ms        ? ?/sec
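As a reading aid for the table above: the leading number in each column appears to be the run's time relative to the faster of the two runs in that row, so 1.00 marks the faster side. The sketch below recomputes that ratio from the reported means. The helper name `ratios` is hypothetical and is not part of the benchmark runner.

```rust
/// Hypothetical helper: relative time of each run against the faster one,
/// formatted to two decimals like the leading number in each table column.
fn ratios(head_ms: f64, branch_ms: f64) -> (String, String) {
    let best = head_ms.min(branch_ms);
    (
        format!("{:.2}", head_ms / best),
        format!("{:.2}", branch_ms / best),
    )
}
```

For example, the Q02_wide means (905.4 ms on HEAD, 881.9 ms on the branch) give 1.03 and 1.00, matching the row.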

Resource Usage

wide_schema — base (merge-base)

Metric	Value
Wall time	925.2s
Peak memory	1.2 GiB
Avg memory	104.4 MiB
CPU user	389.8s
CPU sys	53.2s
Peak spill	0 B

wide_schema — branch

Metric	Value
Wall time	915.2s
Peak memory	1.2 GiB
Avg memory	109.0 MiB
CPU user	382.9s
CPU sys	51.5s
Peak spill	0 B


…uide The file-statistics cache key changed from TableScopedPath to a schema-aware FileStatisticsCacheKey. Add an upgrade-guide entry covering the type-alias change and how to migrate custom cache implementations and direct get/put callers. Signed-off-by: Jiawei Zhao <Phoenix500526@163.com>

Phoenix500526 · 2026-06-27T15:24:42Z

> Since it's an API change, I think it makes sense to add an entry to the upgrade guide in https://github.com/apache/datafusion/blob/main/docs/source/library-user-guide/upgrading/55.0.0.md.

Added

mkleen · 2026-06-27T17:59:00Z

Looks like all regressions are fixed now. TBH, this is quite a complicated solution for a problem that would not exist if we simply avoided caching in this case.

mkleen · 2026-06-27T17:59:26Z

@kosiew Do you maybe have time for a second opinion?

kosiew

@Phoenix500526
Thanks for the fix. The schema-aware cache key is the right direction, but I think the implementation can be simplified a bit before this lands.

kosiew · 2026-06-29T04:15:13Z

+/// nullability, in order. It deliberately excludes field/schema metadata, which
+/// cannot affect statistics — including it would needlessly fragment the cache.
+#[derive(Clone, Debug)]
+pub struct SchemaFingerprint {


The schema-aware key looks correct, and I think this fixes the bug. That said, the implementation feels a bit more involved than this cache path needs.

Could we simplify SchemaFingerprint to a small derived newtype over something like Vec<(String, DataType, bool)>, or an equivalent representation, and rely on derived Hash and Eq? The precomputed hash plus custom PartialEq collision handling adds some cleverness that feels hard to justify here unless profiling shows schema-key hashing is material.
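A minimal sketch of what the suggested derived newtype could look like. Everything here is an assumption for illustration: `DataType` is a tiny stand-in for arrow's type, the table and path fields are plain strings, and the helper names `fingerprint`, `key`, and `distinct_entries` are hypothetical. It is not the code in this PR.

```rust
use std::collections::HashMap;

// Stand-in for arrow's `DataType`, kept tiny so the sketch compiles on its own.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum DataType {
    Int64,
    Utf8,
}

/// Per-column (name, data_type, nullable), in schema order.
/// Equality and hashing are fully derived.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct SchemaFingerprint(Vec<(String, DataType, bool)>);

/// Hypothetical key shape; the real key uses richer table and path types.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct FileStatisticsCacheKey {
    table: Option<String>,
    path: String,
    schema: SchemaFingerprint,
}

fn fingerprint(cols: &[(&str, DataType, bool)]) -> SchemaFingerprint {
    SchemaFingerprint(
        cols.iter()
            .map(|(name, ty, nullable)| (name.to_string(), ty.clone(), *nullable))
            .collect(),
    )
}

fn key(path: &str, schema: SchemaFingerprint) -> FileStatisticsCacheKey {
    FileStatisticsCacheKey {
        table: None,
        path: path.to_string(),
        schema,
    }
}

/// Inserts the same path under two schemas, then repeats the first schema.
/// Returns the number of distinct cache entries.
fn distinct_entries() -> usize {
    let a = fingerprint(&[("id", DataType::Int64, false)]);
    let b = fingerprint(&[("id", DataType::Utf8, false)]);
    let mut cache: HashMap<FileStatisticsCacheKey, &'static str> = HashMap::new();
    cache.insert(key("data/f.parquet", a.clone()), "stats for Int64");
    cache.insert(key("data/f.parquet", b), "stats for Utf8");
    cache.insert(key("data/f.parquet", a), "stats for Int64, repeated read");
    cache.len()
}
```

With derived `Hash`, every lookup hashes all columns of the fingerprint, which is the cost the reply below refers to for wide schemas.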

If you look at the clickbench_partitioned and wide_schema benchmark results, you will find that there are real regressions without this optimization.
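For contrast, here is a rough sketch of the "precomputed hash plus custom PartialEq" pattern being discussed, under the same assumptions as before: `DataType` is a stand-in, and the field layout, the `new` constructor, and the helper functions are hypothetical rather than taken from the PR. The idea is to hash the columns once at construction so that each cache lookup only feeds a single `u64` to the hasher.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for arrow's `DataType`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum DataType {
    Int64,
    Utf8,
}

#[derive(Clone, Debug)]
struct SchemaFingerprint {
    fields: Vec<(String, DataType, bool)>,
    /// Hash of `fields`, computed once in `new`.
    hash: u64,
}

impl SchemaFingerprint {
    fn new(fields: Vec<(String, DataType, bool)>) -> Self {
        let mut hasher = DefaultHasher::new();
        fields.hash(&mut hasher);
        Self { hash: hasher.finish(), fields }
    }
}

impl PartialEq for SchemaFingerprint {
    fn eq(&self, other: &Self) -> bool {
        // Cheap reject on the stored hash first; the full comparison
        // guards against two different schemas colliding on the hash.
        self.hash == other.hash && self.fields == other.fields
    }
}

impl Eq for SchemaFingerprint {}

impl Hash for SchemaFingerprint {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Only the precomputed value is hashed per lookup.
        state.write_u64(self.hash);
    }
}

fn same_schema_matches() -> bool {
    let a = SchemaFingerprint::new(vec![("id".to_string(), DataType::Int64, false)]);
    let b = SchemaFingerprint::new(vec![("id".to_string(), DataType::Int64, false)]);
    a == b && a.hash == b.hash
}

fn different_type_differs() -> bool {
    let a = SchemaFingerprint::new(vec![("id".to_string(), DataType::Int64, false)]);
    let b = SchemaFingerprint::new(vec![("id".to_string(), DataType::Utf8, false)]);
    a != b
}
```

Equal fingerprints always carry equal stored hashes, so the `Hash` and `Eq` implementations stay consistent with each other, which `HashMap` requires.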

kosiew · 2026-06-29T04:15:13Z

+    fn heap_size(&self, ctx: &mut DFHeapSizeCtx) -> usize {
+        self.path.as_ref().heap_size(ctx)
+            + self.table.heap_size(ctx)
+            + self.schema.as_ref().heap_size(ctx)


FileStatisticsCacheKey::heap_size appears to deep-count the shared SchemaFingerprint for every cached file key. Since ListingTable now shares one Arc<SchemaFingerprint> across all files for the same table and schema, this could overstate cache memory for wide schemas with many files and lead to earlier eviction than necessary.

Could we count only the incremental cache-owned cost here, or add a small test that documents the intended accounting tradeoff?
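One way to picture the "incremental cost" accounting is to count a shared `Arc` allocation only the first time it is seen. The sketch below is an assumption-laden illustration: `SeenArcs`, `heap_bytes`, `key_heap_size`, and the simplified `Key` are hypothetical, and whether `DFHeapSizeCtx` already deduplicates shared allocations is not shown in this thread.

```rust
use std::collections::HashSet;
use std::mem::size_of;
use std::sync::Arc;

/// Simplified fingerprint: (column name, nullable) only.
struct SchemaFingerprint {
    fields: Vec<(String, bool)>,
}

impl SchemaFingerprint {
    /// Heap bytes owned by the fingerprint: the Vec buffer plus each name's buffer.
    fn heap_bytes(&self) -> usize {
        self.fields.capacity() * size_of::<(String, bool)>()
            + self.fields.iter().map(|(name, _)| name.capacity()).sum::<usize>()
    }
}

struct Key {
    path: String,
    schema: Arc<SchemaFingerprint>,
}

/// Hypothetical accounting context: addresses of Arc allocations already counted.
#[derive(Default)]
struct SeenArcs(HashSet<usize>);

fn key_heap_size(key: &Key, seen: &mut SeenArcs) -> usize {
    let mut total = key.path.capacity();
    let addr = Arc::as_ptr(&key.schema) as usize;
    // `insert` returns true only the first time this allocation is seen,
    // so a fingerprint shared by many keys is charged once.
    if seen.0.insert(addr) {
        total += size_of::<SchemaFingerprint>() + key.schema.heap_bytes();
    }
    total
}

/// Returns (size with shared accounting, size when every key deep-counts the schema).
fn shared_vs_naive() -> (usize, usize) {
    let schema = Arc::new(SchemaFingerprint {
        fields: vec![("id".to_string(), false), ("name".to_string(), true)],
    });
    let keys = vec![
        Key { path: "data/a.parquet".to_string(), schema: Arc::clone(&schema) },
        Key { path: "data/b.parquet".to_string(), schema: Arc::clone(&schema) },
    ];
    let mut seen = SeenArcs::default();
    let shared: usize = keys.iter().map(|k| key_heap_size(k, &mut seen)).sum();
    let naive: usize = keys
        .iter()
        .map(|k| key_heap_size(k, &mut SeenArcs::default()))
        .sum();
    (shared, naive)
}
```

The gap between the two totals grows with both the number of files and the schema width, which is the overstatement the comment above is concerned about.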

Phoenix500526 force-pushed the issue/23072 branch from 44b33de to a13e269 Compare June 26, 2026 07:56

github-actions Bot added core Core DataFusion crate catalog Related to the catalog crate execution Related to the execution crate labels Jun 26, 2026

mkleen reviewed Jun 26, 2026

Comment thread datafusion/execution/src/cache/mod.rs

github-actions Bot added the common Related to common crate label Jun 26, 2026

Phoenix500526 added 2 commits June 27, 2026 10:39

mkleen reviewed Jun 27, 2026

Phoenix500526 force-pushed the issue/23072 branch from 9618f88 to a767586 Compare June 27, 2026 13:24

github-actions Bot added the documentation Improvements or additions to documentation label Jun 27, 2026

kosiew reviewed Jun 29, 2026
