@@ -17,10 +17,102 @@ specific language governing permissions and limitations
1717under the License.
1818-->
1919
20- # Running Comet Benchmarks in Microk8s
20+ # Comet Benchmarks
2121
22- This guide explains how to run benchmarks derived from TPC-H and TPC-DS in Apache DataFusion Comet deployed in a
23- local Microk8s cluster.
22+ This guide explains how to run benchmarks derived from TPC-H and TPC-DS in Apache DataFusion Comet.
23+
24+ ## Table of Contents
25+
26+ - [GitHub CI Benchmarks (Kind)](#github-ci-benchmarks-kind)
27+ - [Local Development (Kind)](#local-development-kind)
28+ - [Microk8s Deployment](#running-comet-benchmarks-in-microk8s)
29+
30+ ---
31+
32+ ## GitHub CI Benchmarks (Kind)
33+
34+ The project includes automated benchmark CI that runs on every PR affecting Rust (`native/**/*.rs`) or Scala/Java (`spark/**/*.scala`, `spark/**/*.java`) code.
35+
36+ ### What the CI Does
37+
38+ 1. Creates a Kind Kubernetes cluster (1 control-plane + 2 workers)
39+ 2. Installs Spark Operator via Helm
40+ 3. Builds Comet from source
41+ 4. Generates TPC-H SF=1 data (~1 GB)
42+ 5. Runs TPC-H Q1 with Spark baseline
43+ 6. Runs TPC-H Q1 with Comet enabled
44+ 7. **Validates that Comet achieves a ≥1.1x speedup over the Spark baseline (10% improvement)**
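
The final validation step amounts to a ratio check. The following is a minimal sketch, assuming speedup is defined as the Spark baseline time divided by the Comet time; the function names are hypothetical and the actual CI script may compute this differently.

```python
def speedup(spark_seconds: float, comet_seconds: float) -> float:
    """Return how many times faster Comet ran than the Spark baseline.

    Assumption: speedup = baseline time / Comet time.
    """
    if spark_seconds <= 0 or comet_seconds <= 0:
        raise ValueError("timings must be positive")
    return spark_seconds / comet_seconds


def meets_min_speedup(spark_seconds: float, comet_seconds: float,
                      min_speedup: float = 1.1) -> bool:
    """Check the measured speedup against the required minimum."""
    return speedup(spark_seconds, comet_seconds) >= min_speedup
```

Under this definition, a baseline of 12 s against a Comet run of 10 s gives a 1.2x speedup and passes the default threshold, while 10.5 s against 10 s (1.05x) does not.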
45+
46+ ### Manual Trigger
47+
48+ You can manually trigger the benchmark CI from GitHub Actions with custom parameters:
49+
50+ - **scale_factor**: TPC-H scale factor (default: 1)
51+ - **query**: TPC-H query (q1, q6, q14, simple)
52+ - **min_speedup**: Minimum required speedup (default: 1.1)
53+
54+ ---
55+
56+ ## Local Development (Kind)
57+
58+ Run benchmarks locally using Kind (Kubernetes in Docker).
59+
60+ ### Prerequisites
61+
62+ ``` bash
63+ # Install Kind, kubectl, and Helm
64+ brew install kind kubectl helm # macOS
65+ # Or see: https://kind.sigs.k8s.io/docs/user/quick-start/
66+ ```
67+
68+ ### Quick Start
69+
70+ ``` bash
71+ # 1. Setup Kind cluster with Spark Operator
72+ ./hack/k8s-benchmark-setup.sh
73+
74+ # 2. Build Comet
75+ make release PROFILES="-Pspark-3.5 -Pscala-2.12"
76+
77+ # 3. Build benchmark Docker image
78+ docker build -t comet-bench:local -f benchmarks/Dockerfile.k8s .
79+ kind load docker-image comet-bench:local --name comet-bench
80+
81+ # 4. Generate TPC-H data
82+ ./benchmarks/scripts/generate-tpch-data.sh 1 /tmp/comet-bench-data/tpch
83+
84+ # 5. Run Spark baseline
85+ ./benchmarks/scripts/run-k8s-benchmark.sh spark q1
86+
87+ # 6. Run Comet benchmark
88+ ./benchmarks/scripts/run-k8s-benchmark.sh comet q1
89+
90+ # 7. Compare results
91+ python3 benchmarks/scripts/compare-results.py \
92+ --spark /tmp/comet-bench-results/spark_q1_result.json \
93+ --comet /tmp/comet-bench-results/comet_q1_result.json \
94+ --min-speedup 1.1
95+
96+ # 8. Cleanup
97+ ./hack/k8s-benchmark-setup.sh --delete
98+ ```
99+
100+ ### Environment Variables
101+
102+ | Variable | Default | Description |
103+ | ----------| ---------| -------------|
104+ | `COMET_BENCH_CLUSTER` | `comet-bench` | Kind cluster name |
105+ | `COMET_BENCH_NAMESPACE` | `comet-bench` | Kubernetes namespace |
106+ | `COMET_DOCKER_IMAGE` | `comet-bench:local` | Docker image for benchmarks |
107+ | `DRIVER_MEMORY` | `2g` | Spark driver memory |
108+ | `EXECUTOR_MEMORY` | `2g` | Spark executor memory |
109+ | `EXECUTOR_INSTANCES` | `2` | Number of Spark executors |
110+
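To illustrate how the defaults in the table above interact with overrides, here is a small sketch that resolves each variable from the environment and falls back to the documented default. The helper name `resolve_bench_settings` is hypothetical; the benchmark scripts themselves are shell scripts and may resolve these values differently.

```python
import os

# Defaults copied from the table above.
DEFAULTS = {
    "COMET_BENCH_CLUSTER": "comet-bench",
    "COMET_BENCH_NAMESPACE": "comet-bench",
    "COMET_DOCKER_IMAGE": "comet-bench:local",
    "DRIVER_MEMORY": "2g",
    "EXECUTOR_MEMORY": "2g",
    "EXECUTOR_INSTANCES": "2",
}


def resolve_bench_settings(env=None):
    """Return every documented variable, using the default when it is unset."""
    source = os.environ if env is None else env
    return {name: source.get(name, default) for name, default in DEFAULTS.items()}


if __name__ == "__main__":
    # Print the effective settings for the current shell environment.
    for name, value in resolve_bench_settings().items():
        print(f"{name}={value}")
```

For example, exporting `EXECUTOR_MEMORY=4g` before running the sketch changes only that entry and leaves the remaining values at their defaults.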
111+ ---
112+
113+ ## Running Comet Benchmarks in Microk8s
114+
115+ This section explains how to run benchmarks in a local Microk8s cluster.
24116
25117## Use Microk8s locally
26118