Add profiling tool and run in CI #8859
Conversation
Add scripts/profile_cbmc.py and a scripts/profiling/ package for profiling CBMC's pre-solver stages using perf, with flamegraph generation.

Features:
- perf-based sampling with DWARF call graphs (16KB stack dumps, empirically validated to capture 91% of full stacks at 25% of the data size vs. the default 64KB)
- Aggregated results across multiple benchmarks with call-chain analysis
- Source-level call-site resolution via addr2line (--debug-binary)
- Built-in benchmarks: --auto (3 quick), --auto-large (10 extended), --auto-csmith (5 CSmith-generated with fixed seeds)
- Differential profiling between git refs (--diff REF_A REF_B)
- Multi-run mode for statistical significance (--runs N)
- Actionable hotspot analysis showing direct callers, call paths, per-benchmark breakdown, and source locations

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Run the profiling tool on every PR with 3 runs per benchmark for statistical significance. Posts a text summary to the GitHub step summary and uploads flamegraph SVGs as downloadable artifacts. Uses the --auto benchmarks (linked_list, array_ops, structs) which exercise pointer analysis, array encoding, and struct handling respectively. Solver time is excluded to focus on CBMC's own code. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
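For reference, typical invocations implied by the feature list above might look like this (flag names are taken from the description; the exact CLI surface may differ):

```shell
# Quick built-in suite, 3 runs per benchmark for statistical significance
python3 scripts/profile_cbmc.py --auto --runs 3

# CSmith-generated benchmarks with fixed seeds
python3 scripts/profile_cbmc.py --auto-csmith

# Differential profiling between two git refs
python3 scripts/profile_cbmc.py --diff develop HEAD
```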
Pull request overview
Adds a new CBMC profiling utility (Python) that runs perf on selected benchmarks, generates per-benchmark + aggregated flamegraphs, and summarizes hotspots; plus a GitHub Actions workflow to run it on PRs.
Changes:
- Introduces `scripts/profile_cbmc.py` and `scripts/profiling/*` to run `perf record`/`report`, generate FlameGraph SVGs, and print/save summaries (including optional multi-run averaging and diff mode).
- Adds benchmark generation (built-in suites + optional CSmith seeds) to standardize profiling inputs.
- Adds `.github/workflows/profiling.yaml` to run profiling on PRs and upload outputs as artifacts.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/profiling/utils.py | Common helpers, prerequisite checks, CBMC/FlameGraph discovery (with auto-clone). |
| scripts/profiling/runner.py | Executes perf-recorded benchmark runs; parses perf report; generates flamegraphs; supports multi-run averaging. |
| scripts/profiling/benchmarks.py | Generates built-in benchmark C sources and optional CSmith-based benchmarks. |
| scripts/profiling/analysis.py | Aggregates results, derives hotspot/call-chain summaries, writes summary.txt/results.json, optional addr2line mapping. |
| scripts/profiling/__init__.py | Declares the profiling package. |
| scripts/profile_cbmc.py | CLI entrypoint orchestrating builds/benchmarks/profiling, including diff mode via git worktrees. |
| .github/workflows/profiling.yaml | CI job to build CBMC, run profiling, publish summary and artifacts. |
```python
out(f"  {r['name']}/flamegraph.svg")
out(f"  {r['name']}/perf-report.txt")
```

In multi-run mode (when results are stored under `run_<N>/<benchmark>/`), the summary prints output paths as if files were directly under `<output_dir>/<benchmark>/`. This makes the step summary misleading and points to non-existent paths. Consider printing paths derived from `r['bench_dir']` (relative to `output_dir`) for flamegraph.svg and perf-report.txt instead of using `r['name']` alone.

Suggested change:
```diff
-out(f"  {r['name']}/flamegraph.svg")
-out(f"  {r['name']}/perf-report.txt")
+out(f"  {r['bench_dir']}/flamegraph.svg")
+out(f"  {r['bench_dir']}/perf-report.txt")
```
```python
try:
    script = subprocess.run(
        ["perf", "script", "-i", perf_data],
        capture_output=True, text=True, timeout=120)
    collapse = subprocess.run(
        [str(fg_dir / "stackcollapse-perf.pl")],
        input=script.stdout, capture_output=True, text=True, timeout=60)

    if not collapse.stdout.strip():
        warn(f"[{result['name']}] No stack data to generate flamegraph")
        return None

    title = f"CBMC Profile: {result['name']}{title_suffix}"
    fg_result = subprocess.run(
        [str(fg_dir / "flamegraph.pl"), "--title", title],
        input=collapse.stdout, capture_output=True, text=True, timeout=60)
    svg.write_text(fg_result.stdout)
    ok(f"[{result['name']}] Flamegraph: {svg}")
```
The flamegraph generation subprocess calls don't use check=True (perf script, stackcollapse-perf.pl, flamegraph.pl). As a result, failures won't raise CalledProcessError and the code may silently write empty/invalid SVGs while reporting success. Consider enabling check=True (or explicitly checking returncode and stderr) and surfacing stderr in the warning/error path.
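A minimal sketch of the suggested pattern — a hypothetical `run_checked` helper (not part of the PR) that enables `check=True` and surfaces stderr in the error path:

```python
import subprocess
import sys

def run_checked(cmd, input_text=None, timeout=60):
    """Run a command with check=True, surfacing stderr on failure."""
    try:
        return subprocess.run(
            cmd, input=input_text, capture_output=True, text=True,
            timeout=timeout, check=True)
    except subprocess.CalledProcessError as e:
        # Include stderr so a failing perf/flamegraph step is diagnosable.
        raise RuntimeError(f"{cmd[0]} failed: {e.stderr.strip()}") from e

# Example: a pipeline stage only proceeds if the previous stage succeeded.
result = run_checked([sys.executable, "-c", "print('collapsed stacks')"])
```

With this wrapper, a failure in `perf script`, `stackcollapse-perf.pl`, or `flamegraph.pl` raises instead of silently producing an empty SVG.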
```python
last = run_list[-1]
# Compute mean and stddev for timings
avg_timings = {}
for key in last["timings"]:
    vals = [r["timings"].get(key, 0) for r in run_list]
    avg_timings[key] = statistics.mean(vals)
    if len(vals) > 1:
        avg_timings[f"{key}_stddev"] = statistics.stdev(vals)
elapsed_vals = [r["elapsed"] for r in run_list]
```
Multi-run averaging only iterates timing keys from the last run and fills missing keys with 0 for other runs. This can silently drop timing categories and biases means downward when a key is absent (e.g., due to timeouts or partial output). Consider averaging over the union of timing keys across all runs and either skipping missing values or treating them as None/NaN rather than 0.
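A sketch of the suggested fix — a hypothetical helper that averages over the union of timing keys across all runs and skips missing values instead of counting them as 0:

```python
import statistics

def average_timings(run_list):
    """Average each timing key over the runs that actually report it."""
    all_keys = set().union(*(r["timings"].keys() for r in run_list))
    avg = {}
    for key in sorted(all_keys):
        vals = [r["timings"][key] for r in run_list if key in r["timings"]]
        avg[key] = statistics.mean(vals)
        if len(vals) > 1:
            avg[f"{key}_stddev"] = statistics.stdev(vals)
    return avg
```

This keeps categories that only appear in some runs, and a key missing from one run no longer drags its mean toward zero.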
```python
result = subprocess.run(
    ["csmith", "--seed", str(seed)],
    capture_output=True, text=True, timeout=30)
```

CSmith generation uses subprocess.run(..., timeout=30) but doesn't catch subprocess.TimeoutExpired. If csmith hangs for a seed, the whole profiling run will crash instead of skipping that seed. Consider wrapping this run in try/except TimeoutExpired and warning/continuing, similar to the nonzero return-code handling.

Suggested change:
```diff
-result = subprocess.run(
-    ["csmith", "--seed", str(seed)],
-    capture_output=True, text=True, timeout=30)
+try:
+    result = subprocess.run(
+        ["csmith", "--seed", str(seed)],
+        capture_output=True, text=True, timeout=30)
+except subprocess.TimeoutExpired:
+    warn(f"CSmith timed out for seed {seed}, skipping")
+    continue
```
```python
fg_dir = REPO_ROOT / ".flamegraph"
if not (fg_dir / "flamegraph.pl").is_file():
    info("Cloning FlameGraph repository...")
    subprocess.run(
        ["git", "clone", "--depth", "1",
         "https://github.com/brendangregg/FlameGraph.git", str(fg_dir)],
        capture_output=True, check=True)
ok(f"FlameGraph tools: {fg_dir}")
```
ensure_flamegraph clones FlameGraph from GitHub at runtime without pinning a commit/tag. This makes runs non-reproducible and introduces a supply-chain risk (CI will execute whatever is on the default branch at the time). Consider pinning to a known commit (e.g., clone then checkout a specific SHA), vendoring the needed scripts, or adding FlameGraph as a submodule. Also consider catching CalledProcessError here and reporting stderr via die() to avoid a Python traceback.
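One way to sketch the pinned clone. The SHA below is a placeholder (pick a vetted FlameGraph commit), and the injectable `run` parameter exists only to make the sketch testable; it is not part of the PR:

```python
import subprocess
from pathlib import Path

# Placeholder: replace with a vetted FlameGraph commit SHA.
PINNED_SHA = "0000000000000000000000000000000000000000"

def ensure_flamegraph(fg_dir: Path, run=subprocess.run):
    """Clone FlameGraph once, then pin to a known commit for reproducibility."""
    if not (fg_dir / "flamegraph.pl").is_file():
        try:
            # A full clone (no --depth 1) so an arbitrary SHA can be checked out.
            run(["git", "clone",
                 "https://github.com/brendangregg/FlameGraph.git", str(fg_dir)],
                capture_output=True, text=True, check=True)
            run(["git", "-C", str(fg_dir), "checkout", PINNED_SHA],
                capture_output=True, text=True, check=True)
        except subprocess.CalledProcessError as e:
            # Surface git's stderr instead of a raw traceback.
            raise SystemExit(f"FlameGraph setup failed: {e.stderr}") from e
    return fg_dir
```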
```yaml
with:
  name: profile-results
  path: |
    profile-results/*/flamegraph.svg
```

The profiling script writes per-benchmark flamegraphs under `profile-results/run_<N>/<benchmark>/` when --runs > 1, but the artifact glob only matches one directory deep (profile-results/*/flamegraph.svg). This will miss all generated flamegraphs in CI. Consider changing the pattern to profile-results/**/flamegraph.svg (and similarly for perf-report.txt if you want it uploaded).

Suggested change:
```diff
-    profile-results/*/flamegraph.svg
+    profile-results/**/flamegraph.svg
+    profile-results/**/perf-report.txt
```
```yaml
sudo apt-get install --no-install-recommends -yq \
  cmake ninja-build gcc g++ flex bison ccache \
  linux-tools-common linux-tools-$(uname -r) || true
```

The dependency install step ignores failures of linux-tools-$(uname -r) via `|| true`. If that package isn't available, perf may be missing and the job will fail later in a less obvious way. Consider removing `|| true` and/or adding a fallback install (e.g., linux-tools-generic) and then explicitly verifying perf --version so failures are surfaced early.

Suggested change:
```diff
-sudo apt-get install --no-install-recommends -yq \
-  cmake ninja-build gcc g++ flex bison ccache \
-  linux-tools-common linux-tools-$(uname -r) || true
+# Try to install linux-tools for the running kernel; fall back to a generic package if needed.
+sudo apt-get install --no-install-recommends -yq \
+  cmake ninja-build gcc g++ flex bison ccache \
+  linux-tools-common linux-tools-$(uname -r) || \
+  sudo apt-get install --no-install-recommends -yq \
+    linux-tools-common linux-tools-generic
+# Verify that perf is available so failures surface early.
+if ! perf --version >/dev/null 2>&1; then
+  echo "Error: perf is not installed or not functional after linux-tools installation." >&2
+  exit 1
+fi
```
```yaml
uses: actions/cache/restore@v5
with:
  path: .ccache
  key: ${{ runner.os }}-24.04-Release-profile-${{ github.sha }}
  restore-keys: |
    ${{ runner.os }}-24.04-Release-profile
    ${{ runner.os }}-24.04-Release
```

This workflow restores a ccache directory but never saves it back (actions/cache/restore only restores). That means most runs won't benefit from caching and profiling CI time will be longer than necessary. Consider switching to actions/cache@v5 (restore+save) with save-always: true, or add an explicit actions/cache/save@v5 step at the end keyed on runner/os + toolchain.

Suggested change:
```diff
-uses: actions/cache/restore@v5
+uses: actions/cache@v5
 with:
   path: .ccache
   key: ${{ runner.os }}-24.04-Release-profile-${{ github.sha }}
   restore-keys: |
     ${{ runner.os }}-24.04-Release-profile
     ${{ runner.os }}-24.04-Release
+  save-always: true
```
```python
subprocess.run(
    ["git", "submodule", "update", "--init"],
    cwd=str(ref_dir), capture_output=True, timeout=120)
```
This git submodule update --init call doesn't set check=True, so a failure (network hiccup, missing submodule ref, etc.) would be silently ignored and may only surface later as confusing build errors. Consider adding check=True (and possibly --recursive if needed) and including stderr in the die() message for easier diagnosis.
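A hedged sketch of the suggested check. This is a hypothetical wrapper; the PR's `die()` helper is replaced here by raising `SystemExit` so the snippet is self-contained:

```python
import subprocess

def update_submodules(ref_dir):
    """Initialise submodules, failing loudly with git's stderr on error."""
    result = subprocess.run(
        ["git", "submodule", "update", "--init", "--recursive"],
        cwd=str(ref_dir), capture_output=True, text=True, timeout=120)
    if result.returncode != 0:
        # Surface git's stderr instead of letting a later build step fail opaquely.
        raise SystemExit(
            f"git submodule update failed in {ref_dir}: {result.stderr.strip()}")
```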
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```
@@           Coverage Diff            @@
##           develop    #8859   +/-   ##
========================================
  Coverage    80.01%   80.01%
========================================
  Files         1700     1700
  Lines       188345   188345
  Branches        73       73
========================================
  Hits        150696   150696
  Misses       37649    37649
```

☔ View full report in Codecov by Sentry.