Add profiling tool and run in CI #8859
Conversation
Add scripts/profile_cbmc.py and a scripts/profiling/ package for profiling CBMC's pre-solver stages using perf, with flamegraph generation.

Features:
- perf-based sampling with DWARF call graphs (16KB stack dumps, empirically validated to capture 91% of full stacks at 25% of the data size vs. the default 64KB)
- Aggregated results across multiple benchmarks with call-chain analysis
- Source-level call-site resolution via addr2line (--debug-binary)
- Built-in benchmarks: --auto (3 quick), --auto-large (10 extended), --auto-csmith (5 CSmith-generated with fixed seeds)
- Differential profiling between git refs (--diff REF_A REF_B)
- Multi-run mode for statistical significance (--runs N)
- Actionable hotspot analysis showing direct callers, call paths, per-benchmark breakdown, and source locations

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Run the profiling tool on every PR with 3 runs per benchmark for statistical significance. Posts a text summary to the GitHub step summary and uploads flamegraph SVGs as downloadable artifacts. Uses the --auto benchmarks (linked_list, array_ops, structs) which exercise pointer analysis, array encoding, and struct handling respectively. Solver time is excluded to focus on CBMC's own code. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
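For reference, typical invocations implied by the feature list above might look like this (flag names are taken from the description; the exact CLI surface may differ):

```shell
# Quick built-in suite, 3 runs per benchmark for statistical significance
python3 scripts/profile_cbmc.py --auto --runs 3

# CSmith-generated benchmarks with fixed seeds
python3 scripts/profile_cbmc.py --auto-csmith

# Differential profiling between two git refs
python3 scripts/profile_cbmc.py --diff develop HEAD
```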
Pull request overview
Adds a new CBMC profiling utility (Python) that runs perf on selected benchmarks, generates per-benchmark + aggregated flamegraphs, and summarizes hotspots; plus a GitHub Actions workflow to run it on PRs.
Changes:
- Introduces `scripts/profile_cbmc.py` and `scripts/profiling/*` to run `perf record`/`report`, generate FlameGraph SVGs, and print/save summaries (including optional multi-run averaging and diff mode).
- Adds benchmark generation (built-in suites + optional CSmith seeds) to standardize profiling inputs.
- Adds `.github/workflows/profiling.yaml` to run profiling on PRs and upload outputs as artifacts.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/profiling/utils.py | Common helpers, prerequisite checks, CBMC/FlameGraph discovery (with auto-clone). |
| scripts/profiling/runner.py | Executes perf-recorded benchmark runs; parses perf report; generates flamegraphs; supports multi-run averaging. |
| scripts/profiling/benchmarks.py | Generates built-in benchmark C sources and optional CSmith-based benchmarks. |
| scripts/profiling/analysis.py | Aggregates results, derives hotspot/call-chain summaries, writes summary.txt/results.json, optional addr2line mapping. |
| scripts/profiling/__init__.py | Declares the profiling package. |
| scripts/profile_cbmc.py | CLI entrypoint orchestrating builds/benchmarks/profiling, including diff mode via git worktrees. |
| .github/workflows/profiling.yaml | CI job to build CBMC, run profiling, publish summary and artifacts. |
```python
out(f"  {r['name']}/flamegraph.svg")
out(f"  {r['name']}/perf-report.txt")
```

In multi-run mode (when results are stored under `run_<N>/<benchmark>/`), the summary prints output paths as if files were directly under `<output_dir>/<benchmark>/`. This makes the step summary misleading and points to non-existent paths. Consider printing paths derived from `r['bench_dir']` (relative to `output_dir`) for flamegraph.svg and perf-report.txt instead of using `r['name']` alone.

Suggested change:
```diff
-out(f"  {r['name']}/flamegraph.svg")
-out(f"  {r['name']}/perf-report.txt")
+out(f"  {r['bench_dir']}/flamegraph.svg")
+out(f"  {r['bench_dir']}/perf-report.txt")
```
```python
try:
    script = subprocess.run(
        ["perf", "script", "-i", perf_data],
        capture_output=True, text=True, timeout=120)
    collapse = subprocess.run(
        [str(fg_dir / "stackcollapse-perf.pl")],
        input=script.stdout, capture_output=True, text=True, timeout=60)

    if not collapse.stdout.strip():
        warn(f"[{result['name']}] No stack data to generate flamegraph")
        return None

    title = f"CBMC Profile: {result['name']}{title_suffix}"
    fg_result = subprocess.run(
        [str(fg_dir / "flamegraph.pl"), "--title", title],
        input=collapse.stdout, capture_output=True, text=True, timeout=60)
    svg.write_text(fg_result.stdout)
    ok(f"[{result['name']}] Flamegraph: {svg}")
```
The flamegraph generation subprocess calls don't use check=True (perf script, stackcollapse-perf.pl, flamegraph.pl). As a result, failures won't raise CalledProcessError and the code may silently write empty/invalid SVGs while reporting success. Consider enabling check=True (or explicitly checking returncode and stderr) and surfacing stderr in the warning/error path.
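A minimal sketch of the suggested pattern — a hypothetical `run_checked` helper (not part of the PR) that enables `check=True` and surfaces stderr in the error path:

```python
import subprocess
import sys

def run_checked(cmd, input_text=None, timeout=60):
    """Run a command with check=True, surfacing stderr on failure."""
    try:
        return subprocess.run(
            cmd, input=input_text, capture_output=True, text=True,
            timeout=timeout, check=True)
    except subprocess.CalledProcessError as e:
        # Include stderr so a failing perf/flamegraph step is diagnosable.
        raise RuntimeError(f"{cmd[0]} failed: {e.stderr.strip()}") from e

# Example: a pipeline stage only proceeds if the previous stage succeeded.
result = run_checked([sys.executable, "-c", "print('collapsed stacks')"])
```

With this wrapper, a failure in `perf script`, `stackcollapse-perf.pl`, or `flamegraph.pl` raises instead of silently producing an empty SVG.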
```python
last = run_list[-1]
# Compute mean and stddev for timings
avg_timings = {}
for key in last["timings"]:
    vals = [r["timings"].get(key, 0) for r in run_list]
    avg_timings[key] = statistics.mean(vals)
    if len(vals) > 1:
        avg_timings[f"{key}_stddev"] = statistics.stdev(vals)
elapsed_vals = [r["elapsed"] for r in run_list]
```
Multi-run averaging only iterates timing keys from the last run and fills missing keys with 0 for other runs. This can silently drop timing categories and biases means downward when a key is absent (e.g., due to timeouts or partial output). Consider averaging over the union of timing keys across all runs and either skipping missing values or treating them as None/NaN rather than 0.
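A sketch of the suggested fix — a hypothetical helper that averages over the union of timing keys across all runs and skips missing values instead of counting them as 0:

```python
import statistics

def average_timings(run_list):
    """Average each timing key over the runs that actually report it."""
    all_keys = set().union(*(r["timings"].keys() for r in run_list))
    avg = {}
    for key in sorted(all_keys):
        vals = [r["timings"][key] for r in run_list if key in r["timings"]]
        avg[key] = statistics.mean(vals)
        if len(vals) > 1:
            avg[f"{key}_stddev"] = statistics.stdev(vals)
    return avg
```

This keeps categories that only appear in some runs, and a key missing from one run no longer drags its mean toward zero.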
```python
result = subprocess.run(
    ["csmith", "--seed", str(seed)],
    capture_output=True, text=True, timeout=30)
```

CSmith generation uses subprocess.run(..., timeout=30) but doesn't catch subprocess.TimeoutExpired. If csmith hangs for a seed, the whole profiling run will crash instead of skipping that seed. Consider wrapping this run in try/except TimeoutExpired and warning/continuing, similar to the nonzero return-code handling.

Suggested change:
```diff
-result = subprocess.run(
-    ["csmith", "--seed", str(seed)],
-    capture_output=True, text=True, timeout=30)
+try:
+    result = subprocess.run(
+        ["csmith", "--seed", str(seed)],
+        capture_output=True, text=True, timeout=30)
+except subprocess.TimeoutExpired:
+    warn(f"CSmith timed out for seed {seed}, skipping")
+    continue
```
```python
fg_dir = REPO_ROOT / ".flamegraph"
if not (fg_dir / "flamegraph.pl").is_file():
    info("Cloning FlameGraph repository...")
    subprocess.run(
        ["git", "clone", "--depth", "1",
         "https://github.com/brendangregg/FlameGraph.git", str(fg_dir)],
        capture_output=True, check=True)
ok(f"FlameGraph tools: {fg_dir}")
```
ensure_flamegraph clones FlameGraph from GitHub at runtime without pinning a commit/tag. This makes runs non-reproducible and introduces a supply-chain risk (CI will execute whatever is on the default branch at the time). Consider pinning to a known commit (e.g., clone then checkout a specific SHA), vendoring the needed scripts, or adding FlameGraph as a submodule. Also consider catching CalledProcessError here and reporting stderr via die() to avoid a Python traceback.
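One way to sketch the pinned clone. The SHA below is a placeholder (pick a vetted FlameGraph commit), and the injectable `run` parameter exists only to make the sketch testable; it is not part of the PR:

```python
import subprocess
from pathlib import Path

# Placeholder: replace with a vetted FlameGraph commit SHA.
PINNED_SHA = "0000000000000000000000000000000000000000"

def ensure_flamegraph(fg_dir: Path, run=subprocess.run):
    """Clone FlameGraph once, then pin to a known commit for reproducibility."""
    if not (fg_dir / "flamegraph.pl").is_file():
        try:
            # A full clone (no --depth 1) so an arbitrary SHA can be checked out.
            run(["git", "clone",
                 "https://github.com/brendangregg/FlameGraph.git", str(fg_dir)],
                capture_output=True, text=True, check=True)
            run(["git", "-C", str(fg_dir), "checkout", PINNED_SHA],
                capture_output=True, text=True, check=True)
        except subprocess.CalledProcessError as e:
            # Surface git's stderr instead of a raw traceback.
            raise SystemExit(f"FlameGraph setup failed: {e.stderr}") from e
    return fg_dir
```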
```yaml
with:
  name: profile-results
  path: |
    profile-results/*/flamegraph.svg
```

The profiling script writes per-benchmark flamegraphs under `profile-results/run_<N>/<benchmark>/` when --runs > 1, but the artifact glob only matches one directory deep (profile-results/*/flamegraph.svg). This will miss all generated flamegraphs in CI. Consider changing the pattern to profile-results/**/flamegraph.svg (and similarly for perf-report.txt if you want it uploaded).

Suggested change:
```diff
-    profile-results/*/flamegraph.svg
+    profile-results/**/flamegraph.svg
+    profile-results/**/perf-report.txt
```
```yaml
sudo apt-get install --no-install-recommends -yq \
  cmake ninja-build gcc g++ flex bison ccache \
  linux-tools-common linux-tools-$(uname -r) || true
```

The dependency install step ignores failures of linux-tools-$(uname -r) via `|| true`. If that package isn't available, perf may be missing and the job will fail later in a less obvious way. Consider removing `|| true` and/or adding a fallback install (e.g., linux-tools-generic) and then explicitly verifying perf --version so failures are surfaced early.

Suggested change:
```diff
-sudo apt-get install --no-install-recommends -yq \
-  cmake ninja-build gcc g++ flex bison ccache \
-  linux-tools-common linux-tools-$(uname -r) || true
+# Try to install linux-tools for the running kernel; fall back to a generic package if needed.
+sudo apt-get install --no-install-recommends -yq \
+  cmake ninja-build gcc g++ flex bison ccache \
+  linux-tools-common linux-tools-$(uname -r) || \
+  sudo apt-get install --no-install-recommends -yq \
+    linux-tools-common linux-tools-generic
+# Verify that perf is available so failures surface early.
+if ! perf --version >/dev/null 2>&1; then
+  echo "Error: perf is not installed or not functional after linux-tools installation." >&2
+  exit 1
+fi
```
```yaml
uses: actions/cache/restore@v5
with:
  path: .ccache
  key: ${{ runner.os }}-24.04-Release-profile-${{ github.sha }}
  restore-keys: |
    ${{ runner.os }}-24.04-Release-profile
    ${{ runner.os }}-24.04-Release
```

This workflow restores a ccache directory but never saves it back (actions/cache/restore only restores). That means most runs won't benefit from caching and profiling CI time will be longer than necessary. Consider switching to actions/cache@v5 (restore+save) with save-always: true, or add an explicit actions/cache/save@v5 step at the end keyed on runner/os + toolchain.

Suggested change:
```diff
-uses: actions/cache/restore@v5
+uses: actions/cache@v5
 with:
   path: .ccache
   key: ${{ runner.os }}-24.04-Release-profile-${{ github.sha }}
   restore-keys: |
     ${{ runner.os }}-24.04-Release-profile
     ${{ runner.os }}-24.04-Release
+  save-always: true
```
```python
subprocess.run(
    ["git", "submodule", "update", "--init"],
    cwd=str(ref_dir), capture_output=True, timeout=120)
```
This git submodule update --init call doesn't set check=True, so a failure (network hiccup, missing submodule ref, etc.) would be silently ignored and may only surface later as confusing build errors. Consider adding check=True (and possibly --recursive if needed) and including stderr in the die() message for easier diagnosis.
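A hedged sketch of the suggested check. This is a hypothetical wrapper; the PR's `die()` helper is replaced here by raising `SystemExit` so the snippet is self-contained:

```python
import subprocess

def update_submodules(ref_dir):
    """Initialise submodules, failing loudly with git's stderr on error."""
    result = subprocess.run(
        ["git", "submodule", "update", "--init", "--recursive"],
        cwd=str(ref_dir), capture_output=True, text=True, timeout=120)
    if result.returncode != 0:
        # Surface git's stderr instead of letting a later build step fail opaquely.
        raise SystemExit(
            f"git submodule update failed in {ref_dir}: {result.stderr.strip()}")
```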
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```
@@           Coverage Diff            @@
##           develop    #8859   +/-   ##
========================================
  Coverage    80.01%   80.01%
========================================
  Files         1700     1700
  Lines       188345   188345
  Branches        73       73
========================================
  Hits        150696   150696
  Misses       37649    37649
```

☔ View full report in Codecov by Sentry.