Commit 4eeca68

Trim comments to the load-bearing lines
1 parent 45a2508 commit 4eeca68

4 files changed

Lines changed: 29 additions & 41 deletions

.github/actions/conformance/run-client.sh

Lines changed: 8 additions & 22 deletions
@@ -1,22 +1,14 @@
 #!/bin/bash
 # Run a client conformance suite, re-verifying unexpected failures solo.
-#
-# Suite mode launches every scenario's client subprocess concurrently; on a
-# 2-vCPU runner that contention can push scenarios with real-time waits (the
-# SSE reconnect timing in sse-retry) past their tolerances. So a scenario the
-# suite run flags as an unexpected failure is re-run alone on the then-quiet
-# runner: a real failure fails again and the job stays red; a contention
-# artifact passes and the job goes green, with a FLAKE_RESCUED marker written
-# into the --output-dir so the artifact upload preserves the evidence.
-# Failures that only reproduce under concurrency are deliberately traded
-# away - the suite asserts spec compliance, not behavior under parallel load.
+# Concurrent suite runs on a 2-vCPU runner can push scenarios with real-time
+# waits past tolerance; solo, a real failure fails again while a contention
+# artifact passes. Failures that only reproduce under concurrency are excused.
 set -uo pipefail

 : "${CONFORMANCE_PKG:?set CONFORMANCE_PKG (pinned in .github/workflows/conformance.yml)}"
 SOLO_ATTEMPTS="${CONFORMANCE_SOLO_ATTEMPTS:-2}"

-# Relative paths in the arguments (the client command, --output-dir) resolve
-# from the repo root, same contract as run-server.sh.
+# Relative args resolve from the repo root; same contract as run-server.sh.
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 cd "$SCRIPT_DIR/../../.." || exit 1

@@ -31,12 +23,8 @@ fi

 plain="$(sed 's/\x1b\[[0-9;]*m//g' "$log")"

-# Scenarios listed under "Unexpected failures (not in baseline):". Anything
-# else behind the nonzero exit (stale baseline entries, harness or infra
-# errors) is not retried. The extraction is coupled to the pinned harness's
-# summary wording and print order; if a pin bump changes either, the list
-# comes up empty and the original failure passes through - never a false
-# green.
+# If the harness's summary wording changes, the list comes up empty and the
+# original exit code passes through - never a false green.
 mapfile -t scenarios < <(
 printf '%s\n' "$plain" |
 sed -n '/^Unexpected failures (not in baseline):$/,/^$/p' |
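The solo re-verification that the script's header comment describes can be sketched roughly as below. This is a Python paraphrase for readers of the diff, not the script's code: `reverify_solo`, the injected `run_solo` callable, and the contents of the marker file are hypothetical. It also assumes `SOLO_ATTEMPTS` means "rescued if any of N solo attempts passes", which the diff does not confirm.

```python
from pathlib import Path
from typing import Callable, Iterable


def reverify_solo(
    scenarios: Iterable[str],
    run_solo: Callable[[str], bool],  # hypothetical: True when the solo run passes
    output_dir: Path,
    suite_rc: int,
    attempts: int = 2,  # mirrors the SOLO_ATTEMPTS default shown in the diff
) -> int:
    """Return the exit code the job should end with."""
    names = list(scenarios)
    if not names:
        # Nothing was extracted: the original exit code passes through.
        return suite_rc
    for name in names:
        # Assumption: one passing attempt is enough to call it a flake.
        if not any(run_solo(name) for _ in range(attempts)):
            # A real failure fails again, so the job stays red.
            return suite_rc
    # Every flagged scenario passed alone: leave evidence for the upload step.
    output_dir.mkdir(parents=True, exist_ok=True)
    (output_dir / "FLAKE_RESCUED").write_text("\n".join(names) + "\n")
    return 0
```

The empty-list branch is what gives the "never a false green" property: a rescue can only happen for scenarios that were positively identified and then passed solo.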
@@ -58,10 +46,8 @@ if printf '%s\n' "$plain" | grep -q '^Stale baseline entries'; then
 exit "$rc"
 fi

-# Reuse the suite invocation's arguments for the solo runs, minus the flags
-# that only make sense for a suite (--scenario replaces --suite; single runs
-# are judged directly, not against the baseline). Solo results are saved next
-# to the suite's so the uploaded artifact carries both.
+# Drop the suite-only flags: --scenario replaces --suite, and solo runs are
+# judged directly rather than against the baseline.
 rerun_args=()
 output_dir=""
 skip_next=0
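The argument rewrite that this hunk's comment describes might look like the following Python sketch. The variable names `rerun_args`, `output_dir`, and `skip_next` come from the diff; everything else is an assumption. In particular, the set of suite-only flags is assumed to be `--suite` and `--expected-failures`, only the space-separated `--flag value` form is handled, and the `solo-<scenario>` subdirectory is a hypothetical layout for "next to the suite's" results.

```python
from typing import List, Tuple

# Assumption: the suite-only flags, each followed by exactly one value.
SUITE_ONLY_FLAGS = {"--suite", "--expected-failures"}


def build_rerun_args(suite_args: List[str]) -> Tuple[List[str], str]:
    """Strip suite-only flags and pull out the --output-dir value."""
    rerun_args: List[str] = []
    output_dir = ""
    skip_next = False
    take_output_dir = False
    for arg in suite_args:
        if skip_next:  # value of a dropped flag
            skip_next = False
            continue
        if take_output_dir:  # value of --output-dir
            output_dir = arg
            take_output_dir = False
            continue
        if arg in SUITE_ONLY_FLAGS:
            skip_next = True
            continue
        if arg == "--output-dir":
            take_output_dir = True
            continue
        rerun_args.append(arg)
    return rerun_args, output_dir


def solo_args(suite_args: List[str], scenario: str) -> List[str]:
    """Arguments for re-running one scenario alone."""
    rerun_args, output_dir = build_rerun_args(suite_args)
    args = rerun_args + ["--scenario", scenario]
    if output_dir:
        # Hypothetical layout: solo results in a subdirectory of the suite's.
        args += ["--output-dir", f"{output_dir}/solo-{scenario}"]
    return args
```

Keeping the solo output under the suite's `--output-dir` is what lets a single artifact upload carry both sets of results.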

.github/workflows/conformance.yml

Lines changed: 7 additions & 15 deletions
@@ -96,8 +96,7 @@ jobs:
 --expected-failures ./.github/actions/conformance/expected-failures.yml
 --output-dir conformance-results/server-all
 - name: Upload conformance results
-# The suite summary only prints counts for warning-level findings; the
-# per-check measurements live in the checks.json files saved above.
+# The log has only summary counts; per-check data is in checks.json.
 if: failure()
 uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
 with:
@@ -131,22 +130,17 @@ jobs:
 echo "CONFORMANCE_PKG=file:/tmp/conformance.tgz" >> "$GITHUB_ENV"
 ;;
 esac
-# --compile-bytecode: the harness spawns every scenario's client
-# concurrently; without pre-compiled site-packages, ~40 fresh
-# interpreters race to byte-compile the same modules on a 2-core
-# runner, saturating it for the first ~20s of the suite — exactly when
-# timing-sensitive scenarios take their measurements.
+# --compile-bytecode: without it, ~40 concurrently spawned interpreters
+# race to byte-compile site-packages during the timing-sensitive window.
 - run: uv sync --frozen --all-extras --package mcp --compile-bytecode
 - name: Pre-compile bytecode (editable sources)
 run: uv run --frozen python -m compileall -q src .github/actions/conformance
 - name: Run client conformance (all suite)
 # The harness runs all scenarios via unbounded Promise.all; with 40
 # scenarios on a 2-core runner the slowest one (sse-retry, which has a
 # real-time SSE reconnect wait) needs more than the 30s default budget.
-# The client command execs the synced venv's interpreter directly:
-# `uv run` would re-check the lockfile in every one of the ~40
-# concurrent spawns, compounding the startup storm. run-client.sh
-# re-verifies unexpected failures solo before failing the job.
+# `.venv/bin/python` (not `uv run`) avoids lockfile re-checks in ~40
+# concurrent spawns; run-client.sh re-runs unexpected failures solo.
 run: >-
 ./.github/actions/conformance/run-client.sh
 --command '.venv/bin/python .github/actions/conformance/client.py'
@@ -164,10 +158,8 @@ jobs:
 --expected-failures ./.github/actions/conformance/expected-failures.2026-07-28.yml
 --output-dir conformance-results/client-2026-07-28
 - name: Upload conformance results
-# The suite summary only prints counts for warning-level findings; the
-# per-check measurements live in the checks.json files saved above.
-# Also upload when run-client.sh rescued a flake (job green, but the
-# contention evidence should not be discarded).
+# The log has only summary counts; per-check data is in checks.json.
+# Also on FLAKE_RESCUED: rescued-flake evidence is otherwise discarded.
 if: failure() || hashFiles('conformance-results/**/FLAKE_RESCUED') != ''
 uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
 with:
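The upload condition `failure() || hashFiles('conformance-results/**/FLAKE_RESCUED') != ''` relies on `hashFiles` returning an empty string when no file matches. A rough Python equivalent of that decision, for illustration only, is sketched below; `should_upload` and its parameters are hypothetical names, and the recursive search stands in for the `**` glob.

```python
from pathlib import Path


def should_upload(job_failed: bool, results_root: Path) -> bool:
    """Upload on failure, or when any FLAKE_RESCUED marker exists."""
    # rglob matches the marker at any depth under results_root, which is
    # roughly what the '**' pattern in the workflow expression does.
    rescued = results_root.is_dir() and any(results_root.rglob("FLAKE_RESCUED"))
    return job_failed or rescued
```

A green job with a marker therefore still uploads, which is the case the trimmed comment calls out.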

examples/servers/everything-server/mcp_everything_server/server.py

Lines changed: 2 additions & 4 deletions
@@ -191,10 +191,8 @@ async def test_tool_with_progress(ctx: Context) -> str:
 async def test_sampling(prompt: str, ctx: Context) -> str:
 """Tests server-initiated sampling (LLM completion request)"""
 try:
-# Request sampling from client. related_request_id routes the request onto
-# the originating tools/call SSE stream, which exists for the whole handler;
-# without it the request targets the standalone GET stream and is dropped if
-# the client has not finished opening that stream yet.
+# Request sampling from client. Without related_request_id the request goes
+# to the standalone GET stream and is silently dropped if it is not open yet.
 result = await ctx.session.create_message( # pyright: ignore[reportDeprecated]
 messages=[SamplingMessage(role="user", content=TextContent(type="text", text=prompt))],
 max_tokens=100,
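The routing behavior this comment describes can be illustrated with a toy model. It is not the SDK's transport code: `ToyTransport`, `STANDALONE`, and the stream keys are hypothetical, and the only thing modeled is that a message is delivered when its target stream is open and dropped otherwise.

```python
from typing import Dict, List, Optional

STANDALONE = "standalone-get"  # hypothetical key for the standalone GET stream


class ToyTransport:
    """Toy stand-in for a server transport with per-request SSE streams."""

    def __init__(self) -> None:
        self.open_streams: Dict[str, List[str]] = {}

    def open_stream(self, key: str) -> None:
        self.open_streams[key] = []

    def send(self, message: str, related_request_id: Optional[str] = None) -> bool:
        # With related_request_id the message rides the originating request's
        # stream; without it, it targets the standalone GET stream.
        key = related_request_id if related_request_id is not None else STANDALONE
        stream = self.open_streams.get(key)
        if stream is None:
            return False  # target stream not open: message is dropped
        stream.append(message)
        return True


transport = ToyTransport()
transport.open_stream("req-1")  # the tools/call stream exists for the whole handler
delivered = transport.send("sampling/createMessage", related_request_id="req-1")
dropped = not transport.send("sampling/createMessage")  # GET stream not open yet
```

In this model the second send is lost only because of timing, which matches the race the comment warns about.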

log.txt

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+Total: 1 passed, 2 failed, 0 warnings
+
+Expected failures (in baseline):
+~ foo
+
+Stale baseline entries (now passing - remove from baseline):
+✓ bar
+
+Unexpected failures (not in baseline):
+✗ sse-retry
+
+Baseline is stale: update your expected-failures file to remove passing scenarios.
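Against a summary shaped like this log, the extraction in run-client.sh (strip ANSI color codes, then take the block from the "Unexpected failures" header to the next empty line) can be approximated in Python as below. The two patterns mirror the `sed` expressions visible in the diff. The step that turns each row into a scenario name is not shown in the diff, so taking the last whitespace-separated token is an assumption, as is the name `unexpected_failures`.

```python
import re
from typing import List

ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")  # same pattern as the script's first sed
HEADER = "Unexpected failures (not in baseline):"


def unexpected_failures(log_text: str) -> List[str]:
    """Return scenario names listed under the unexpected-failures header."""
    plain = ANSI_SGR.sub("", log_text)
    scenarios: List[str] = []
    in_section = False
    for line in plain.splitlines():
        if not in_section:
            in_section = line == HEADER  # header must match the whole line
            continue
        if line == "":
            break  # the sed range ends at the first empty line
        parts = line.split()
        if parts:
            # Assumption: each row is "<marker> <scenario-name>".
            scenarios.append(parts[-1])
    return scenarios
```

If the header wording changes, the function returns an empty list, which corresponds to the script passing the original exit code through rather than attempting a rescue.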
