test: strengthen e2e behavioral coverage of the 2025-11-25 surface#2262
Closed
felixweinberger wants to merge 19 commits into
Closed
test: strengthen e2e behavioral coverage of the 2025-11-25 surface#2262felixweinberger wants to merge 19 commits into
felixweinberger wants to merge 19 commits into
Conversation
Downstream code matches SDK errors by literal code values, error.name, and message substrings rather than enum members or instanceof (which breaks across package boundaries). None of these were pinned, so a renumber or reword would pass the entire suite silently. Lock them in: - ErrorCode: exact-equality table of all 8 members' numeric values (also locks membership in both directions) - McpError: error.name === 'McpError' set explicitly in the constructor - close rejection: literal code -32000 plus a message containing 'Connection closed', not just the enum member - streamable HTTP GET failure: StreamableHTTPError with the HTTP status as .code and the 'Failed to open SSE stream' message marker - per-session hosting: the 'No valid session ID' rejection surfaces through the client error with .code 400 and the body text embedded - OAuth discovery failures: 'trying to load' + 'metadata' substrings - startAuthorization/registerClient: anchored 'Incompatible auth server' message head - parseErrorResponse fallback: anchored 'HTTP <status>' message head
The mutual-assignability checks tolerate renames or removals of optional members on passthrough result types: the old key is absorbed by the catchall index signature and the renamed key is optional, so assignment still compiles in both directions. Renaming CallToolResult's isError to isFailure passed typecheck and every existing test. Add per-type key lists (checked against keyof with index signatures stripped) for the result members consumers read by name (isError, structuredContent, _meta, nextCursor, instructions, capability flags), plus a runtime check that CallToolResultSchema.parse preserves isError.
Pin the accept/strip/reject line each Zod schema draws at the wire: - EmptyResult acks are strict: an extra non-_meta key on a ping ack rejects client-side with the raw validator error - JSON-RPC envelopes are strict: an unknown top-level sibling field is rejected at the deserialization boundary (HTTP 400/-32700; stdio onerror + drop + connection survives) - Typed request params strip unknown siblings instead of rejecting them - Typed results pass unknown top-level siblings through to the caller (tools/call and completion/complete) - A result failing the consumer-supplied schema rejects with the raw issues-bearing validator error, not a wrapped McpError, for both request() and connect()
Every e2e cell is labeled with a spec version, but until now nothing tied the label to the wire: the client always requests LATEST_PROTOCOL_VERSION, so a version bump silently re-points every cell at the new negotiation with no red anywhere in test/e2e. - wire() now accepts the cell's TestArgs in place of a bare transport name and, after connect, asserts the handshake actually ran at the cell's labeled version - both the version the client requested in initialize and the version it accepted from the result (captured via setProtocolVersion, the one transport-agnostic seam that carries it). All scenario bodies hand their cell args through; tests that deliberately negotiate a different version (downgrade, fallback, reject) keep the bare-transport form and say so. - lifecycle:version:match now runs against the SDK's own initialize handler and asserts the response version equals the requested version off the wire - a server that downgraded no longer passes. - New guards: '2025-11-25' must stay in SUPPORTED_PROTOCOL_VERSIONS independent of LATEST_PROTOCOL_VERSION (the supported list names the latest version only by reference, so a naive bump silently drops the previous version), and LATEST_PROTOCOL_VERSION must appear on the e2e version axis so a bump is red in this suite until the matrix is consciously extended.
Defaults were only ever exercised through explicit options, so changing
any of them passed the whole suite. Lock each one at its observable
seam:
- Protocol: a request without a timeout option must still be pending one
tick before 60s and reject with RequestTimeout (data.timeout 60000)
exactly at it; DEFAULT_REQUEST_TIMEOUT_MSEC pinned by value.
- Protocol: fallbackRequestHandler now has dispatch coverage - an
unhandled method routes to it instead of method-not-found, a
registered handler still wins, and without a fallback the
method-not-found error is returned (first tests of this seam).
- McpServer: one instance connected to a second transport after the
first closes keeps its tool table and serves the new connection (the
per-connection hosting pattern); a second connect while the first is
open rejects without corrupting the live connection.
- StreamableHTTP client: the default reconnection options are asserted
by value (initialDelay 1000, cap 30000, growFactor 1.5, maxRetries 2),
including the backoff schedule they produce and the terminal
'Maximum reconnection attempts (2) exceeded.' at the default budget.
- Hosting: a post-initialize request without an mcp-protocol-version
header is treated as DEFAULT_NEGOTIATED_PROTOCOL_VERSION ('2025-03-26',
pinned by value): no priming event on an eventStore host, identical to
sending the default explicitly, with a primed positive control.
…r envelope shape - timeout: after a progress-triggered reset, the re-armed timer must actually fire when the interval elapses with no further progress (reset-on-progress only proved the original deadline stopped applying) - progress: two concurrent requests on one client each receive exactly their own updates — broadcast dispatch to all handlers now fails - request ids: pin the dense zero-based sequence (initialize takes id 0, each later request id is the previous plus one), not just uniqueness - cancellation: notifications/cancelled for requestId 0 should abort the id-0 handler; the falsy guard in _oncancel drops it today, recorded as a knownFailure so a silent fix or regression flips the manifest - cancellation: a completed request's real id is also silently ignored (previously only never-issued ids were exercised) - errors: an error envelope for a handler error without data omits the data member entirely; the client-side rejection has data === undefined
…and known-failure arms
…acement, and reconnect timing in the Streamable HTTP client
…on, session routing, and stateless guards - content-type: every non-JSON type 415s, including text/json and a missing header; a charset-parameterized application/json is accepted - disconnect-not-cancel: new resumption arm — with an eventStore, the result of a dropped POST is delivered on the GET + Last-Event-ID replay stream after the handler completes - dns-rebinding: a request with NO Host header is rejected while allowedHosts is configured; DELETE honors the Origin check too - method-405: HEAD and OPTIONS join PUT/PATCH in the unsupported loop - notifications-202: result-form and error-form JSON-RPC responses also return 202 with an empty body - onerror: four more rejection branches (invalid envelope, unsupported protocol version, re-initialize, second GET) each fire exactly one error callback, asserted by count without pinning message text - parse-error-400: single-fault invalid-envelope payloads (no method, object id, wrong jsonrpc version) each 400 with code -32700 - protocol-version-400: the GET and DELETE validation paths 400 like the POST path - sse-close-after-response: probe emits progress frames first, so the stream must close because the response was written, response last - standalone-sse-no-response: an orphaned response (POST connection dropped mid-call) never appears on the standalone GET stream - send-no-listener: the unrouted notification is dropped, not buffered for a later GET stream - session reuse: per-session counter plus a decoy session prove routing is by Mcp-Session-Id header, with state preserved - session isolation: close-oldest direction — deleting the first session leaves a newer one serving - reinitialize: rejected with and without the session header (driving the transport directly), -32600 body, original session stays live - stateless no-reuse: the second-use guard also rejects a POSTed notification and a GET, not just request-bearing POSTs - stateless no-session-id: a bogus Mcp-Session-Id on a non-initialize request is served normally — no session validation in stateless mode - stateless restrictions: split per server-initiated request kind (sampling/elicitation/roots); sampling and roots hang today and stay knownFailures, elicitation already fails fast via its capability gate and is locked as a positive cell
…ce timing, env handling) and SSE fetch precedence - value-import and exercise deserializeMessage so the export cannot be silently dropped from shared/stdio - subclass StdioClientTransport with an overridden start() and run a real round-trip, locking subclassability - pin both close() escalation grace periods (~2s stdin-EOF grace before SIGTERM, ~2s more before SIGKILL) against children that ignore stdin EOF and SIGTERM - pin getDefaultEnvironment membership against a frozen copy of the safelist (instead of comparing it to its own output) plus the shell-function value guard - e2e: spawn with no stderr option and assert the inherit default (transport.stderr null, JSON-RPC channel undisturbed) - e2e: a consumer env entry colliding with a safelist key wins the merge; uncollided defaults still reach the child (new env-value fixture tool) - SSE: when both eventSourceInit.fetch and opts.fetch are set, the stream-open GET uses eventSourceInit.fetch and POSTs use opts.fetch
…, scope-selection tiers, PKCE refusal arms, standalone-GET 401 handling - mock AS now serves endpoints at the paths its metadata advertises, so DCR and the low-level helpers are proven to use discovered endpoints (non-default registration/authorize/token paths 404 a hardcoding client) - scope selection runs against a live PRM document in every arm: challenge scope beats a real scopes_supported, multi-element fallback joins all scopes, omit only when no source supplies one; new cell pins the clientMetadata.scope fallback on both the authorize URL and the DCR body - refresh /token body on the 403 step-up pins the exact param set (no scope param — mechanism of the known refresh-scope gap, paired with the existing knownFailure arm) - PKCE refusal looped over plain/empty/lowercase lists; absent-field arm added as a knownFailure (spec requires refusal, SDK proceeds) - standalone SSE GET 401: redirect outcome surfaces via onerror without retry (green); missing circuit breaker on the refresh path documented as a knownFailure (every successful refresh re-opens the GET) - invalid_grant on the authorization-code exchange invalidates only tokens; server_error during refresh falls back to interactive auth without invalidating credentials; transparent refresh skips re-registration
…skip-on-error and override arms
- new positive cell pins McpServer single-page list completeness: 25 items per
kind come back in one page with no nextCursor, and an unknown cursor leaves
the result unchanged (the pagination knownFailures only ratchet the other
direction — they would stay green if list results were truncated)
- tools/call text result and resources/read blob result now assert the literal
`content`/`blob` keys on the raw wire; the parsed view cannot distinguish a
missing key from the client schema default
- structured-content: only-structuredContent handler round-trips (client sees
content []); requirement wording widened to match. Text-mirror requirement
reworded as the handler-authored pattern it is — McpServer does not
synthesize a mirror
- server-side output-schema skip-on-error now covers an isError result carrying
schema-violating structuredContent (cold call, before the client validator
cache is primed) — passes through unvalidated instead of becoming an
output-validation error
- prompts/get with no arguments asserts the `arguments` key is absent from the
wire request, not just defaulted away
- resource template vars: exploded {var*} values arrive as an array under the
bare key; percent-encoded segments route but are not decoded (pinned as-is)
- metadata-override list entry overriding only description proves field-by-field
merge (inherited mimeType survives)
- subscribe capability sub-flag positively asserted through the
InitializeResult merge in both the capability and subscribe-updated bodies
- unit: low-level setRequestHandler before registerTool/Resource/Prompt throws
(handler-conflict guard); transport-level errors and failed-to-send-response
reach mcpServer.server.onerror — requirement notes narrowed/annotated to
match what each tier locks
- resources:annotations requirement no longer claims annotations on read
contents entries (not a spec or SDK concept)
- elicitation handler registration is last-write-wins on a Client (replace before connect and mid-session); the replaced handler never runs - raw elicitation/create mode-bypass arm as an expected failure: the protocol-layer capability gate is mode-blind, so an undeclared-mode request still reaches the wire; exact server-gate message pinned in the helper arm - url-mode accept content pass-through pinned (no SDK-side enforcement) and the requirement wording re-scoped to what the SDK does - complete-unknown-ignored observes via fallbackNotificationHandler so a future built-in handler is not displaced by the test's own registration - url-required-error round-trips two pending elicitations and adds a resource-handler sibling (no tools/call catch wrapper on that path) - url-required-then-retry pins the raw wire frame: literal -32042 with error.data.elicitations array - url-at-session-init tees HTTP response bodies and proves the unsolicited elicitation arrived on the standalone GET stream and on no POST response - two concurrent OAuth providers against one server stay isolated end to end (client ids, state, verifiers, token exchanges, bearer on every request) - transport.close() during a pending connect() settles connect with ConnectionClosed and detaches the transport
…undaries, audience-isolation pattern
- lifecycle: server→client ping cell (zero prior call sites of Server.ping in
any test); asserts the client's built-in auto-pong answers with a literal {}
on the wire
- list-changed capability gating gets the role-flipped sibling: prompts
advertised, TOOLS as the negative gate — the per-kind client gate clauses are
separate code and tools was positive-only
- output-schema: isError results carrying schema-violating structuredContent
still reject client-side (pinned as the exact boundary of the isError skip;
requirement wording re-scoped to the absent-payload skip the client really
has), and additionalProperties:false violations reject (third sub-case, locks
out validators that strip/ignore extras)
- completion: ref/prompt completion with context.arguments asserted disjoint
per context (prompt routing is separate from the resource arm); new cell pins
the client-side 100-value CompleteResultSchema cap (101 values reject with
the raw too_big validator error; 100 resolve)
- logging: structured object data round-trips notifications/message verbatim
(everything else in the suite is string data, so a String(data) coercion was
invisible); no-logging-capability behavior pinned (sendLoggingMessage is a
silent no-op, direct notification() throws, nothing crosses the wire); the
current -32603 wrapping of invalid setLevel levels pinned positively with
wire receipt proof — the spec -32602 cell is an expected failure and
reason-blind to both halves
- sampling pre-wire validation breadth: image, audio, and leading-position
mix-ins with tool_result all reject before the wire, not just trailing text
- protocol internal-error: raw-fetch JSON-RPC session on the HTTP hosts proves
-32603 is server-authored on the wire, independent of the SDK client
- resources: unknown-uri current code (-32602) pinned positively alongside the
spec -32002 expected failure (two-direction ratchet, code only, no message
pin); multi-entry ReadResourceResult contents (text+blob, order, verbatim)
- hosting auth: aud-validation expected-failure body reshaped with a built-in
positive control and exact challenge asserts so the flip is unambiguous;
new verifier-enforced two-resource-server audience-isolation cell locks the
documented user-level pattern that works today; getOAuthProtectedResource-
MetadataUrl gets direct unit locks for root and path-suffixed URLs
- tool name edges: dotted/namespaced and 128-char names register, list, and
call verbatim on both server tiers
Consumers match on the leading "Maximum reconnection attempts" text of the terminal reconnection error, not on the full sentence, so the message head is the compatibility surface worth locking. Asserting the exact string also duplicated the retry default, which is already asserted separately by value in the default-options test. Matching the stable head keeps the consumer contract pinned while allowing the parenthesized count and trailing wording to change with the default.
test/packaging/exports.test.ts packs the real artifact (npm pack), installs it into a throwaway project without touching the network, and resolves every public entry point through the package export map from both ESM (import) and CJS (require), asserting each resolves and exposes its expected primary symbols. Covers the named export groups and the deep subpath imports that currently resolve only via the './*' wildcard; no in-repo test exercised the export map or the dist trees before. Runs as `npm run test:packaging`; excluded from plain `npm test` because it rebuilds and packs the package first.
|
commit: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Strengthens the e2e suite's lock on observable SDK behavior — error codes and messages, version negotiation, schema strictness boundaries, runtime defaults, and transport/OAuth/server/elicitation flows — so that internal changes on this integration branch that alter the 2025-11-25 behavior surface fail CI instead of landing silently.
Motivation and Context
The upcoming spec-revision work on this branch will churn SDK internals while existing behavior must stay intact. Reviewing the suite against the public behavior surface showed real gaps: several behavior-altering changes (error-code remaps, schema strictness flips, version-negotiation changes, default changes) passed the full test suite. This PR closes those gaps:
fallbackRequestHandlerand repeatedconnect()on one server instanceHow Has This Been Tested?
Full suite at the branch tip: e2e 1268/1268 (~22s), remaining tests 1663/1663 (including the new packaging checks, 39/39), typecheck and lint clean. The suite is deterministic across repeated full runs. Deliberately re-introducing representative behavior changes (error-code remaps, strictness flips, version bumps, default changes) turns the suite red; breaking an export-map entry fails the packaging checks.
Breaking Changes
None — tests, contributor notes, and two package.json script entries only. No
src/changes.Types of changes
Checklist
Additional context
18 test-only commits, each scoped to one concern (reviewable commit-by-commit). Sets the behavioral baseline for this integration branch: subsequent work here is expected to keep this suite green for the 2025-11-25 protocol surface.