test: strengthen e2e behavioral coverage of the 2025-11-25 surface by felixweinberger · Pull Request #2262 · modelcontextprotocol/typescript-sdk

felixweinberger · 2026-06-09T06:53:54Z

Strengthens the e2e suite's lock on observable SDK behavior — error codes and messages, version negotiation, schema strictness boundaries, runtime defaults, and transport/OAuth/server/elicitation flows — so that internal changes on this integration branch that alter the 2025-11-25 behavior surface fail CI instead of landing silently.

Motivation and Context

The upcoming spec-revision work on this branch will churn SDK internals while existing behavior must stay intact. Reviewing the suite against the public behavior surface showed real gaps: several behavior-altering changes (error-code remaps, schema strictness flips, version-negotiation changes, default changes) passed the full test suite. This PR closes those gaps:

Pins error-code/message literals that consumers observably depend on
Threads the spec version through the e2e harness and asserts the negotiated version per cell, so cells labeled 2025-11-25 keep testing 2025-11-25 negotiation when newer versions are added
Locks schema strict/loose boundaries (result passthrough, strict empty-result acks, params stripping, envelope strictness) and raw result-parse error identity
Pins runtime defaults (request timeout, reconnection, stderr) and adds first-ever coverage for fallbackRequestHandler and repeated connect() on one server instance
Broadens transport, OAuth, server, and elicitation scenario coverage; repairs tests that had been narrowed around known bugs and records those as known failures
Adds package export-map resolution checks (ESM + CJS, deep subpaths) and updates the e2e contributor notes to match the current runner

How Has This Been Tested?

Full suite at the branch tip: e2e 1268/1268 (~22s), remaining tests 1663/1663 (including the new packaging checks, 39/39), typecheck and lint clean. The suite is deterministic across repeated full runs. Deliberately re-introducing representative behavior changes (error-code remaps, strictness flips, version bumps, default changes) turns the suite red; breaking an export-map entry fails the packaging checks.

Breaking Changes

None — tests, contributor notes, and two package.json script entries only. No src/ changes.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update
Test coverage improvement (non-breaking)

Checklist

I have read the MCP Documentation
My code follows the repository's style guidelines
New and existing tests pass locally
I have added appropriate error handling
I have added or updated documentation as needed

Additional context

18 test-only commits, each scoped to one concern (reviewable commit-by-commit). Sets the behavioral baseline for this integration branch: subsequent work here is expected to keep this suite green for the 2025-11-25 protocol surface.

Downstream code matches SDK errors by literal code values, error.name, and message substrings rather than enum members or instanceof (which breaks across package boundaries). None of these were pinned, so a renumber or reword would pass the entire suite silently. Lock them in: - ErrorCode: exact-equality table of all 8 members' numeric values (also locks membership in both directions) - McpError: error.name === 'McpError' set explicitly in the constructor - close rejection: literal code -32000 plus a message containing 'Connection closed', not just the enum member - streamable HTTP GET failure: StreamableHTTPError with the HTTP status as .code and the 'Failed to open SSE stream' message marker - per-session hosting: the 'No valid session ID' rejection surfaces through the client error with .code 400 and the body text embedded - OAuth discovery failures: 'trying to load' + 'metadata' substrings - startAuthorization/registerClient: anchored 'Incompatible auth server' message head - parseErrorResponse fallback: anchored 'HTTP <status>' message head

The mutual-assignability checks tolerate renames or removals of optional members on passthrough result types: the old key is absorbed by the catchall index signature and the renamed key is optional, so assignment still compiles in both directions. Renaming CallToolResult's isError to isFailure passed typecheck and every existing test. Add per-type key lists (checked against keyof with index signatures stripped) for the result members consumers read by name (isError, structuredContent, _meta, nextCursor, instructions, capability flags), plus a runtime check that CallToolResultSchema.parse preserves isError.

Pin the accept/strip/reject line each Zod schema draws at the wire: - EmptyResult acks are strict: an extra non-_meta key on a ping ack rejects client-side with the raw validator error - JSON-RPC envelopes are strict: an unknown top-level sibling field is rejected at the deserialization boundary (HTTP 400/-32700; stdio onerror + drop + connection survives) - Typed request params strip unknown siblings instead of rejecting them - Typed results pass unknown top-level siblings through to the caller (tools/call and completion/complete) - A result failing the consumer-supplied schema rejects with the raw issues-bearing validator error, not a wrapped McpError, for both request() and connect()

Every e2e cell is labeled with a spec version, but until now nothing tied the label to the wire: the client always requests LATEST_PROTOCOL_VERSION, so a version bump silently re-points every cell at the new negotiation with no red anywhere in test/e2e. - wire() now accepts the cell's TestArgs in place of a bare transport name and, after connect, asserts the handshake actually ran at the cell's labeled version - both the version the client requested in initialize and the version it accepted from the result (captured via setProtocolVersion, the one transport-agnostic seam that carries it). All scenario bodies hand their cell args through; tests that deliberately negotiate a different version (downgrade, fallback, reject) keep the bare-transport form and say so. - lifecycle:version:match now runs against the SDK's own initialize handler and asserts the response version equals the requested version off the wire - a server that downgraded no longer passes. - New guards: '2025-11-25' must stay in SUPPORTED_PROTOCOL_VERSIONS independent of LATEST_PROTOCOL_VERSION (the supported list names the latest version only by reference, so a naive bump silently drops the previous version), and LATEST_PROTOCOL_VERSION must appear on the e2e version axis so a bump is red in this suite until the matrix is consciously extended.

Defaults were only ever exercised through explicit options, so changing any of them passed the whole suite. Lock each one at its observable seam: - Protocol: a request without a timeout option must still be pending one tick before 60s and reject with RequestTimeout (data.timeout 60000) exactly at it; DEFAULT_REQUEST_TIMEOUT_MSEC pinned by value. - Protocol: fallbackRequestHandler now has dispatch coverage - an unhandled method routes to it instead of method-not-found, a registered handler still wins, and without a fallback the method-not-found error is returned (first tests of this seam). - McpServer: one instance connected to a second transport after the first closes keeps its tool table and serves the new connection (the per-connection hosting pattern); a second connect while the first is open rejects without corrupting the live connection. - StreamableHTTP client: the default reconnection options are asserted by value (initialDelay 1000, cap 30000, growFactor 1.5, maxRetries 2), including the backoff schedule they produce and the terminal 'Maximum reconnection attempts (2) exceeded.' at the default budget. - Hosting: a post-initialize request without an mcp-protocol-version header is treated as DEFAULT_NEGOTIATED_PROTOCOL_VERSION ('2025-03-26', pinned by value): no priming event on an eventStore host, identical to sending the default explicitly, with a primed positive control.

…r envelope shape - timeout: after a progress-triggered reset, the re-armed timer must actually fire when the interval elapses with no further progress (reset-on-progress only proved the original deadline stopped applying) - progress: two concurrent requests on one client each receive exactly their own updates — broadcast dispatch to all handlers now fails - request ids: pin the dense zero-based sequence (initialize takes id 0, each later request id is the previous plus one), not just uniqueness - cancellation: notifications/cancelled for requestId 0 should abort the id-0 handler; the falsy guard in _oncancel drops it today, recorded as a knownFailure so a silent fix or regression flips the manifest - cancellation: a completed request's real id is also silently ignored (previously only never-issued ids were exercised) - errors: an error envelope for a handler error without data omits the data member entirely; the client-side rejection has data === undefined

…and known-failure arms

…acement, and reconnect timing in the Streamable HTTP client

…on, session routing, and stateless guards - content-type: every non-JSON type 415s, including text/json and a missing header; a charset-parameterized application/json is accepted - disconnect-not-cancel: new resumption arm — with an eventStore, the result of a dropped POST is delivered on the GET + Last-Event-ID replay stream after the handler completes - dns-rebinding: a request with NO Host header is rejected while allowedHosts is configured; DELETE honors the Origin check too - method-405: HEAD and OPTIONS join PUT/PATCH in the unsupported loop - notifications-202: result-form and error-form JSON-RPC responses also return 202 with an empty body - onerror: four more rejection branches (invalid envelope, unsupported protocol version, re-initialize, second GET) each fire exactly one error callback, asserted by count without pinning message text - parse-error-400: single-fault invalid-envelope payloads (no method, object id, wrong jsonrpc version) each 400 with code -32700 - protocol-version-400: the GET and DELETE validation paths 400 like the POST path - sse-close-after-response: probe emits progress frames first, so the stream must close because the response was written, response last - standalone-sse-no-response: an orphaned response (POST connection dropped mid-call) never appears on the standalone GET stream - send-no-listener: the unrouted notification is dropped, not buffered for a later GET stream - session reuse: per-session counter plus a decoy session prove routing is by Mcp-Session-Id header, with state preserved - session isolation: close-oldest direction — deleting the first session leaves a newer one serving - reinitialize: rejected with and without the session header (driving the transport directly), -32600 body, original session stays live - stateless no-reuse: the second-use guard also rejects a POSTed notification and a GET, not just request-bearing POSTs - stateless no-session-id: a bogus Mcp-Session-Id on a non-initialize request is served normally — no session validation in stateless mode - stateless restrictions: split per server-initiated request kind (sampling/elicitation/roots); sampling and roots hang today and stay knownFailures, elicitation already fails fast via its capability gate and is locked as a positive cell

…ce timing, env handling) and SSE fetch precedence - value-import and exercise deserializeMessage so the export cannot be silently dropped from shared/stdio - subclass StdioClientTransport with an overridden start() and run a real round-trip, locking subclassability - pin both close() escalation grace periods (~2s stdin-EOF grace before SIGTERM, ~2s more before SIGKILL) against children that ignore stdin EOF and SIGTERM - pin getDefaultEnvironment membership against a frozen copy of the safelist (instead of comparing it to its own output) plus the shell-function value guard - e2e: spawn with no stderr option and assert the inherit default (transport.stderr null, JSON-RPC channel undisturbed) - e2e: a consumer env entry colliding with a safelist key wins the merge; uncollided defaults still reach the child (new env-value fixture tool) - SSE: when both eventSourceInit.fetch and opts.fetch are set, the stream-open GET uses eventSourceInit.fetch and POSTs use opts.fetch

…, scope-selection tiers, PKCE refusal arms, standalone-GET 401 handling - mock AS now serves endpoints at the paths its metadata advertises, so DCR and the low-level helpers are proven to use discovered endpoints (non-default registration/authorize/token paths 404 a hardcoding client) - scope selection runs against a live PRM document in every arm: challenge scope beats a real scopes_supported, multi-element fallback joins all scopes, omit only when no source supplies one; new cell pins the clientMetadata.scope fallback on both the authorize URL and the DCR body - refresh /token body on the 403 step-up pins the exact param set (no scope param — mechanism of the known refresh-scope gap, paired with the existing knownFailure arm) - PKCE refusal looped over plain/empty/lowercase lists; absent-field arm added as a knownFailure (spec requires refusal, SDK proceeds) - standalone SSE GET 401: redirect outcome surfaces via onerror without retry (green); missing circuit breaker on the refresh path documented as a knownFailure (every successful refresh re-opens the GET) - invalid_grant on the authorization-code exchange invalidates only tokens; server_error during refresh falls back to interactive auth without invalidating credentials; transparent refresh skips re-registration

…skip-on-error and override arms - new positive cell pins McpServer single-page list completeness: 25 items per kind come back in one page with no nextCursor, and an unknown cursor leaves the result unchanged (the pagination knownFailures only ratchet the other direction — they would stay green if list results were truncated) - tools/call text result and resources/read blob result now assert the literal `content`/`blob` keys on the raw wire; the parsed view cannot distinguish a missing key from the client schema default - structured-content: only-structuredContent handler round-trips (client sees content []); requirement wording widened to match. Text-mirror requirement reworded as the handler-authored pattern it is — McpServer does not synthesize a mirror - server-side output-schema skip-on-error now covers an isError result carrying schema-violating structuredContent (cold call, before the client validator cache is primed) — passes through unvalidated instead of becoming an output-validation error - prompts/get with no arguments asserts the `arguments` key is absent from the wire request, not just defaulted away - resource template vars: exploded {var*} values arrive as an array under the bare key; percent-encoded segments route but are not decoded (pinned as-is) - metadata-override list entry overriding only description proves field-by-field merge (inherited mimeType survives) - subscribe capability sub-flag positively asserted through the InitializeResult merge in both the capability and subscribe-updated bodies - unit: low-level setRequestHandler before registerTool/Resource/Prompt throws (handler-conflict guard); transport-level errors and failed-to-send-response reach mcpServer.server.onerror — requirement notes narrowed/annotated to match what each tier locks - resources:annotations requirement no longer claims annotations on read contents entries (not a spec or SDK concept)

- elicitation handler registration is last-write-wins on a Client (replace before connect and mid-session); the replaced handler never runs - raw elicitation/create mode-bypass arm as an expected failure: the protocol-layer capability gate is mode-blind, so an undeclared-mode request still reaches the wire; exact server-gate message pinned in the helper arm - url-mode accept content pass-through pinned (no SDK-side enforcement) and the requirement wording re-scoped to what the SDK does - complete-unknown-ignored observes via fallbackNotificationHandler so a future built-in handler is not displaced by the test's own registration - url-required-error round-trips two pending elicitations and adds a resource-handler sibling (no tools/call catch wrapper on that path) - url-required-then-retry pins the raw wire frame: literal -32042 with error.data.elicitations array - url-at-session-init tees HTTP response bodies and proves the unsolicited elicitation arrived on the standalone GET stream and on no POST response - two concurrent OAuth providers against one server stay isolated end to end (client ids, state, verifiers, token exchanges, bearer on every request) - transport.close() during a pending connect() settles connect with ConnectionClosed and detaches the transport

…undaries, audience-isolation pattern - lifecycle: server→client ping cell (zero prior call sites of Server.ping in any test); asserts the client's built-in auto-pong answers with a literal {} on the wire - list-changed capability gating gets the role-flipped sibling: prompts advertised, TOOLS as the negative gate — the per-kind client gate clauses are separate code and tools was positive-only - output-schema: isError results carrying schema-violating structuredContent still reject client-side (pinned as the exact boundary of the isError skip; requirement wording re-scoped to the absent-payload skip the client really has), and additionalProperties:false violations reject (third sub-case, locks out validators that strip/ignore extras) - completion: ref/prompt completion with context.arguments asserted disjoint per context (prompt routing is separate from the resource arm); new cell pins the client-side 100-value CompleteResultSchema cap (101 values reject with the raw too_big validator error; 100 resolve) - logging: structured object data round-trips notifications/message verbatim (everything else in the suite is string data, so a String(data) coercion was invisible); no-logging-capability behavior pinned (sendLoggingMessage is a silent no-op, direct notification() throws, nothing crosses the wire); the current -32603 wrapping of invalid setLevel levels pinned positively with wire receipt proof — the spec -32602 cell is an expected failure and reason-blind to both halves - sampling pre-wire validation breadth: image, audio, and leading-position mix-ins with tool_result all reject before the wire, not just trailing text - protocol internal-error: raw-fetch JSON-RPC session on the HTTP hosts proves -32603 is server-authored on the wire, independent of the SDK client - resources: unknown-uri current code (-32602) pinned positively alongside the spec -32002 expected failure (two-direction ratchet, code only, no message pin); multi-entry ReadResourceResult contents (text+blob, order, verbatim) - hosting auth: aud-validation expected-failure body reshaped with a built-in positive control and exact challenge asserts so the flip is unambiguous; new verifier-enforced two-resource-server audience-isolation cell locks the documented user-level pattern that works today; getOAuthProtectedResource- MetadataUrl gets direct unit locks for root and path-suffixed URLs - tool name edges: dotted/namespaced and 128-char names register, list, and call verbatim on both server tiers

Consumers match on the leading "Maximum reconnection attempts" text of the terminal reconnection error, not on the full sentence, so the message head is the compatibility surface worth locking. Asserting the exact string also duplicated the retry default, which is already asserted separately by value in the default-options test. Matching the stable head keeps the consumer contract pinned while allowing the parenthesized count and trailing wording to change with the default.

test/packaging/exports.test.ts packs the real artifact (npm pack), installs it into a throwaway project without touching the network, and resolves every public entry point through the package export map from both ESM (import) and CJS (require), asserting each resolves and exposes its expected primary symbols. Covers the named export groups and the deep subpath imports that currently resolve only via the './*' wildcard; no in-repo test exercised the export map or the dist trees before. Runs as `npm run test:packaging`; excluded from plain `npm test` because it rebuilds and packs the package first.

changeset-bot · 2026-06-09T06:54:01Z

⚠️ No Changeset found

Latest commit: 6b9f457

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

pkg-pr-new · 2026-06-09T06:54:32Z

Open in StackBlitz

npm i https://pkg.pr.new/@modelcontextprotocol/sdk@2262

commit: 6b9f457

felixweinberger added 18 commits June 5, 2026 02:12

test: repair vacuous cells, lock server-side sampling validation, exp…

59cd10d

…and known-failure arms

test: lock session-id rotation, header precedence, scope step-up repl…

3a159c5

…acement, and reconnect timing in the Streamable HTTP client

docs: update e2e harness notes to match the current runner

d4b15c0

docs: format e2e contributor notes with prettier

1f205a6

felixweinberger requested a review from a team as a code owner June 9, 2026 06:53

Merge branch 'v1.x-2026-07-28' into fweinberger/p5-hardening

6b9f457

felixweinberger closed this Jun 9, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: strengthen e2e behavioral coverage of the 2025-11-25 surface#2262

test: strengthen e2e behavioral coverage of the 2025-11-25 surface#2262
felixweinberger wants to merge 19 commits into
v1.x-2026-07-28from
fweinberger/p5-hardening

felixweinberger commented Jun 9, 2026

Uh oh!

changeset-bot Bot commented Jun 9, 2026 •

edited

Loading

Uh oh!

pkg-pr-new Bot commented Jun 9, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

felixweinberger commented Jun 9, 2026

Motivation and Context

How Has This Been Tested?

Breaking Changes

Types of changes

Checklist

Additional context

Uh oh!

changeset-bot Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ No Changeset found

Uh oh!

pkg-pr-new Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

changeset-bot Bot commented Jun 9, 2026 •

edited

Loading

pkg-pr-new Bot commented Jun 9, 2026 •

edited

Loading