Skip to content

test: strengthen e2e behavioral coverage of the 2025-11-25 surface#2262

Closed
felixweinberger wants to merge 19 commits into
v1.x-2026-07-28from
fweinberger/p5-hardening
Closed

test: strengthen e2e behavioral coverage of the 2025-11-25 surface#2262
felixweinberger wants to merge 19 commits into
v1.x-2026-07-28from
fweinberger/p5-hardening

Conversation

@felixweinberger

Copy link
Copy Markdown
Contributor

Strengthens the e2e suite's lock on observable SDK behavior — error codes and messages, version negotiation, schema strictness boundaries, runtime defaults, and transport/OAuth/server/elicitation flows — so that internal changes on this integration branch that alter the 2025-11-25 behavior surface fail CI instead of landing silently.

Motivation and Context

The upcoming spec-revision work on this branch will churn SDK internals while existing behavior must stay intact. Reviewing the suite against the public behavior surface showed real gaps: several behavior-altering changes (error-code remaps, schema strictness flips, version-negotiation changes, default changes) passed the full test suite. This PR closes those gaps:

  • Pins error-code/message literals that consumers observably depend on
  • Threads the spec version through the e2e harness and asserts the negotiated version per cell, so cells labeled 2025-11-25 keep testing 2025-11-25 negotiation when newer versions are added
  • Locks schema strict/loose boundaries (result passthrough, strict empty-result acks, params stripping, envelope strictness) and raw result-parse error identity
  • Pins runtime defaults (request timeout, reconnection, stderr) and adds first-ever coverage for fallbackRequestHandler and repeated connect() on one server instance
  • Broadens transport, OAuth, server, and elicitation scenario coverage; repairs tests that had been narrowed around known bugs and records those as known failures
  • Adds package export-map resolution checks (ESM + CJS, deep subpaths) and updates the e2e contributor notes to match the current runner

How Has This Been Tested?

Full suite at the branch tip: e2e 1268/1268 (~22s), remaining tests 1663/1663 (including the new packaging checks, 39/39), typecheck and lint clean. The suite is deterministic across repeated full runs. Deliberately re-introducing representative behavior changes (error-code remaps, strictness flips, version bumps, default changes) turns the suite red; breaking an export-map entry fails the packaging checks.

Breaking Changes

None — tests, contributor notes, and two package.json script entries only. No src/ changes.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Test coverage improvement (non-breaking)

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

18 test-only commits, each scoped to one concern (reviewable commit-by-commit). Sets the behavioral baseline for this integration branch: subsequent work here is expected to keep this suite green for the 2025-11-25 protocol surface.

Downstream code matches SDK errors by literal code values, error.name,
and message substrings rather than enum members or instanceof (which
breaks across package boundaries). None of these were pinned, so a
renumber or reword would pass the entire suite silently. Lock them in:

- ErrorCode: exact-equality table of all 8 members' numeric values
  (also locks membership in both directions)
- McpError: error.name === 'McpError' set explicitly in the constructor
- close rejection: literal code -32000 plus a message containing
  'Connection closed', not just the enum member
- streamable HTTP GET failure: StreamableHTTPError with the HTTP status
  as .code and the 'Failed to open SSE stream' message marker
- per-session hosting: the 'No valid session ID' rejection surfaces
  through the client error with .code 400 and the body text embedded
- OAuth discovery failures: 'trying to load' + 'metadata' substrings
- startAuthorization/registerClient: anchored 'Incompatible auth
  server' message head
- parseErrorResponse fallback: anchored 'HTTP <status>' message head
The mutual-assignability checks tolerate renames or removals of optional
members on passthrough result types: the old key is absorbed by the
catchall index signature and the renamed key is optional, so assignment
still compiles in both directions. Renaming CallToolResult's isError to
isFailure passed typecheck and every existing test.

Add per-type key lists (checked against keyof with index signatures
stripped) for the result members consumers read by name (isError,
structuredContent, _meta, nextCursor, instructions, capability flags),
plus a runtime check that CallToolResultSchema.parse preserves isError.
Pin the accept/strip/reject line each Zod schema draws at the wire:

- EmptyResult acks are strict: an extra non-_meta key on a ping ack
  rejects client-side with the raw validator error
- JSON-RPC envelopes are strict: an unknown top-level sibling field is
  rejected at the deserialization boundary (HTTP 400/-32700; stdio
  onerror + drop + connection survives)
- Typed request params strip unknown siblings instead of rejecting them
- Typed results pass unknown top-level siblings through to the caller
  (tools/call and completion/complete)
- A result failing the consumer-supplied schema rejects with the raw
  issues-bearing validator error, not a wrapped McpError, for both
  request() and connect()
Every e2e cell is labeled with a spec version, but until now nothing tied
the label to the wire: the client always requests LATEST_PROTOCOL_VERSION,
so a version bump silently re-points every cell at the new negotiation
with no red anywhere in test/e2e.

- wire() now accepts the cell's TestArgs in place of a bare transport
  name and, after connect, asserts the handshake actually ran at the
  cell's labeled version - both the version the client requested in
  initialize and the version it accepted from the result (captured via
  setProtocolVersion, the one transport-agnostic seam that carries it).
  All scenario bodies hand their cell args through; tests that
  deliberately negotiate a different version (downgrade, fallback,
  reject) keep the bare-transport form and say so.
- lifecycle:version:match now runs against the SDK's own initialize
  handler and asserts the response version equals the requested version
  off the wire - a server that downgraded no longer passes.
- New guards: '2025-11-25' must stay in SUPPORTED_PROTOCOL_VERSIONS
  independent of LATEST_PROTOCOL_VERSION (the supported list names the
  latest version only by reference, so a naive bump silently drops the
  previous version), and LATEST_PROTOCOL_VERSION must appear on the e2e
  version axis so a bump is red in this suite until the matrix is
  consciously extended.
Defaults were only ever exercised through explicit options, so changing
any of them passed the whole suite. Lock each one at its observable
seam:

- Protocol: a request without a timeout option must still be pending one
  tick before 60s and reject with RequestTimeout (data.timeout 60000)
  exactly at it; DEFAULT_REQUEST_TIMEOUT_MSEC pinned by value.
- Protocol: fallbackRequestHandler now has dispatch coverage - an
  unhandled method routes to it instead of method-not-found, a
  registered handler still wins, and without a fallback the
  method-not-found error is returned (first tests of this seam).
- McpServer: one instance connected to a second transport after the
  first closes keeps its tool table and serves the new connection (the
  per-connection hosting pattern); a second connect while the first is
  open rejects without corrupting the live connection.
- StreamableHTTP client: the default reconnection options are asserted
  by value (initialDelay 1000, cap 30000, growFactor 1.5, maxRetries 2),
  including the backoff schedule they produce and the terminal
  'Maximum reconnection attempts (2) exceeded.' at the default budget.
- Hosting: a post-initialize request without an mcp-protocol-version
  header is treated as DEFAULT_NEGOTIATED_PROTOCOL_VERSION ('2025-03-26',
  pinned by value): no priming event on an eventStore host, identical to
  sending the default explicitly, with a primed positive control.
…r envelope shape

- timeout: after a progress-triggered reset, the re-armed timer must
  actually fire when the interval elapses with no further progress
  (reset-on-progress only proved the original deadline stopped applying)
- progress: two concurrent requests on one client each receive exactly
  their own updates — broadcast dispatch to all handlers now fails
- request ids: pin the dense zero-based sequence (initialize takes id 0,
  each later request id is the previous plus one), not just uniqueness
- cancellation: notifications/cancelled for requestId 0 should abort the
  id-0 handler; the falsy guard in _oncancel drops it today, recorded as
  a knownFailure so a silent fix or regression flips the manifest
- cancellation: a completed request's real id is also silently ignored
  (previously only never-issued ids were exercised)
- errors: an error envelope for a handler error without data omits the
  data member entirely; the client-side rejection has data === undefined
…acement, and reconnect timing in the Streamable HTTP client
…on, session routing, and stateless guards

- content-type: every non-JSON type 415s, including text/json and a
  missing header; a charset-parameterized application/json is accepted
- disconnect-not-cancel: new resumption arm — with an eventStore, the
  result of a dropped POST is delivered on the GET + Last-Event-ID
  replay stream after the handler completes
- dns-rebinding: a request with NO Host header is rejected while
  allowedHosts is configured; DELETE honors the Origin check too
- method-405: HEAD and OPTIONS join PUT/PATCH in the unsupported loop
- notifications-202: result-form and error-form JSON-RPC responses also
  return 202 with an empty body
- onerror: four more rejection branches (invalid envelope, unsupported
  protocol version, re-initialize, second GET) each fire exactly one
  error callback, asserted by count without pinning message text
- parse-error-400: single-fault invalid-envelope payloads (no method,
  object id, wrong jsonrpc version) each 400 with code -32700
- protocol-version-400: the GET and DELETE validation paths 400 like
  the POST path
- sse-close-after-response: probe emits progress frames first, so the
  stream must close because the response was written, response last
- standalone-sse-no-response: an orphaned response (POST connection
  dropped mid-call) never appears on the standalone GET stream
- send-no-listener: the unrouted notification is dropped, not buffered
  for a later GET stream
- session reuse: per-session counter plus a decoy session prove routing
  is by Mcp-Session-Id header, with state preserved
- session isolation: close-oldest direction — deleting the first
  session leaves a newer one serving
- reinitialize: rejected with and without the session header (driving
  the transport directly), -32600 body, original session stays live
- stateless no-reuse: the second-use guard also rejects a POSTed
  notification and a GET, not just request-bearing POSTs
- stateless no-session-id: a bogus Mcp-Session-Id on a non-initialize
  request is served normally — no session validation in stateless mode
- stateless restrictions: split per server-initiated request kind
  (sampling/elicitation/roots); sampling and roots hang today and stay
  knownFailures, elicitation already fails fast via its capability gate
  and is locked as a positive cell
…ce timing, env handling) and SSE fetch precedence

- value-import and exercise deserializeMessage so the export cannot be
  silently dropped from shared/stdio
- subclass StdioClientTransport with an overridden start() and run a real
  round-trip, locking subclassability
- pin both close() escalation grace periods (~2s stdin-EOF grace before
  SIGTERM, ~2s more before SIGKILL) against children that ignore stdin EOF
  and SIGTERM
- pin getDefaultEnvironment membership against a frozen copy of the
  safelist (instead of comparing it to its own output) plus the
  shell-function value guard
- e2e: spawn with no stderr option and assert the inherit default
  (transport.stderr null, JSON-RPC channel undisturbed)
- e2e: a consumer env entry colliding with a safelist key wins the merge;
  uncollided defaults still reach the child (new env-value fixture tool)
- SSE: when both eventSourceInit.fetch and opts.fetch are set, the
  stream-open GET uses eventSourceInit.fetch and POSTs use opts.fetch
…, scope-selection tiers, PKCE refusal arms, standalone-GET 401 handling

- mock AS now serves endpoints at the paths its metadata advertises, so DCR and
  the low-level helpers are proven to use discovered endpoints (non-default
  registration/authorize/token paths 404 a hardcoding client)
- scope selection runs against a live PRM document in every arm: challenge scope
  beats a real scopes_supported, multi-element fallback joins all scopes, omit
  only when no source supplies one; new cell pins the clientMetadata.scope
  fallback on both the authorize URL and the DCR body
- refresh /token body on the 403 step-up pins the exact param set (no scope
  param — mechanism of the known refresh-scope gap, paired with the existing
  knownFailure arm)
- PKCE refusal looped over plain/empty/lowercase lists; absent-field arm added
  as a knownFailure (spec requires refusal, SDK proceeds)
- standalone SSE GET 401: redirect outcome surfaces via onerror without retry
  (green); missing circuit breaker on the refresh path documented as a
  knownFailure (every successful refresh re-opens the GET)
- invalid_grant on the authorization-code exchange invalidates only tokens;
  server_error during refresh falls back to interactive auth without
  invalidating credentials; transparent refresh skips re-registration
…skip-on-error and override arms

- new positive cell pins McpServer single-page list completeness: 25 items per
  kind come back in one page with no nextCursor, and an unknown cursor leaves
  the result unchanged (the pagination knownFailures only ratchet the other
  direction — they would stay green if list results were truncated)
- tools/call text result and resources/read blob result now assert the literal
  `content`/`blob` keys on the raw wire; the parsed view cannot distinguish a
  missing key from the client schema default
- structured-content: only-structuredContent handler round-trips (client sees
  content []); requirement wording widened to match. Text-mirror requirement
  reworded as the handler-authored pattern it is — McpServer does not
  synthesize a mirror
- server-side output-schema skip-on-error now covers an isError result carrying
  schema-violating structuredContent (cold call, before the client validator
  cache is primed) — passes through unvalidated instead of becoming an
  output-validation error
- prompts/get with no arguments asserts the `arguments` key is absent from the
  wire request, not just defaulted away
- resource template vars: exploded {var*} values arrive as an array under the
  bare key; percent-encoded segments route but are not decoded (pinned as-is)
- metadata-override list entry overriding only description proves field-by-field
  merge (inherited mimeType survives)
- subscribe capability sub-flag positively asserted through the
  InitializeResult merge in both the capability and subscribe-updated bodies
- unit: low-level setRequestHandler before registerTool/Resource/Prompt throws
  (handler-conflict guard); transport-level errors and failed-to-send-response
  reach mcpServer.server.onerror — requirement notes narrowed/annotated to
  match what each tier locks
- resources:annotations requirement no longer claims annotations on read
  contents entries (not a spec or SDK concept)
- elicitation handler registration is last-write-wins on a Client (replace
  before connect and mid-session); the replaced handler never runs
- raw elicitation/create mode-bypass arm as an expected failure: the
  protocol-layer capability gate is mode-blind, so an undeclared-mode request
  still reaches the wire; exact server-gate message pinned in the helper arm
- url-mode accept content pass-through pinned (no SDK-side enforcement) and
  the requirement wording re-scoped to what the SDK does
- complete-unknown-ignored observes via fallbackNotificationHandler so a
  future built-in handler is not displaced by the test's own registration
- url-required-error round-trips two pending elicitations and adds a
  resource-handler sibling (no tools/call catch wrapper on that path)
- url-required-then-retry pins the raw wire frame: literal -32042 with
  error.data.elicitations array
- url-at-session-init tees HTTP response bodies and proves the unsolicited
  elicitation arrived on the standalone GET stream and on no POST response
- two concurrent OAuth providers against one server stay isolated end to end
  (client ids, state, verifiers, token exchanges, bearer on every request)
- transport.close() during a pending connect() settles connect with
  ConnectionClosed and detaches the transport
…undaries, audience-isolation pattern

- lifecycle: server→client ping cell (zero prior call sites of Server.ping in
  any test); asserts the client's built-in auto-pong answers with a literal {}
  on the wire
- list-changed capability gating gets the role-flipped sibling: prompts
  advertised, TOOLS as the negative gate — the per-kind client gate clauses are
  separate code and tools was positive-only
- output-schema: isError results carrying schema-violating structuredContent
  still reject client-side (pinned as the exact boundary of the isError skip;
  requirement wording re-scoped to the absent-payload skip the client really
  has), and additionalProperties:false violations reject (third sub-case, locks
  out validators that strip/ignore extras)
- completion: ref/prompt completion with context.arguments asserted disjoint
  per context (prompt routing is separate from the resource arm); new cell pins
  the client-side 100-value CompleteResultSchema cap (101 values reject with
  the raw too_big validator error; 100 resolve)
- logging: structured object data round-trips notifications/message verbatim
  (everything else in the suite is string data, so a String(data) coercion was
  invisible); no-logging-capability behavior pinned (sendLoggingMessage is a
  silent no-op, direct notification() throws, nothing crosses the wire); the
  current -32603 wrapping of invalid setLevel levels pinned positively with
  wire receipt proof — the spec -32602 cell is an expected failure and
  reason-blind to both halves
- sampling pre-wire validation breadth: image, audio, and leading-position
  mix-ins with tool_result all reject before the wire, not just trailing text
- protocol internal-error: raw-fetch JSON-RPC session on the HTTP hosts proves
  -32603 is server-authored on the wire, independent of the SDK client
- resources: unknown-uri current code (-32602) pinned positively alongside the
  spec -32002 expected failure (two-direction ratchet, code only, no message
  pin); multi-entry ReadResourceResult contents (text+blob, order, verbatim)
- hosting auth: aud-validation expected-failure body reshaped with a built-in
  positive control and exact challenge asserts so the flip is unambiguous;
  new verifier-enforced two-resource-server audience-isolation cell locks the
  documented user-level pattern that works today; getOAuthProtectedResource-
  MetadataUrl gets direct unit locks for root and path-suffixed URLs
- tool name edges: dotted/namespaced and 128-char names register, list, and
  call verbatim on both server tiers
Consumers match on the leading "Maximum reconnection attempts" text of
the terminal reconnection error, not on the full sentence, so the
message head is the compatibility surface worth locking. Asserting the
exact string also duplicated the retry default, which is already
asserted separately by value in the default-options test. Matching the
stable head keeps the consumer contract pinned while allowing the
parenthesized count and trailing wording to change with the default.
test/packaging/exports.test.ts packs the real artifact (npm pack),
installs it into a throwaway project without touching the network, and
resolves every public entry point through the package export map from
both ESM (import) and CJS (require), asserting each resolves and
exposes its expected primary symbols. Covers the named export groups
and the deep subpath imports that currently resolve only via the './*'
wildcard; no in-repo test exercised the export map or the dist trees
before.

Runs as `npm run test:packaging`; excluded from plain `npm test`
because it rebuilds and packs the package first.
@felixweinberger felixweinberger requested a review from a team as a code owner June 9, 2026 06:53
@changeset-bot

changeset-bot Bot commented Jun 9, 2026

Copy link
Copy Markdown

⚠️ No Changeset found

Latest commit: 6b9f457

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@pkg-pr-new

pkg-pr-new Bot commented Jun 9, 2026

Copy link
Copy Markdown

Open in StackBlitz

npm i https://pkg.pr.new/@modelcontextprotocol/sdk@2262

commit: 6b9f457

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant