16 changes: 7 additions & 9 deletions .agents/skills/e2e-tests/SKILL.md
@@ -31,35 +31,33 @@ Run workspace scripts from the repo root when you want the standard e2e entrypoi

```bash
pnpm run test:e2e
pnpm run test:e2e:hermetic # only run tests that don't rely on external services or llm providers
pnpm run test:e2e:update # updates snapshots
pnpm run test:e2e:record # re-record provider cassettes (overwrites existing cassettes)
pnpm run test:e2e:record # re-record provider cassettes and update snapshots
```

Try not to use specific test narrowing commands unless hunting down a very nasty and specific bug.

## Cassettes

Cassettes mock provider HTTP responses (OpenAI, Anthropic, ...) so scenarios that opt in run hermetically in CI without provider keys.
Cassettes mock provider HTTP responses (OpenAI, Anthropic, ...) so external-provider scenarios replay in CI without provider keys. The harness starts a local `@braintrust/seinfeld` cassette server and points provider SDK base URL env vars at that server, so subprocesses and SDK-launched binaries are covered too.

- A scenario opts in by passing `runContext: { cassette: true, variantKey: "...", originalScenarioDir }` to `runScenarioDir`/`runNodeScenarioDir`. Cassettes live at `e2e/scenarios/<name>/__cassettes__/<variantKey>.json` (parallel to `__snapshots__/`).
- External-provider tests should thread `runContext: { variantKey: "...", originalScenarioDir }` into the scenario runner. In normal replay mode, missing cassette entries fail loudly instead of skipping or falling back to live providers. In `record` / `record-missing` mode, they run so new cassettes can be authored.
- Thread `runContext: { variantKey: "...", originalScenarioDir }` into `runScenarioDir`/`runNodeScenarioDir`. Cassettes live at `e2e/scenarios/<name>/__cassettes__/<variantKey>.cassette.json` (parallel to `__snapshots__/`). Only set `runContext.cassette` explicitly for unusual cases, such as `cassette: false` on a non-provider mode inside an otherwise provider-backed scenario.
- To re-record after changing a scenario:

```bash
ANTHROPIC_API_KEY=... OPENAI_API_KEY=... \
pnpm --filter=@braintrust/js-e2e-tests run test:e2e:record scenarios/<name>/scenario.test.ts
pnpm --filter=@braintrust/js-e2e-tests run test:e2e:record -- scenarios/<name>/scenario.test.ts
```

Then run again in `BRAINTRUST_E2E_CASSETTE_MODE=replay` with no provider keys to confirm the cassette is sufficient.

- Volatile fields in request bodies (e.g. AI-SDK `experimental_generateMessageId`) need a per-scenario filter. Add the scenario name and a `FilterSpec` to `e2e/helpers/cassette-filters.mjs`. The cassette layer is backed by `@braintrust/seinfeld` (`dev-packages/seinfeld`); the preload entry point is `e2e/helpers/cassette-preload.mjs`.
- Volatile fields in request bodies (e.g. AI-SDK `experimental_generateMessageId`) need a per-scenario filter. Add or update `<scenario-dir>/cassette-filter.mjs` with a named `filter` export. The cassette server loads that filter through `withScenarioHarness(...)`.
- **After recording, always inspect the cassette for leaked API keys before committing.** The recorder redacts common patterns (`paranoid` preset), but confirm that no real keys appear in request headers, response bodies, or any URL query parameters. If a key slips through, remove the cassette, add a custom `redact` rule, and re-record.
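
The inspection step above could be partly automated. The following is a minimal sketch, not part of the harness: `findLeakedKeys` and the regexes in `KEY_PATTERNS` are hypothetical, and the key shapes are assumptions that will not cover every provider. It scans the raw cassette text, so it covers headers, bodies, and URLs alike.

```javascript
// Hypothetical key shapes, for illustration only; extend for the providers you record against.
const KEY_PATTERNS = [
  /sk-ant-[A-Za-z0-9_-]{16,}/g, // Anthropic-style secret keys
  /sk-[A-Za-z0-9_-]{20,}/g, // OpenAI-style secret keys
  /Bearer\s+[A-Za-z0-9._-]{20,}/g, // bearer tokens left in recorded headers
];

// Returns one entry per distinct match position in the raw cassette text.
function findLeakedKeys(cassetteText) {
  const hitsByIndex = new Map();
  for (const pattern of KEY_PATTERNS) {
    for (const match of cassetteText.matchAll(pattern)) {
      // Several patterns can match at the same offset; report each offset once.
      if (!hitsByIndex.has(match.index)) {
        hitsByIndex.set(match.index, {
          index: match.index,
          preview: `${match[0].slice(0, 8)}...`, // never print the full secret
        });
      }
    }
  }
  return [...hitsByIndex.values()].sort((a, b) => a.index - b.index);
}
```

One way to use it is to read each `__cassettes__/*.cassette.json` file as a string, pass it to `findLeakedKeys`, and fail if the result is non-empty. A clean result does not prove the cassette is safe, so the manual check still applies.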

## Preferred Patterns

- Keep the expensive setup at module scope with `prepareScenarioDir(...)`. Only call `installScenarioDependencies(...)` directly when you are testing installer behavior or need a nonstandard setup.
- Run every scenario through `withScenarioHarness(...)`.
- Tag every test with exactly one tag from `e2e/helpers/tags.ts`.
- Keep reusable logic in `e2e/helpers/`. Keep one-off fixtures and scenario-specific files inside the scenario directory.
- Snapshot stable contracts, not raw noise. Use `normalizeForSnapshot(...)` before inline snapshots and `formatJsonFileSnapshot(...)` plus file snapshots for larger payloads or version matrices.
- When a scenario family already has `assertions.ts`, keep version- or provider-specific test setup in `scenario.test.ts` and reuse the shared assertions file.
@@ -91,5 +89,5 @@ pnpm install --dir e2e/scenarios/<name> --ignore-workspace --lockfile-only --str

- Flaky snapshot: normalize the changing field instead of snapshotting around it.
- Request-flow assertions: grab `requestCursor()` before running the scenario, then inspect `requestsAfter(...)`.
- If the scenario is external-provider backed, confirm the required provider env var is set before debugging the assertions.
- If the scenario is external-provider backed, first confirm it has a cassette or run in `BRAINTRUST_E2E_CASSETTE_MODE=record` with the required provider env var set.
- Deno/browser scenarios may intentionally differ from Node. Assert the real runtime contract instead of copying Node expectations blindly.
10 changes: 5 additions & 5 deletions .github/workflows/checks.yaml
@@ -143,7 +143,7 @@ jobs:
path: js/artifacts/*.tgz
retention-days: 1

e2e-hermetic:
e2e:
runs-on: ubuntu-latest
timeout-minutes: 30
steps:
@@ -158,8 +158,8 @@ jobs:
uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v5.0.0
- name: Install dependencies
run: pnpm install --frozen-lockfile
- name: Run hermetic e2e tests
run: pnpm run test:e2e:hermetic
- name: Run e2e tests
run: pnpm run test:e2e

js-api-compatibility:
runs-on: ubuntu-latest
@@ -458,7 +458,7 @@ jobs:
- ensure-pinned-actions
- js-test
- js-build
- e2e-hermetic
- e2e
- js-smoke-discover
- js-smoke-test
- temporal-js
@@ -490,7 +490,7 @@ jobs:
check_result "ensure-pinned-actions" "${{ needs.ensure-pinned-actions.result }}"
check_result "js-test" "${{ needs.js-test.result }}"
check_result "js-build" "${{ needs.js-build.result }}"
check_result "e2e-hermetic" "${{ needs.e2e-hermetic.result }}"
check_result "e2e" "${{ needs.e2e.result }}"
check_result "js-smoke-discover" "${{ needs.js-smoke-discover.result }}"
check_result "js-smoke-test" "${{ needs.js-smoke-test.result }}"
check_result "temporal-js" "${{ needs.temporal-js.result }}"
85 changes: 0 additions & 85 deletions .github/workflows/e2e-canary.yaml

This file was deleted.

83 changes: 0 additions & 83 deletions .github/workflows/integration-tests.yaml
@@ -4,7 +4,6 @@ on:
pull_request:
paths:
- "js/**"
- "e2e/**"
- ".github/workflows/integration-tests.yaml"
- "package.json"
- "pnpm-lock.yaml"
@@ -49,85 +48,3 @@ jobs:
working-directory: js
shell: bash
run: pnpm run test:external

e2e:
runs-on: ubuntu-latest
timeout-minutes: 45
env:
BRAINTRUST_API_KEY: ${{ secrets.BRAINTRUST_API_KEY }}
BRAINTRUST_E2E_PROJECT_NAME: ${{ vars.BRAINTRUST_E2E_PROJECT_NAME }}
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
COHERE_API_KEY: ${{ secrets.COHERE_API_KEY }}
COPILOT_API_KEY: ${{ secrets.COPILOT_API_KEY }}
CURSOR_API_KEY: ${{ secrets.CURSOR_API_KEY }}
GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
HUGGINGFACE_API_KEY: ${{ secrets.HUGGINGFACE_API_KEY }}
steps:
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
- uses: denoland/setup-deno@ff4860f9d7236f320afa0f82b7e6457384805d05 # v2.0.4
with:
deno-version-file: .tool-versions
- uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4.4.0
with:
node-version-file: .tool-versions
- name: Setup pnpm
uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v5.0.0
- name: Install dependencies
run: pnpm install --frozen-lockfile
- name: Prepare e2e run context directory
id: run_context
shell: bash
run: |
RUN_CONTEXT_DIR="$(mktemp -d)"
echo "dir=$RUN_CONTEXT_DIR" >> "$GITHUB_OUTPUT"
- name: Run e2e tests
env:
BRAINTRUST_E2E_RUN_CONTEXT_DIR: ${{ steps.run_context.outputs.dir }}
run: pnpm test:e2e
- name: Build e2e Braintrust links summary
if: ${{ always() }}
shell: bash
env:
BRAINTRUST_E2E_RUN_CONTEXT_DIR: ${{ steps.run_context.outputs.dir }}
BRAINTRUST_ORG_NAME: ${{ vars.BRAINTRUST_ORG_NAME }}
GITHUB_HEAD_REF: ${{ github.head_ref }}
GITHUB_REF_NAME: ${{ github.ref_name }}
GITHUB_REPOSITORY: ${{ github.repository }}
GITHUB_RUN_ID: ${{ github.run_id }}
GITHUB_SERVER_URL: ${{ github.server_url }}
run: |
SUMMARY_PATH="$RUNNER_TEMP/e2e-braintrust-links-summary.md"
node e2e/scripts/build-pr-e2e-links-comment.mjs --output "$SUMMARY_PATH"
cat "$SUMMARY_PATH" >> "$GITHUB_STEP_SUMMARY"

# e2e-canary:
# runs-on: ubuntu-latest
# timeout-minutes: 45
# env:
# BRAINTRUST_API_KEY: ${{ secrets.BRAINTRUST_API_KEY }}
# BRAINTRUST_E2E_PROJECT_NAME: ${{ vars.BRAINTRUST_E2E_PROJECT_NAME }}
# ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
# GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
# COHERE_API_KEY: ${{ secrets.COHERE_API_KEY }}
# COPILOT_API_KEY: ${{ secrets.COPILOT_API_KEY }}
# CURSOR_API_KEY: ${{ secrets.CURSOR_API_KEY }}
# GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
# OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
# OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
# MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
# HUGGINGFACE_API_KEY: ${{ secrets.HUGGINGFACE_API_KEY }}
# steps:
# - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
# - uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4.4.0
# with:
# node-version-file: .tool-versions
# - name: Setup pnpm
# uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v5.0.0
# - name: Install dependencies
# run: pnpm install --frozen-lockfile
# - name: Run canary e2e tests
# run: pnpm test:e2e:canary
4 changes: 2 additions & 2 deletions AGENTS.md
@@ -55,11 +55,11 @@ export ANTHROPIC_API_KEY="sk-ant-..."

**E2E tests (`e2e/`):**

Each scenario runs the SDK in a subprocess against a mock Braintrust server and snapshots the results. No API keys required.
Each scenario runs the SDK in a subprocess against a mock Braintrust server and snapshots the results. No API keys required for replay; recording needs provider keys.

```bash
pnpm run test:e2e # Run all e2e scenarios (from repo root)
pnpm run test:e2e:update # Run and update snapshots
pnpm run test:e2e:record # Re-record provider cassettes and update snapshots
```

**From repo root:**