From 34b1c00e26a997e40a219ffeef1985613844713c Mon Sep 17 00:00:00 2001 From: Christopher Tso Date: Wed, 7 Jan 2026 15:53:40 +1100 Subject: [PATCH 01/14] openspec: add copilot-cli provider proposal --- .../add-copilot-cli-provider/design.md | 87 +++++++++++++++++++ .../add-copilot-cli-provider/proposal.md | 36 ++++++++ .../specs/evaluation/spec.md | 12 +++ .../specs/validation/spec.md | 14 +++ .../changes/add-copilot-cli-provider/tasks.md | 24 +++++ 5 files changed, 173 insertions(+) create mode 100644 openspec/changes/add-copilot-cli-provider/design.md create mode 100644 openspec/changes/add-copilot-cli-provider/proposal.md create mode 100644 openspec/changes/add-copilot-cli-provider/specs/evaluation/spec.md create mode 100644 openspec/changes/add-copilot-cli-provider/specs/validation/spec.md create mode 100644 openspec/changes/add-copilot-cli-provider/tasks.md diff --git a/openspec/changes/add-copilot-cli-provider/design.md b/openspec/changes/add-copilot-cli-provider/design.md new file mode 100644 index 00000000..136d80f8 --- /dev/null +++ b/openspec/changes/add-copilot-cli-provider/design.md @@ -0,0 +1,87 @@ +## Context + +AgentV supports multiple provider kinds: +- Cloud LLM providers (Azure OpenAI, Anthropic, Gemini) +- Agent-style providers that operate on a workspace (Codex CLI, VS Code Copilot, Claude Code, Pi) + +GitHub Copilot provides a CLI package (`@github/copilot`) that can be invoked via `npx` and interacted with through stdin/stdout. AgentV can adopt this pattern for evaluation runs. + +## Goals +- Add a built-in provider kind that runs GitHub Copilot CLI (`@github/copilot`) as an external process. +- Keep configuration minimal and consistent with existing CLI-style providers (especially `codex`). +- Ensure deterministic capture of the “candidate answer” with good error messages and artifacts. 
+ +## Proposed Provider Identity +- Canonical kind: `copilot-cli` +- Accepted aliases (to reduce user friction): `copilot`, `github-copilot` + +Rationale: `copilot` alone is ambiguous with the VS Code Copilot provider; `copilot-cli` makes intent explicit. + +## Invocation Strategy + +### Base command +Default to invoking Copilot via npm: +- `npx -y @github/copilot@` + +Rationale: +- Avoid requiring a global install. +- Match vibe-kanban’s approach. +- Pin a version to reduce behavior drift across runs. + +### Process I/O +- Write the rendered prompt to stdin, then close stdin. +- Capture stdout/stderr. + +### Log directory +- Pass `--log-dir ` and `--log-level debug` when supported. +- Record the log directory path in the ProviderResponse metadata for debugging. + +### Timeout & cancellation +- Support a target-configured timeout (seconds or ms consistent with AgentV conventions). +- Abort via `AbortSignal` if provided by the orchestrator. + +## Prompt Construction +Copilot CLI runs in a workspace directory, so the provider should follow the same “agent provider preread” pattern used by `vscode` and `codex`: +- Include a preread section that links guideline and attachment files via `file://` URLs. +- Include the user query. + +## Response Extraction +Copilot CLI’s stdout is expected to contain a mixture of progress text and the final assistant message. + +Proposed minimal extraction algorithm: +- Strip ANSI escape sequences. +- Trim surrounding whitespace. +- Treat the remaining stdout content as the candidate answer. +- Preserve full stdout/stderr as artifacts on failures. + +If Copilot CLI later provides a stable, documented structured output mode, AgentV MAY add opt-in support in a future change. 
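Reviewer note: the minimal extraction algorithm proposed above can be sketched as follows. This is an illustration under stated assumptions, not the provider's final code: the names `CSI_PATTERN`, `stripAnsi`, and `extractCandidateAnswer` are hypothetical, only CSI-style escape sequences are handled, and treating empty output as an error is one possible policy.

```typescript
// Hypothetical helper names; a sketch of the proposed extraction steps only.
// Matches CSI escape sequences such as ESC[32m (colour) or ESC[0m (reset).
const CSI_PATTERN = /\x1B\[[0-9;]*[A-Za-z]/g;

function stripAnsi(text: string): string {
  return text.replace(CSI_PATTERN, '');
}

function extractCandidateAnswer(stdout: string): string {
  // Step 1: strip ANSI escapes. Step 2: trim surrounding whitespace.
  const cleaned = stripAnsi(stdout).trim();
  if (cleaned.length === 0) {
    // Assumption: empty output is surfaced as a failure so the caller
    // can preserve the raw stdout/stderr as artifacts.
    throw new Error('Copilot CLI produced no output');
  }
  // Step 3: the remaining stdout content is the candidate answer.
  return cleaned;
}

const sample = '\x1B[32m  Refactored the parser.\x1B[0m\n';
console.log(JSON.stringify(extractCandidateAnswer(sample))); // "Refactored the parser."
```

Because the whole cleaned stdout becomes the answer, any progress text the CLI prints will be scored along with the final message, which is the trade-off this design accepts until a structured output mode exists.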
+ +## Target Configuration Surface +Keep this comparable to `codex`: +- `provider: copilot-cli` +- `settings.executable` (optional): defaults to `npx` +- `settings.args` (optional): appended args; default includes `-y @github/copilot@` and flags +- `settings.cwd` (optional) +- `settings.timeoutSeconds` (optional) +- `settings.env` (optional) +- `settings.model` (optional) + +Avoid overfitting to every Copilot CLI flag initially; allow passthrough args for advanced use. + +## Security & Safety Notes +- Like other agent providers, Copilot CLI can read local files from the workspace. +- Any “allow all tools” behavior (if exposed) should be opt-in and clearly documented. +- Prefer defaulting to safer settings, consistent with existing providers. + +## Testing Strategy (implementation stage) +- Unit tests for: + - command argument construction + - stdout parsing/extraction + - timeout handling +- Integration-style tests (mock runner) that simulate Copilot CLI stdout/stderr. + +## Alternatives Considered +- Use `gh copilot`: + - Rejected: requested explicitly to use `@github/copilot` like vibe-kanban. +- Implement as `cli` provider template: + - Rejected: would push complexity to users and lose built-in prompt construction and artifacts. diff --git a/openspec/changes/add-copilot-cli-provider/proposal.md b/openspec/changes/add-copilot-cli-provider/proposal.md new file mode 100644 index 00000000..c0707fe8 --- /dev/null +++ b/openspec/changes/add-copilot-cli-provider/proposal.md @@ -0,0 +1,36 @@ +# Change: Add GitHub Copilot CLI provider + +## Why +AgentV currently supports running agent-style evaluations via `provider: vscode` (VS Code) and `provider: codex` (Codex CLI), but it does not support the GitHub Copilot CLI package (`@github/copilot`). Teams that standardize on Copilot CLI (often via `npx -y @github/copilot`) cannot evaluate the same prompts/tasks in AgentV without custom wrappers. 
+
+This change adds a first-class `copilot-cli` provider so AgentV can invoke Copilot CLI directly and capture responses for evaluation.
+
+## What Changes
+- Add a new target provider kind: `copilot-cli` (GitHub Copilot CLI via `@github/copilot`).
+- Add target configuration fields for Copilot CLI execution (command/executable, args, model, timeout, cwd, env).
+- Implement provider execution by spawning the Copilot CLI process, piping a constructed prompt to stdin, and capturing the final assistant response from stdout.
+- Persist provider artifacts (stdout/stderr and optional log-dir files) for debugging on failures.
+- Update documentation/templates so `agentv init` guidance includes Copilot CLI targets.
+
+## Non-Goals
+- Do not add `gh copilot` (GitHub CLI subcommand) support in this change.
+- Do not add interactive “resume session” UX; evaluations run as independent invocations.
+- Do not introduce a new plugin system; this remains a built-in provider like `codex`/`vscode`.
+
+## Impact
+- Affected specs:
+  - `evaluation` (new provider invocation behavior)
+  - `validation` (targets schema/validation for the new provider)
+- Affected code (implementation stage):
+  - `packages/core/src/evaluation/providers/*` (new provider + provider registry)
+  - `packages/core/src/evaluation/validation/targets-validator.ts`
+  - `apps/cli` docs/templates (provider list + examples)
+
+## Compatibility
+- Backwards compatible: existing targets continue to work unchanged.
+- `provider: copilot-cli` is additive.
+
+## Decisions
+- Canonical provider kind: `copilot-cli`.
+- Accepted provider aliases: `copilot` and `github-copilot`.
+- Output contract: unless/until Copilot CLI exposes a stable machine-readable mode that AgentV supports, the provider treats Copilot CLI stdout as the candidate answer after stripping ANSI escapes and trimming surrounding whitespace.
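Reviewer note: the alias decision above amounts to mapping three accepted spellings onto one canonical kind. A minimal sketch, assuming a hypothetical `normalizeCopilotProvider` helper and assuming that matching is case-insensitive and whitespace-tolerant (the proposal does not say either way):

```typescript
// Hypothetical lookup table; the canonical kind and both aliases come from
// the Decisions list, everything else here is illustrative.
const COPILOT_CLI_NAMES: ReadonlySet<string> = new Set([
  'copilot-cli',
  'copilot',
  'github-copilot',
]);

function normalizeCopilotProvider(provider: string): 'copilot-cli' | undefined {
  // Assumption: trim and lowercase before matching.
  const key = provider.trim().toLowerCase();
  return COPILOT_CLI_NAMES.has(key) ? 'copilot-cli' : undefined;
}

console.log(normalizeCopilotProvider('github-copilot')); // copilot-cli
console.log(normalizeCopilotProvider('vscode')); // undefined
```

Returning `undefined` for anything else leaves the decision about unknown providers to the caller, so the VS Code Copilot provider keeps its own `vscode` kind.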
diff --git a/openspec/changes/add-copilot-cli-provider/specs/evaluation/spec.md b/openspec/changes/add-copilot-cli-provider/specs/evaluation/spec.md new file mode 100644 index 00000000..9e3e7e90 --- /dev/null +++ b/openspec/changes/add-copilot-cli-provider/specs/evaluation/spec.md @@ -0,0 +1,12 @@ +## MODIFIED Requirements + +### Requirement: Provider Integration +The system SHALL integrate with supported providers using target configuration and optional retry settings. + +#### Scenario: GitHub Copilot CLI provider +- **WHEN** a target uses `provider: copilot-cli` (or an accepted alias) +- **THEN** the system ensures the Copilot CLI launcher is available (defaulting to `npx` when not explicitly configured) +- **AND** builds a preread prompt document that links guideline and attachment files via `file://` URLs and includes the user query +- **AND** runs GitHub Copilot CLI via `@github/copilot` with a pinned version by default (configurable), piping the prompt via stdin +- **AND** captures stdout/stderr and extracts a single candidate answer text from the final assistant output +- **AND** on failure, the error includes exit code/timeout context and preserves stdout/stderr and any log artifacts for debugging diff --git a/openspec/changes/add-copilot-cli-provider/specs/validation/spec.md b/openspec/changes/add-copilot-cli-provider/specs/validation/spec.md new file mode 100644 index 00000000..2f632375 --- /dev/null +++ b/openspec/changes/add-copilot-cli-provider/specs/validation/spec.md @@ -0,0 +1,14 @@ +## MODIFIED Requirements + +### Requirement: Targets File Schema Validation +The system SHALL validate target configuration using Zod schemas that serve as both runtime validators and TypeScript type sources. 
+ +#### Scenario: Unknown Copilot CLI provider property rejected +- **WHEN** a targets file contains a Copilot CLI target with an unrecognized property +- **THEN** the system SHALL reject the configuration with an error identifying the unknown property + +#### Scenario: Copilot CLI provider accepts snake_case and camelCase settings +- **WHEN** a targets file uses `provider: copilot-cli` (or an accepted alias) +- **AND** configures supported settings using either snake_case or camelCase +- **THEN** validation succeeds +- **AND** the resolved config normalizes to camelCase diff --git a/openspec/changes/add-copilot-cli-provider/tasks.md b/openspec/changes/add-copilot-cli-provider/tasks.md new file mode 100644 index 00000000..04ee61f3 --- /dev/null +++ b/openspec/changes/add-copilot-cli-provider/tasks.md @@ -0,0 +1,24 @@ +## 1. Provider + Targets +- [ ] 1.1 Add `copilot-cli` to `ProviderKind`, `KNOWN_PROVIDERS`, and `AGENT_PROVIDER_KINDS` (and decide aliases). +- [ ] 1.2 Extend target parsing to recognize `provider: copilot-cli` (and chosen aliases) and resolve a typed Copilot config. +- [ ] 1.3 Extend `targets-validator` to accept the Copilot settings keys and reject unknown properties with actionable errors. + +## 2. Execution +- [ ] 2.1 Implement `CopilotCliProvider` (mirroring patterns from `CodexProvider`): spawn process, write prompt to stdin, capture stdout/stderr, enforce timeout. +- [ ] 2.2 Implement prompt preread rendering consistent with other agent providers (file:// links for guidelines and attachments). +- [ ] 2.3 Implement robust stdout parsing to extract a single candidate answer; preserve raw artifacts on errors. +- [ ] 2.4 Register provider in provider factory/registry. + +## 3. Docs + Templates +- [ ] 3.1 Update CLI docs to list `copilot-cli` as a supported provider and add a minimal `targets.yaml` example. +- [ ] 3.2 Update `apps/cli/src/templates/.claude/skills/agentv-eval-builder/` references so `agentv init` users get Copilot CLI guidance. 
+ +## 4. Tests +- [ ] 4.1 Add unit tests for config resolution and argument rendering. +- [ ] 4.2 Add provider tests using a mocked runner (no real Copilot CLI dependency) for success, invalid output, and timeout. + +## 5. Validation +- [ ] 5.1 Run `bun run build`, `bun run typecheck`, `bun run lint`, `bun test`. + +## 6. Release hygiene +- [ ] 6.1 Add a changeset if user-visible behavior changes should ship in the next release. From cb6e824f45dba60f6a44c8274afb87acad14f377 Mon Sep 17 00:00:00 2001 From: Christopher Tso Date: Sun, 1 Feb 2026 04:50:48 +0000 Subject: [PATCH 02/14] feat(core): register copilot-cli provider kind and aliases Co-Authored-By: Claude Opus 4.5 --- packages/core/src/evaluation/providers/types.ts | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/packages/core/src/evaluation/providers/types.ts b/packages/core/src/evaluation/providers/types.ts index 0d1fb170..0afec63e 100644 --- a/packages/core/src/evaluation/providers/types.ts +++ b/packages/core/src/evaluation/providers/types.ts @@ -15,6 +15,7 @@ export type ProviderKind = | 'anthropic' | 'gemini' | 'codex' + | 'copilot-cli' | 'pi-coding-agent' | 'pi-agent-sdk' | 'claude-code' @@ -29,6 +30,7 @@ export type ProviderKind = */ export const AGENT_PROVIDER_KINDS: readonly ProviderKind[] = [ 'codex', + 'copilot-cli', 'pi-coding-agent', 'claude-code', 'vscode', @@ -44,6 +46,7 @@ export const KNOWN_PROVIDERS: readonly ProviderKind[] = [ 'anthropic', 'gemini', 'codex', + 'copilot-cli', 'pi-coding-agent', 'pi-agent-sdk', 'claude-code', @@ -62,6 +65,8 @@ export const PROVIDER_ALIASES: readonly string[] = [ 'google', // alias for "gemini" 'google-gemini', // alias for "gemini" 'codex-cli', // alias for "codex" + 'copilot', // alias for "copilot-cli" + 'github-copilot', // alias for "copilot-cli" 'pi', // alias for "pi-coding-agent" 'openai', // legacy/future support 'bedrock', // legacy/future support @@ -253,6 +258,9 @@ export interface TargetDefinition { readonly logFormat?: string | unknown | 
undefined; readonly log_output_format?: string | unknown | undefined; readonly logOutputFormat?: string | unknown | undefined; + // System prompt (codex, copilot-cli, claude-code, pi-coding-agent) + readonly system_prompt?: string | unknown | undefined; + readonly systemPrompt?: string | unknown | undefined; // Mock fields readonly response?: string | unknown | undefined; readonly delayMs?: number | unknown | undefined; From f8003a3a82946ce00e4ab474eb7c43cf6694e81f Mon Sep 17 00:00:00 2001 From: Christopher Tso Date: Sun, 1 Feb 2026 04:51:22 +0000 Subject: [PATCH 03/14] feat(core): add copilot-cli target resolution and config Co-Authored-By: Claude Opus 4.5 --- .../core/src/evaluation/providers/targets.ts | 88 +++++++++++++++++++ 1 file changed, 88 insertions(+) diff --git a/packages/core/src/evaluation/providers/targets.ts b/packages/core/src/evaluation/providers/targets.ts index 0d2aec2d..2afd620b 100644 --- a/packages/core/src/evaluation/providers/targets.ts +++ b/packages/core/src/evaluation/providers/targets.ts @@ -434,6 +434,17 @@ export interface CodexResolvedConfig { readonly systemPrompt?: string; } +export interface CopilotResolvedConfig { + readonly executable: string; + readonly model?: string; + readonly args?: readonly string[]; + readonly cwd?: string; + readonly timeoutMs?: number; + readonly logDir?: string; + readonly logFormat?: 'summary' | 'json'; + readonly systemPrompt?: string; +} + export interface PiCodingAgentResolvedConfig { readonly executable: string; readonly provider?: string; @@ -525,6 +536,14 @@ export type ResolvedTarget = readonly providerBatching?: boolean; readonly config: CodexResolvedConfig; } + | { + readonly kind: 'copilot-cli'; + readonly name: string; + readonly judgeTarget?: string; + readonly workers?: number; + readonly providerBatching?: boolean; + readonly config: CopilotResolvedConfig; + } | { readonly kind: 'pi-coding-agent'; readonly name: string; @@ -693,6 +712,17 @@ export function resolveTargetDefinition( 
providerBatching, config: resolveCodexConfig(parsed, env), }; + case 'copilot-cli': + case 'copilot': + case 'github-copilot': + return { + kind: 'copilot-cli', + name: parsed.name, + judgeTarget: parsed.judge_target, + workers: parsed.workers, + providerBatching, + config: resolveCopilotConfig(parsed, env), + }; case 'pi': case 'pi-coding-agent': return { @@ -909,6 +939,64 @@ function normalizeCodexLogFormat(value: unknown): 'summary' | 'json' | undefined throw new Error("codex log format must be 'summary' or 'json'"); } +function resolveCopilotConfig( + target: z.infer, + env: EnvLookup, +): CopilotResolvedConfig { + const executableSource = target.executable ?? target.command ?? target.binary; + const modelSource = target.model; + const argsSource = target.args ?? target.arguments; + const cwdSource = target.cwd; + const timeoutSource = target.timeout_seconds ?? target.timeoutSeconds; + const logDirSource = + target.log_dir ?? target.logDir ?? target.log_directory ?? target.logDirectory; + const logFormatSource = + target.log_format ?? target.logFormat ?? target.log_output_format ?? target.logOutputFormat; + const systemPromptSource = target.system_prompt ?? target.systemPrompt; + + const executable = + resolveOptionalString(executableSource, env, `${target.name} copilot executable`, { + allowLiteral: true, + optionalEnv: true, + }) ?? 
'copilot'; + + const model = resolveOptionalString(modelSource, env, `${target.name} copilot model`, { + allowLiteral: true, + optionalEnv: true, + }); + + const args = resolveOptionalStringArray(argsSource, env, `${target.name} copilot args`); + + const cwd = resolveOptionalString(cwdSource, env, `${target.name} copilot cwd`, { + allowLiteral: true, + optionalEnv: true, + }); + + const timeoutMs = resolveTimeoutMs(timeoutSource, `${target.name} copilot timeout`); + + const logDir = resolveOptionalString(logDirSource, env, `${target.name} copilot log directory`, { + allowLiteral: true, + optionalEnv: true, + }); + + const logFormat = normalizeCopilotLogFormat(logFormatSource); + + const systemPrompt = + typeof systemPromptSource === 'string' && systemPromptSource.trim().length > 0 + ? systemPromptSource.trim() + : undefined; + + return { executable, model, args, cwd, timeoutMs, logDir, logFormat, systemPrompt }; +} + +function normalizeCopilotLogFormat(value: unknown): 'summary' | 'json' | undefined { + if (value === undefined || value === null) return undefined; + if (typeof value !== 'string') throw new Error("copilot log format must be 'summary' or 'json'"); + const normalized = value.trim().toLowerCase(); + if (normalized === 'json' || normalized === 'summary') return normalized; + throw new Error("copilot log format must be 'summary' or 'json'"); +} + function resolvePiCodingAgentConfig( target: z.infer, env: EnvLookup, From 998a355dcde9bf9fb5d596e337a82b077af8f633 Mon Sep 17 00:00:00 2001 From: Christopher Tso Date: Sun, 1 Feb 2026 04:51:38 +0000 Subject: [PATCH 04/14] feat(core): add copilot-cli log tracker Co-Authored-By: Claude Opus 4.5 --- .../providers/copilot-cli-log-tracker.ts | 73 +++++++++++++++++++ 1 file changed, 73 insertions(+) create mode 100644 packages/core/src/evaluation/providers/copilot-cli-log-tracker.ts diff --git a/packages/core/src/evaluation/providers/copilot-cli-log-tracker.ts 
b/packages/core/src/evaluation/providers/copilot-cli-log-tracker.ts
new file mode 100644
index 00000000..3c92d97d
--- /dev/null
+++ b/packages/core/src/evaluation/providers/copilot-cli-log-tracker.ts
@@ -0,0 +1,73 @@
+export type CopilotCliLogEntry = {
+  readonly filePath: string;
+  readonly evalCaseId?: string;
+  readonly targetName: string;
+  readonly attempt?: number;
+};
+
+const GLOBAL_LOGS_KEY = Symbol.for('agentv.copilotCliLogs');
+const GLOBAL_SUBSCRIBERS_KEY = Symbol.for('agentv.copilotCliLogSubscribers');
+
+type CopilotCliLogListener = (entry: CopilotCliLogEntry) => void;
+
+type GlobalWithCopilotCliLogs = typeof globalThis & {
+  [GLOBAL_LOGS_KEY]?: CopilotCliLogEntry[];
+  [GLOBAL_SUBSCRIBERS_KEY]?: Set<CopilotCliLogListener>;
+};
+
+function getCopilotCliLogStore(): CopilotCliLogEntry[] {
+  const globalObject = globalThis as GlobalWithCopilotCliLogs;
+  const existing = globalObject[GLOBAL_LOGS_KEY];
+  if (existing) {
+    return existing;
+  }
+  const created: CopilotCliLogEntry[] = [];
+  globalObject[GLOBAL_LOGS_KEY] = created;
+  return created;
+}
+
+function getSubscriberStore(): Set<CopilotCliLogListener> {
+  const globalObject = globalThis as GlobalWithCopilotCliLogs;
+  const existing = globalObject[GLOBAL_SUBSCRIBERS_KEY];
+  if (existing) {
+    return existing;
+  }
+  const created = new Set<CopilotCliLogListener>();
+  globalObject[GLOBAL_SUBSCRIBERS_KEY] = created;
+  return created;
+}
+
+function notifySubscribers(entry: CopilotCliLogEntry): void {
+  const subscribers = Array.from(getSubscriberStore());
+  for (const listener of subscribers) {
+    try {
+      listener(entry);
+    } catch (error) {
+      const message = error instanceof Error ?
error.message : String(error);
+      console.warn(`Copilot CLI log subscriber failed: ${message}`);
+    }
+  }
+}
+
+export function recordCopilotCliLogEntry(entry: CopilotCliLogEntry): void {
+  getCopilotCliLogStore().push(entry);
+  notifySubscribers(entry);
+}
+
+export function consumeCopilotCliLogEntries(): CopilotCliLogEntry[] {
+  const store = getCopilotCliLogStore();
+  if (store.length === 0) {
+    return [];
+  }
+  return store.splice(0, store.length);
+}
+
+export function subscribeToCopilotCliLogEntries(
+  listener: CopilotCliLogListener,
+): () => void {
+  const store = getSubscriberStore();
+  store.add(listener);
+  return () => {
+    store.delete(listener);
+  };
+}
From 3fc11659451befeafd6fb98cf8d53f57b0c630b3 Mon Sep 17 00:00:00 2001
From: Christopher Tso
Date: Sun, 1 Feb 2026 04:54:09 +0000
Subject: [PATCH 05/14] feat(core): implement CopilotCliProvider

Co-Authored-By: Claude Opus 4.5
---
 .../src/evaluation/providers/copilot-cli.ts | 551 ++++++++++++++++++
 1 file changed, 551 insertions(+)
 create mode 100644 packages/core/src/evaluation/providers/copilot-cli.ts

diff --git a/packages/core/src/evaluation/providers/copilot-cli.ts b/packages/core/src/evaluation/providers/copilot-cli.ts
new file mode 100644
index 00000000..fca36975
--- /dev/null
+++ b/packages/core/src/evaluation/providers/copilot-cli.ts
@@ -0,0 +1,551 @@
+import { exec as execCallback, spawn } from 'node:child_process';
+import { randomUUID } from 'node:crypto';
+import { constants, createWriteStream } from 'node:fs';
+import type { WriteStream } from 'node:fs';
+import { access, mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises';
+import { tmpdir } from 'node:os';
+import path from 'node:path';
+import { promisify } from 'node:util';
+
+import { recordCopilotCliLogEntry } from './copilot-cli-log-tracker.js';
+import { buildPromptDocument, normalizeInputFiles } from './preread.js';
+import type { CopilotResolvedConfig } from './targets.js';
+import type { Provider, ProviderRequest,
ProviderResponse } from './types.js';
+
+const execAsync = promisify(execCallback);
+const WORKSPACE_PREFIX = 'agentv-copilot-';
+const PROMPT_FILENAME = 'prompt.md';
+
+/**
+ * Default system prompt for Copilot CLI evaluations.
+ * Ensures the agent returns code in its response rather than just writing files.
+ */
+const DEFAULT_SYSTEM_PROMPT = `**IMPORTANT**: Follow these instructions for your response:
+- Do NOT create any additional output files in the workspace.
+- All intended file outputs/changes MUST be written in your response.
+- For each intended file, include the relative path and unified git diff following the convention \`diff --git ...\`.
+This is required for evaluation scoring.`;
+
+interface CopilotCliRunOptions {
+  readonly executable: string;
+  readonly args: readonly string[];
+  readonly cwd: string;
+  readonly prompt: string;
+  readonly timeoutMs?: number;
+  readonly env: NodeJS.ProcessEnv;
+  readonly signal?: AbortSignal;
+  readonly onStdoutChunk?: (chunk: string) => void;
+  readonly onStderrChunk?: (chunk: string) => void;
+}
+
+interface CopilotCliRunResult {
+  readonly stdout: string;
+  readonly stderr: string;
+  readonly exitCode: number;
+  readonly timedOut?: boolean;
+}
+
+type CopilotCliRunner = (options: CopilotCliRunOptions) => Promise<CopilotCliRunResult>;
+
+export class CopilotCliProvider implements Provider {
+  readonly id: string;
+  readonly kind = 'copilot-cli' as const;
+  readonly targetName: string;
+  readonly supportsBatch = false;
+
+  private readonly config: CopilotResolvedConfig;
+  private readonly runCopilot: CopilotCliRunner;
+  private environmentCheck?: Promise<void>;
+  private resolvedExecutable?: string;
+
+  constructor(
+    targetName: string,
+    config: CopilotResolvedConfig,
+    runner: CopilotCliRunner = defaultCopilotCliRunner,
+  ) {
+    this.id = `copilot-cli:${targetName}`;
+    this.targetName = targetName;
+    this.config = config;
+    this.runCopilot = runner;
+  }
+
+  async invoke(request: ProviderRequest): Promise<ProviderResponse> {
+    if (request.signal?.aborted)
{
+      throw new Error('Copilot CLI request was aborted before execution');
+    }
+
+    await this.ensureEnvironmentReady();
+
+    const inputFiles = normalizeInputFiles(request.inputFiles);
+
+    const workspaceRoot = await this.createWorkspace();
+    const logger = await this.createStreamLogger(request).catch(() => undefined);
+    try {
+      const basePrompt = buildPromptDocument(request, inputFiles);
+      const systemPrompt = this.config.systemPrompt ?? DEFAULT_SYSTEM_PROMPT;
+      const promptContent = `${systemPrompt}\n\n${basePrompt}`;
+      const promptFile = path.join(workspaceRoot, PROMPT_FILENAME);
+      await writeFile(promptFile, promptContent, 'utf8');
+
+      const args = this.buildCopilotArgs(promptContent);
+      const cwd = this.resolveCwd(workspaceRoot);
+
+      const result = await this.executeCopilot(args, cwd, promptContent, request.signal, logger);
+
+      if (result.timedOut) {
+        throw new Error(
+          `Copilot CLI timed out${formatTimeoutSuffix(this.config.timeoutMs ?? undefined)}`,
+        );
+      }
+
+      if (result.exitCode !== 0) {
+        const detail = pickDetail(result.stderr, result.stdout);
+        const prefix = `Copilot CLI exited with code ${result.exitCode}`;
+        throw new Error(detail ? `${prefix}: ${detail}` : prefix);
+      }
+
+      const assistantText = extractCopilotResponse(result.stdout);
+
+      return {
+        raw: {
+          stdout: result.stdout,
+          stderr: result.stderr,
+          exitCode: result.exitCode,
+          args,
+          executable: this.resolvedExecutable ??
this.config.executable,
+          promptFile,
+          workspace: workspaceRoot,
+          inputFiles,
+          logFile: logger?.filePath,
+        },
+        outputMessages: [{ role: 'assistant' as const, content: assistantText }],
+      };
+    } finally {
+      await logger?.close();
+      await this.cleanupWorkspace(workspaceRoot);
+    }
+  }
+
+  private async ensureEnvironmentReady(): Promise<void> {
+    if (!this.environmentCheck) {
+      this.environmentCheck = this.validateEnvironment();
+    }
+    await this.environmentCheck;
+  }
+
+  private async validateEnvironment(): Promise<void> {
+    this.resolvedExecutable = await locateExecutable(this.config.executable);
+  }
+
+  private resolveCwd(workspaceRoot: string): string {
+    if (!this.config.cwd) {
+      return workspaceRoot;
+    }
+    return path.resolve(this.config.cwd);
+  }
+
+  private buildCopilotArgs(prompt: string): string[] {
+    const args: string[] = [];
+
+    // Non-interactive prompt mode
+    args.push('-p');
+
+    // Silent mode - only output agent response
+    args.push('-s');
+
+    // Auto-approve all tool usage
+    args.push('--allow-all-tools');
+
+    // Disable color output
+    args.push('--no-color');
+
+    // Model selection
+    if (this.config.model) {
+      args.push('--model', this.config.model);
+    }
+
+    // Custom args from config
+    if (this.config.args && this.config.args.length > 0) {
+      args.push(...this.config.args);
+    }
+
+    // Prompt is passed as the final argument after -p
+    args.push(prompt);
+
+    return args;
+  }
+
+  private async executeCopilot(
+    args: readonly string[],
+    cwd: string,
+    promptContent: string,
+    signal: AbortSignal | undefined,
+    logger: CopilotCliStreamLogger | undefined,
+  ): Promise<CopilotCliRunResult> {
+    try {
+      return await this.runCopilot({
+        executable: this.resolvedExecutable ?? this.config.executable,
+        args,
+        cwd,
+        prompt: promptContent,
+        timeoutMs: this.config.timeoutMs,
+        env: process.env,
+        signal,
+        onStdoutChunk: logger ? (chunk) => logger.handleStdoutChunk(chunk) : undefined,
+        onStderrChunk: logger ?
(chunk) => logger.handleStderrChunk(chunk) : undefined,
+      });
+    } catch (error) {
+      const err = error as NodeJS.ErrnoException;
+      if (err.code === 'ENOENT') {
+        throw new Error(
+          `Copilot executable '${this.config.executable}' was not found. Update the target settings.executable or add it to PATH.`,
+        );
+      }
+      throw error;
+    }
+  }
+
+  private async createWorkspace(): Promise<string> {
+    return await mkdtemp(path.join(tmpdir(), WORKSPACE_PREFIX));
+  }
+
+  private async cleanupWorkspace(workspaceRoot: string): Promise<void> {
+    try {
+      await rm(workspaceRoot, { recursive: true, force: true });
+    } catch {
+      // Best-effort cleanup
+    }
+  }
+
+  private resolveLogDirectory(): string | undefined {
+    const disabled = isCopilotLogStreamingDisabled();
+    if (disabled) {
+      return undefined;
+    }
+    if (this.config.logDir) {
+      return path.resolve(this.config.logDir);
+    }
+    return path.join(process.cwd(), '.agentv', 'logs', 'copilot-cli');
+  }
+
+  private async createStreamLogger(
+    request: ProviderRequest,
+  ): Promise<CopilotCliStreamLogger | undefined> {
+    const logDir = this.resolveLogDirectory();
+    if (!logDir) {
+      return undefined;
+    }
+    try {
+      await mkdir(logDir, { recursive: true });
+    } catch (error) {
+      const message = error instanceof Error ? error.message : String(error);
+      console.warn(`Skipping Copilot CLI stream logging (could not create ${logDir}): ${message}`);
+      return undefined;
+    }
+
+    const filePath = path.join(logDir, buildLogFilename(request, this.targetName));
+
+    try {
+      const logger = await CopilotCliStreamLogger.create({
+        filePath,
+        targetName: this.targetName,
+        evalCaseId: request.evalCaseId,
+        attempt: request.attempt,
+        format: this.config.logFormat ?? 'summary',
+      });
+      recordCopilotCliLogEntry({
+        filePath,
+        targetName: this.targetName,
+        evalCaseId: request.evalCaseId,
+        attempt: request.attempt,
+      });
+      return logger;
+    } catch (error) {
+      const message = error instanceof Error ?
error.message : String(error);
+      console.warn(`Skipping Copilot CLI stream logging for ${filePath}: ${message}`);
+      return undefined;
+    }
+  }
+}
+
+class CopilotCliStreamLogger {
+  readonly filePath: string;
+  private readonly stream: WriteStream;
+  private readonly startedAt = Date.now();
+  private stdoutBuffer = '';
+  private stderrBuffer = '';
+  private readonly format: 'summary' | 'json';
+
+  private constructor(filePath: string, format: 'summary' | 'json') {
+    this.filePath = filePath;
+    this.format = format;
+    this.stream = createWriteStream(filePath, { flags: 'a' });
+  }
+
+  static async create(options: {
+    readonly filePath: string;
+    readonly targetName: string;
+    readonly evalCaseId?: string;
+    readonly attempt?: number;
+    readonly format: 'summary' | 'json';
+  }): Promise<CopilotCliStreamLogger> {
+    const logger = new CopilotCliStreamLogger(options.filePath, options.format);
+    const header = [
+      '# Copilot CLI stream log',
+      `# target: ${options.targetName}`,
+      options.evalCaseId ? `# eval: ${options.evalCaseId}` : undefined,
+      options.attempt !== undefined ? `# attempt: ${options.attempt + 1}` : undefined,
+      `# started: ${new Date().toISOString()}`,
+      '',
+    ].filter((line): line is string => Boolean(line));
+    logger.writeLines(header);
+    return logger;
+  }
+
+  handleStdoutChunk(chunk: string): void {
+    this.stdoutBuffer += chunk;
+    this.flushBuffer('stdout');
+  }
+
+  handleStderrChunk(chunk: string): void {
+    this.stderrBuffer += chunk;
+    this.flushBuffer('stderr');
+  }
+
+  async close(): Promise<void> {
+    this.flushBuffer('stdout');
+    this.flushBuffer('stderr');
+    this.flushRemainder();
+    await new Promise<void>((resolve, reject) => {
+      this.stream.once('error', reject);
+      this.stream.end(() => resolve());
+    });
+  }
+
+  private writeLines(lines: readonly string[]): void {
+    for (const line of lines) {
+      this.stream.write(`${line}\n`);
+    }
+  }
+
+  private flushBuffer(source: 'stdout' | 'stderr'): void {
+    const buffer = source === 'stdout' ?
this.stdoutBuffer : this.stderrBuffer;
+    const lines = buffer.split(/\r?\n/);
+    const remainder = lines.pop() ?? '';
+    if (source === 'stdout') {
+      this.stdoutBuffer = remainder;
+    } else {
+      this.stderrBuffer = remainder;
+    }
+    for (const line of lines) {
+      const formatted = this.formatLine(line, source);
+      if (formatted) {
+        this.stream.write(formatted);
+        this.stream.write('\n');
+      }
+    }
+  }
+
+  private formatLine(rawLine: string, source: 'stdout' | 'stderr'): string | undefined {
+    const trimmed = rawLine.trim();
+    if (trimmed.length === 0) {
+      return undefined;
+    }
+    const prefix = source === 'stderr' ? 'stderr: ' : '';
+    return `[+${formatElapsed(this.startedAt)}] [${source}] ${prefix}${trimmed}`;
+  }
+
+  private flushRemainder(): void {
+    const stdoutRemainder = this.stdoutBuffer.trim();
+    if (stdoutRemainder.length > 0) {
+      const formatted = this.formatLine(stdoutRemainder, 'stdout');
+      if (formatted) {
+        this.stream.write(formatted);
+        this.stream.write('\n');
+      }
+    }
+    const stderrRemainder = this.stderrBuffer.trim();
+    if (stderrRemainder.length > 0) {
+      const formatted = this.formatLine(stderrRemainder, 'stderr');
+      if (formatted) {
+        this.stream.write(formatted);
+        this.stream.write('\n');
+      }
+    }
+    this.stdoutBuffer = '';
+    this.stderrBuffer = '';
+  }
+}
+
+function isCopilotLogStreamingDisabled(): boolean {
+  const envValue = process.env.AGENTV_COPILOT_STREAM_LOGS;
+  if (!envValue) {
+    return false;
+  }
+  const normalized = envValue.trim().toLowerCase();
+  return normalized === 'false' || normalized === '0' || normalized === 'off';
+}
+
+function buildLogFilename(request: ProviderRequest, targetName: string): string {
+  const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
+  const evalId = sanitizeForFilename(request.evalCaseId ?? 'copilot');
+  const attemptSuffix = request.attempt !== undefined ?
`_attempt-${request.attempt + 1}` : ''; + const target = sanitizeForFilename(targetName); + return `${timestamp}_${target}_${evalId}${attemptSuffix}_${randomUUID().slice(0, 8)}.log`; +} + +function sanitizeForFilename(value: string): string { + const sanitized = value.replace(/[^A-Za-z0-9._-]+/g, '_'); + return sanitized.length > 0 ? sanitized : 'copilot'; +} + +function formatElapsed(startedAt: number): string { + const elapsedSeconds = Math.floor((Date.now() - startedAt) / 1000); + const hours = Math.floor(elapsedSeconds / 3600); + const minutes = Math.floor((elapsedSeconds % 3600) / 60); + const seconds = elapsedSeconds % 60; + if (hours > 0) { + return `${hours.toString().padStart(2, '0')}:${minutes.toString().padStart(2, '0')}:${seconds.toString().padStart(2, '0')}`; + } + return `${minutes.toString().padStart(2, '0')}:${seconds.toString().padStart(2, '0')}`; +} + +function stripAnsiEscapes(text: string): string { + return text.replace(/\x1B\[[0-9;]*[A-Za-z]/g, '').replace(/\x1B\][^\x07]*\x07/g, ''); +} + +function extractCopilotResponse(stdout: string): string { + const cleaned = stripAnsiEscapes(stdout).trim(); + if (cleaned.length === 0) { + throw new Error('Copilot CLI produced no output'); + } + return cleaned; +} + +function pickDetail(stderr: string, stdout: string): string | undefined { + const errorText = stderr.trim(); + if (errorText.length > 0) { + return errorText; + } + const stdoutText = stdout.trim(); + return stdoutText.length > 0 ? stdoutText : undefined; +} + +function formatTimeoutSuffix(timeoutMs: number | undefined): string { + if (!timeoutMs || timeoutMs <= 0) { + return ''; + } + const seconds = Math.ceil(timeoutMs / 1000); + return ` after ${seconds}s`; +} + +async function locateExecutable(candidate: string): Promise { + const includesPathSeparator = candidate.includes('/') || candidate.includes('\\'); + if (includesPathSeparator) { + const resolved = path.isAbsolute(candidate) ? 
candidate : path.resolve(candidate); + await access(resolved, constants.F_OK); + return resolved; + } + + const locator = process.platform === 'win32' ? 'where' : 'which'; + try { + const { stdout } = await execAsync(`${locator} ${candidate}`); + const lines = stdout + .split(/\r?\n/) + .map((line) => line.trim()) + .filter((line) => line.length > 0); + if (lines.length > 0 && lines[0]) { + await access(lines[0], constants.F_OK); + return lines[0]; + } + } catch { + // ignore and fall back to error below + } + + throw new Error(`Copilot executable '${candidate}' was not found on PATH`); +} + +function shouldShellExecute(executable: string): boolean { + if (process.platform !== 'win32') { + return false; + } + const lower = executable.toLowerCase(); + return lower.endsWith('.cmd') || lower.endsWith('.bat') || lower.endsWith('.ps1'); +} + +async function defaultCopilotCliRunner( + options: CopilotCliRunOptions, +): Promise { + return await new Promise((resolve, reject) => { + const child = spawn(options.executable, options.args, { + cwd: options.cwd, + env: options.env, + stdio: ['pipe', 'pipe', 'pipe'], + shell: shouldShellExecute(options.executable), + }); + + let stdout = ''; + let stderr = ''; + let timedOut = false; + + const onAbort = (): void => { + child.kill('SIGTERM'); + }; + + if (options.signal) { + if (options.signal.aborted) { + onAbort(); + } else { + options.signal.addEventListener('abort', onAbort, { once: true }); + } + } + + let timeoutHandle: NodeJS.Timeout | undefined; + if (options.timeoutMs && options.timeoutMs > 0) { + timeoutHandle = setTimeout(() => { + timedOut = true; + child.kill('SIGTERM'); + }, options.timeoutMs); + timeoutHandle.unref?.(); + } + + child.stdout.setEncoding('utf8'); + child.stdout.on('data', (chunk) => { + stdout += chunk; + options.onStdoutChunk?.(chunk); + }); + + child.stderr.setEncoding('utf8'); + child.stderr.on('data', (chunk) => { + stderr += chunk; + options.onStderrChunk?.(chunk); + }); + + // Close stdin - 
copilot reads prompt from args, not stdin + child.stdin.end(); + + const cleanup = (): void => { + if (timeoutHandle) { + clearTimeout(timeoutHandle); + } + if (options.signal) { + options.signal.removeEventListener('abort', onAbort); + } + }; + + child.on('error', (error) => { + cleanup(); + reject(error); + }); + + child.on('close', (code) => { + cleanup(); + resolve({ + stdout, + stderr, + exitCode: typeof code === 'number' ? code : -1, + timedOut, + }); + }); + }); +} From 3c9e02ee4358bdc46b17f0ce148ff7b268760de7 Mon Sep 17 00:00:00 2001 From: Christopher Tso Date: Sun, 1 Feb 2026 04:55:24 +0000 Subject: [PATCH 06/14] feat(core): register CopilotCliProvider in factory and exports Co-Authored-By: Claude Opus 4.5 --- packages/core/src/evaluation/providers/index.ts | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/packages/core/src/evaluation/providers/index.ts b/packages/core/src/evaluation/providers/index.ts index 1e21c946..8cdc0202 100644 --- a/packages/core/src/evaluation/providers/index.ts +++ b/packages/core/src/evaluation/providers/index.ts @@ -2,6 +2,7 @@ import { AnthropicProvider, AzureProvider, GeminiProvider } from './ai-sdk.js'; import { ClaudeCodeProvider } from './claude-code.js'; import { CliProvider } from './cli.js'; import { CodexProvider } from './codex.js'; +import { CopilotCliProvider } from './copilot-cli.js'; import { MockProvider } from './mock.js'; import { PiAgentSdkProvider } from './pi-agent-sdk.js'; import { PiCodingAgentProvider } from './pi-coding-agent.js'; @@ -25,6 +26,7 @@ export type { AzureResolvedConfig, ClaudeCodeResolvedConfig, CliResolvedConfig, + CopilotResolvedConfig, GeminiResolvedConfig, MockResolvedConfig, PiAgentSdkResolvedConfig, @@ -46,6 +48,10 @@ export { consumeClaudeCodeLogEntries, subscribeToClaudeCodeLogEntries, } from './claude-code-log-tracker.js'; +export { + consumeCopilotCliLogEntries, + subscribeToCopilotCliLogEntries, +} from './copilot-cli-log-tracker.js'; export function createProvider(target: 
ResolvedTarget): Provider { switch (target.kind) { @@ -59,6 +65,8 @@ export function createProvider(target: ResolvedTarget): Provider { return new CliProvider(target.name, target.config); case 'codex': return new CodexProvider(target.name, target.config); + case 'copilot-cli': + return new CopilotCliProvider(target.name, target.config); case 'pi-coding-agent': return new PiCodingAgentProvider(target.name, target.config); case 'pi-agent-sdk': From c40899bdee1101bfa0e7970df06718bbcacd5ab7 Mon Sep 17 00:00:00 2001 From: Christopher Tso Date: Sun, 1 Feb 2026 04:55:41 +0000 Subject: [PATCH 07/14] feat(core): add copilot-cli settings to target validator Co-Authored-By: Claude Opus 4.5 --- .../validation/targets-validator.ts | 27 +++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/packages/core/src/evaluation/validation/targets-validator.ts b/packages/core/src/evaluation/validation/targets-validator.ts index 4dea3fef..93d422d9 100644 --- a/packages/core/src/evaluation/validation/targets-validator.ts +++ b/packages/core/src/evaluation/validation/targets-validator.ts @@ -96,6 +96,29 @@ const CODEX_SETTINGS = new Set([ 'logOutputFormat', ]); +const COPILOT_SETTINGS = new Set([ + ...COMMON_SETTINGS, + 'executable', + 'command', + 'binary', + 'args', + 'arguments', + 'model', + 'cwd', + 'timeout_seconds', + 'timeoutSeconds', + 'log_dir', + 'logDir', + 'log_directory', + 'logDirectory', + 'log_format', + 'logFormat', + 'log_output_format', + 'logOutputFormat', + 'system_prompt', + 'systemPrompt', +]); + const VSCODE_SETTINGS = new Set([ ...COMMON_SETTINGS, 'workspace_template', @@ -136,6 +159,10 @@ function getKnownSettings(provider: string): Set | null { case 'codex': case 'codex-cli': return CODEX_SETTINGS; + case 'copilot-cli': + case 'copilot': + case 'github-copilot': + return COPILOT_SETTINGS; case 'vscode': case 'vscode-insiders': return VSCODE_SETTINGS; From 62599ba154b39faffe395c9d5917bcb23c6f056b Mon Sep 17 00:00:00 2001 From: Christopher Tso Date: Sun, 
1 Feb 2026 04:57:22 +0000 Subject: [PATCH 08/14] feat(cli): add copilot-cli log subscription to eval runner Co-Authored-By: Claude Opus 4.5 --- apps/cli/src/commands/eval/progress-display.ts | 7 +++++-- apps/cli/src/commands/eval/run-eval.ts | 14 ++++++++++++-- 2 files changed, 17 insertions(+), 4 deletions(-) diff --git a/apps/cli/src/commands/eval/progress-display.ts b/apps/cli/src/commands/eval/progress-display.ts index 0b18013c..54651c17 100644 --- a/apps/cli/src/commands/eval/progress-display.ts +++ b/apps/cli/src/commands/eval/progress-display.ts @@ -78,7 +78,7 @@ export class ProgressDisplay { } } - addLogPaths(paths: readonly string[], provider?: 'codex' | 'pi'): void { + addLogPaths(paths: readonly string[], provider?: 'codex' | 'pi' | 'copilot'): void { const newPaths: string[] = []; for (const path of paths) { if (this.logPathSet.has(path)) { @@ -96,7 +96,10 @@ export class ProgressDisplay { if (!this.hasPrintedLogHeader) { console.log(''); - const label = provider === 'pi' ? 'Pi Coding Agent' : 'Codex CLI'; + const label = + provider === 'pi' ? 'Pi Coding Agent' : + provider === 'copilot' ? 
'Copilot CLI' : + 'Codex CLI'; console.log(`${label} logs:`); this.hasPrintedLogHeader = true; } diff --git a/apps/cli/src/commands/eval/run-eval.ts b/apps/cli/src/commands/eval/run-eval.ts index a153f6fb..afe47962 100644 --- a/apps/cli/src/commands/eval/run-eval.ts +++ b/apps/cli/src/commands/eval/run-eval.ts @@ -11,6 +11,7 @@ import { ensureVSCodeSubagents, loadEvalCases, subscribeToCodexLogEntries, + subscribeToCopilotCliLogEntries, subscribeToPiLogEntries, } from '@agentv/core'; @@ -152,7 +153,7 @@ type ProgressReporter = { setTotal(total: number): void; update(workerId: number, progress: WorkerProgress): void; finish(): void; - addLogPaths(paths: readonly string[], provider?: 'codex' | 'pi'): void; + addLogPaths(paths: readonly string[], provider?: 'codex' | 'pi' | 'copilot'): void; }; function createProgressReporter( @@ -167,7 +168,7 @@ function createProgressReporter( update: (workerId: number, progress: WorkerProgress) => display.updateWorker({ ...progress, workerId }), finish: () => display.finish(), - addLogPaths: (paths: readonly string[], provider?: 'codex' | 'pi') => + addLogPaths: (paths: readonly string[], provider?: 'codex' | 'pi' | 'copilot') => display.addLogPaths(paths, provider), }; } @@ -474,6 +475,14 @@ export async function runEvalCommand(input: RunEvalCommandInput): Promise seenPiLogPaths.add(entry.filePath); progressReporter.addLogPaths([entry.filePath], 'pi'); }); + const seenCopilotLogPaths = new Set(); + const unsubscribeCopilotLogs = subscribeToCopilotCliLogEntries((entry) => { + if (!entry.filePath || seenCopilotLogPaths.has(entry.filePath)) { + return; + } + seenCopilotLogPaths.add(entry.filePath); + progressReporter.addLogPaths([entry.filePath], 'copilot'); + }); for (const [testFilePath, meta] of fileMetadata.entries()) { for (const evalId of meta.evalIds) { const evalKey = makeEvalKey(testFilePath, evalId); @@ -525,6 +534,7 @@ export async function runEvalCommand(input: RunEvalCommandInput): Promise } finally { 
unsubscribeCodexLogs(); unsubscribePiLogs(); + unsubscribeCopilotLogs(); await outputWriter.close().catch(() => undefined); } } From 8bd81cf190a44bb7975cae3e0649fdee33a3b3ae Mon Sep 17 00:00:00 2001 From: Christopher Tso Date: Sun, 1 Feb 2026 04:57:36 +0000 Subject: [PATCH 09/14] docs: add copilot-cli target example Co-Authored-By: Claude Opus 4.5 --- examples/features/.agentv/targets.yaml | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/examples/features/.agentv/targets.yaml b/examples/features/.agentv/targets.yaml index e8a6bbc2..caba369a 100644 --- a/examples/features/.agentv/targets.yaml +++ b/examples/features/.agentv/targets.yaml @@ -68,6 +68,16 @@ targets: log_format: json # 'summary' (default) or 'json' for raw event logs # system_prompt: optional override (default instructs agent to include code in response) + # GitHub Copilot CLI + - name: copilot + provider: copilot-cli + judge_target: azure_base + # executable: copilot # Optional: defaults to `copilot` on PATH + # model: gpt-5 # Optional: override model + timeout_seconds: 180 + log_format: json # Optional: 'summary' (default) or 'json' + # system_prompt: optional override (default instructs agent to include code in response) + # Claude Code - Anthropic's autonomous coding CLI - name: claude-code provider: claude-code From 1656ceb47882b5ee084d24fbc52deac97c41d311 Mon Sep 17 00:00:00 2001 From: Christopher Tso Date: Sun, 1 Feb 2026 04:58:22 +0000 Subject: [PATCH 10/14] test(core): add CopilotCliProvider unit tests Co-Authored-By: Claude Opus 4.5 --- .../evaluation/providers/copilot-cli.test.ts | 277 ++++++++++++++++++ 1 file changed, 277 insertions(+) create mode 100644 packages/core/test/evaluation/providers/copilot-cli.test.ts diff --git a/packages/core/test/evaluation/providers/copilot-cli.test.ts b/packages/core/test/evaluation/providers/copilot-cli.test.ts new file mode 100644 index 00000000..a897285e --- /dev/null +++ b/packages/core/test/evaluation/providers/copilot-cli.test.ts @@ -0,0 
+1,277 @@ +import { afterEach, beforeEach, describe, expect, it, mock } from 'bun:test'; +import { mkdir, mkdtemp, readFile, rm, writeFile } from 'node:fs/promises'; +import { tmpdir } from 'node:os'; +import path from 'node:path'; + +import { + type CopilotCliLogEntry, + consumeCopilotCliLogEntries, + subscribeToCopilotCliLogEntries, +} from '../../../src/evaluation/providers/copilot-cli-log-tracker.js'; +import { CopilotCliProvider } from '../../../src/evaluation/providers/copilot-cli.js'; +import { + type ProviderRequest, + extractLastAssistantContent, +} from '../../../src/evaluation/providers/types.js'; + +async function createTempDir(prefix: string): Promise { + return await mkdtemp(path.join(tmpdir(), prefix)); +} + +describe('CopilotCliProvider', () => { + let fixturesRoot: string; + + beforeEach(async () => { + fixturesRoot = await createTempDir('copilot-provider-'); + consumeCopilotCliLogEntries(); + }); + + afterEach(async () => { + await rm(fixturesRoot, { recursive: true, force: true }); + }); + + it('invokes copilot CLI and extracts plain text response', async () => { + const runner = mock( + async (_opts: { + prompt: string; + args: readonly string[]; + }) => ({ + stdout: 'Here is the solution:\n```python\nprint("hello")\n```', + stderr: '', + exitCode: 0, + }), + ); + const provider = new CopilotCliProvider( + 'copilot-target', + { + executable: process.execPath, + timeoutMs: 1000, + logDir: fixturesRoot, + }, + runner, + ); + + const request: ProviderRequest = { + question: 'Write a hello world program', + }; + + const response = await provider.invoke(request); + const content = extractLastAssistantContent(response.outputMessages); + expect(content).toContain('Here is the solution'); + expect(content).toContain('print("hello")'); + expect(runner).toHaveBeenCalledTimes(1); + + const invocation = runner.mock.calls[0][0]; + expect(invocation.args).toContain('-p'); + expect(invocation.args).toContain('-s'); + 
expect(invocation.args).toContain('--allow-all-tools'); + expect(invocation.args).toContain('--no-color'); + }); + + it('strips ANSI escape codes from output', async () => { + const runner = mock(async () => ({ + stdout: '\x1B[32mGreen text\x1B[0m and \x1B[1mbold\x1B[0m', + stderr: '', + exitCode: 0, + })); + const provider = new CopilotCliProvider( + 'copilot-target', + { + executable: process.execPath, + logDir: fixturesRoot, + }, + runner, + ); + + const response = await provider.invoke({ question: 'Test' }); + const content = extractLastAssistantContent(response.outputMessages); + expect(content).toBe('Green text and bold'); + expect(content).not.toContain('\x1B'); + }); + + it('fails on non-zero exit code', async () => { + const runner = mock(async () => ({ + stdout: '', + stderr: 'Error: authentication failed', + exitCode: 1, + })); + const provider = new CopilotCliProvider( + 'copilot-target', + { + executable: process.execPath, + logDir: fixturesRoot, + }, + runner, + ); + + await expect(provider.invoke({ question: 'Fail' })).rejects.toThrow( + /exited with code 1.*authentication failed/, + ); + }); + + it('reports timeout when process times out', async () => { + const runner = mock(async () => ({ + stdout: '', + stderr: '', + exitCode: -1, + timedOut: true, + })); + const provider = new CopilotCliProvider( + 'copilot-target', + { + executable: process.execPath, + timeoutMs: 5000, + logDir: fixturesRoot, + }, + runner, + ); + + await expect(provider.invoke({ question: 'Timeout' })).rejects.toThrow( + /timed out.*after 5s/, + ); + }); + + it('fails when copilot produces empty output', async () => { + const runner = mock(async () => ({ + stdout: '', + stderr: '', + exitCode: 0, + })); + const provider = new CopilotCliProvider( + 'copilot-target', + { + executable: process.execPath, + logDir: fixturesRoot, + }, + runner, + ); + + await expect(provider.invoke({ question: 'Empty' })).rejects.toThrow(/no output/i); + }); + + it('passes --model flag when model is 
configured', async () => { + const runner = mock(async () => ({ + stdout: 'response text', + stderr: '', + exitCode: 0, + })); + const provider = new CopilotCliProvider( + 'copilot-target', + { + executable: process.execPath, + model: 'gpt-5-mini', + logDir: fixturesRoot, + }, + runner, + ); + + await provider.invoke({ question: 'Model test' }); + const invocation = runner.mock.calls[0][0]; + expect(invocation.args).toContain('--model'); + expect(invocation.args).toContain('gpt-5-mini'); + }); + + it('appends custom args from config', async () => { + const runner = mock(async () => ({ + stdout: 'response text', + stderr: '', + exitCode: 0, + })); + const provider = new CopilotCliProvider( + 'copilot-target', + { + executable: process.execPath, + args: ['--custom-flag', 'value'], + logDir: fixturesRoot, + }, + runner, + ); + + await provider.invoke({ question: 'Custom args' }); + const invocation = runner.mock.calls[0][0]; + expect(invocation.args).toContain('--custom-flag'); + expect(invocation.args).toContain('value'); + }); + + it('includes preread block for input files', async () => { + const runner = mock(async () => ({ + stdout: 'done with files', + stderr: '', + exitCode: 0, + })); + const provider = new CopilotCliProvider( + 'copilot-target', + { + executable: process.execPath, + logDir: fixturesRoot, + }, + runner, + ); + + const guidelineFile = path.join(fixturesRoot, 'prompts', 'python.instructions.md'); + await mkdir(path.dirname(guidelineFile), { recursive: true }); + await writeFile(guidelineFile, 'guideline content', 'utf8'); + + const attachmentFile = path.join(fixturesRoot, 'src', 'main.py'); + await mkdir(path.dirname(attachmentFile), { recursive: true }); + await writeFile(attachmentFile, "print('hi')", 'utf8'); + + const request: ProviderRequest = { + question: 'Implement feature', + inputFiles: [guidelineFile, attachmentFile], + guideline_patterns: ['**/*.instructions.md'], + }; + + const response = await provider.invoke(request); + 
expect(extractLastAssistantContent(response.outputMessages)).toBe('done with files'); + + // The prompt passed as last arg should contain file references + const invocation = runner.mock.calls[0][0]; + const promptArg = invocation.args[invocation.args.length - 1]; + expect(promptArg).toContain('python.instructions.md'); + expect(promptArg).toContain('main.py'); + expect(promptArg).toContain('[[ ## user_query ## ]]'); + }); + + it('streams output to a log file and records log entry', async () => { + const runner = mock(async (options: { readonly onStdoutChunk?: (chunk: string) => void }) => { + options.onStdoutChunk?.('Processing request...\n'); + options.onStdoutChunk?.('Here is the answer'); + return { + stdout: 'Here is the answer', + stderr: '', + exitCode: 0, + }; + }); + + const provider = new CopilotCliProvider( + 'copilot-target', + { + executable: process.execPath, + logDir: fixturesRoot, + }, + runner, + ); + + const observedEntries: CopilotCliLogEntry[] = []; + const unsubscribe = subscribeToCopilotCliLogEntries((entry) => { + observedEntries.push(entry); + }); + + try { + const response = await provider.invoke({ question: 'log it', evalCaseId: 'case-123' }); + const raw = response.raw as Record; + expect(typeof raw.logFile).toBe('string'); + const logFile = raw.logFile as string; + const logContent = await readFile(logFile, 'utf8'); + expect(logContent).toContain('Processing request'); + expect(logContent).toContain('Here is the answer'); + + const tracked = consumeCopilotCliLogEntries(); + expect(tracked.some((entry) => entry.filePath === logFile)).toBe(true); + expect(observedEntries.some((entry) => entry.filePath === logFile)).toBe(true); + } finally { + unsubscribe(); + } + }); +}); From a834b1f10222c3c2889cb4971d23321d7717e264 Mon Sep 17 00:00:00 2001 From: Christopher Tso Date: Sun, 1 Feb 2026 04:59:41 +0000 Subject: [PATCH 11/14] style: fix lint and formatting issues Co-Authored-By: Claude Opus 4.5 --- .claude/plans/2026-02-01-copilot-provider.md 
| 761 ++++++++++++++++++ .../cli/src/commands/eval/progress-display.ts | 8 +- .../providers/copilot-cli-log-tracker.ts | 4 +- .../src/evaluation/providers/copilot-cli.ts | 7 +- .../evaluation/providers/copilot-cli.test.ts | 4 +- 5 files changed, 774 insertions(+), 10 deletions(-) create mode 100644 .claude/plans/2026-02-01-copilot-provider.md diff --git a/.claude/plans/2026-02-01-copilot-provider.md b/.claude/plans/2026-02-01-copilot-provider.md new file mode 100644 index 00000000..901c5032 --- /dev/null +++ b/.claude/plans/2026-02-01-copilot-provider.md @@ -0,0 +1,761 @@ +# Copilot CLI Provider Implementation Plan + +> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. + +**Goal:** Add a built-in `copilot-cli` provider that invokes GitHub Copilot CLI as an external process for agent evaluations. + +**Architecture:** Mirrors the codex/claude-code provider pattern — spawns copilot CLI with `-p -s --allow-all-tools --no-color`, captures stdout as plain text response. Uses the same preread prompt construction, workspace management, and stream logging patterns. + +**Tech Stack:** TypeScript, Bun, Vitest, Zod + +**Working directory:** `/home/christso/projects/agentv_feat-copilot-provider` (git worktree, branch: `feat/add-copilot-cli-provider-openspec`) + +**Setup:** A git worktree has already been created at the working directory above, on branch `feat/add-copilot-cli-provider-openspec`, rebased onto `main`. All implementation work should happen in this worktree. Do NOT cd to the main repo at `/home/christso/projects/agentv`. 
+ +--- + +## Task 1: Register copilot-cli in type system + +**Files:** +- Modify: `packages/core/src/evaluation/providers/types.ts` + +**Step 1: Add copilot-cli to ProviderKind union** + +In `types.ts`, add `'copilot-cli'` to the `ProviderKind` type union (after `'codex'`): + +```typescript +export type ProviderKind = + | 'azure' + | 'anthropic' + | 'gemini' + | 'codex' + | 'copilot-cli' // <-- ADD + | 'pi-coding-agent' + // ...rest unchanged +``` + +**Step 2: Add to AGENT_PROVIDER_KINDS** + +Add `'copilot-cli'` to the `AGENT_PROVIDER_KINDS` array (it has filesystem access): + +```typescript +export const AGENT_PROVIDER_KINDS: readonly ProviderKind[] = [ + 'codex', + 'copilot-cli', // <-- ADD + 'pi-coding-agent', + // ...rest +] as const; +``` + +**Step 3: Add to KNOWN_PROVIDERS** + +Add `'copilot-cli'` to the `KNOWN_PROVIDERS` array: + +```typescript +export const KNOWN_PROVIDERS: readonly ProviderKind[] = [ + // ...existing + 'codex', + 'copilot-cli', // <-- ADD + // ...rest +] as const; +``` + +**Step 4: Add aliases** + +Add `'copilot'` and `'github-copilot'` to `PROVIDER_ALIASES`: + +```typescript +export const PROVIDER_ALIASES: readonly string[] = [ + // ...existing + 'copilot', // alias for "copilot-cli" + 'github-copilot', // alias for "copilot-cli" +] as const; +``` + +**Step 5: Add system_prompt to TargetDefinition** + +Ensure `system_prompt` and `systemPrompt` exist in `TargetDefinition` (check if already present for codex/claude-code): + +```typescript +readonly system_prompt?: string | unknown | undefined; +readonly systemPrompt?: string | unknown | undefined; +``` + +**Step 6: Run typecheck** + +Run: `bun run typecheck` + +Expected: Compilation error in `providers/index.ts` because the `createProvider` switch is not exhaustive (missing `'copilot-cli'` case). This is expected; the missing case is added in Task 5, when the provider is registered in the factory.
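The expected typecheck failure in Step 6 can be illustrated with a minimal sketch. It assumes the factory relies on a `never`-based default branch to enforce exhaustiveness; the plan does not show how `createProvider` actually does this, and `Kind`, `assertNever`, and `describeProvider` are hypothetical names used only for illustration.

```typescript
// Hypothetical, trimmed-down stand-in for ProviderKind; not the real union.
type Kind = 'codex' | 'copilot-cli';

// Only type-checks at a call site when every union member has been handled,
// because nothing except `never` is assignable to the `never` parameter.
function assertNever(value: never): never {
  throw new Error(`Unhandled provider kind: ${String(value)}`);
}

// Illustrative factory; the real createProvider returns Provider instances.
function describeProvider(kind: Kind): string {
  switch (kind) {
    case 'codex':
      return 'Codex CLI';
    case 'copilot-cli':
      // If this case were deleted, `kind` would still have the type
      // 'copilot-cli' in the default branch, which is not assignable to
      // `never`, so the compiler would reject the assertNever call.
      return 'Copilot CLI';
    default:
      return assertNever(kind);
  }
}
```

Under this assumption, adding a member to the union without adding its `case` turns a silent runtime gap into a compile-time error, which is why the failure between Task 1 and the factory registration is a useful signal rather than a regression.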
+ +**Step 7: Commit** + +```bash +git add packages/core/src/evaluation/providers/types.ts +git commit -m "feat(core): register copilot-cli provider kind and aliases" +``` + +--- + +## Task 2: Add CopilotResolvedConfig and target resolution + +**Files:** +- Modify: `packages/core/src/evaluation/providers/targets.ts` + +**Step 1: Add CopilotResolvedConfig interface** + +After `CodexResolvedConfig`, add: + +```typescript +export interface CopilotResolvedConfig { + readonly executable: string; + readonly model?: string; + readonly args?: readonly string[]; + readonly cwd?: string; + readonly timeoutMs?: number; + readonly logDir?: string; + readonly logFormat?: 'summary' | 'json'; + readonly systemPrompt?: string; +} +``` + +**Step 2: Add copilot-cli to ResolvedTarget union** + +Add a new union member to `ResolvedTarget`: + +```typescript +| { + readonly kind: 'copilot-cli'; + readonly name: string; + readonly judgeTarget?: string; + readonly workers?: number; + readonly providerBatching?: boolean; + readonly config: CopilotResolvedConfig; + } +``` + +**Step 3: Add resolveCopilotConfig function** + +Follow the same pattern as `resolveCodexConfig`: + +```typescript +function resolveCopilotConfig( + target: z.infer, + env: EnvLookup, +): CopilotResolvedConfig { + const executableSource = target.executable ?? target.command ?? target.binary; + const modelSource = target.model; + const argsSource = target.args ?? target.arguments; + const cwdSource = target.cwd; + const timeoutSource = target.timeout_seconds ?? target.timeoutSeconds; + const logDirSource = + target.log_dir ?? target.logDir ?? target.log_directory ?? target.logDirectory; + const logFormatSource = + target.log_format ?? target.logFormat ?? target.log_output_format ?? target.logOutputFormat; + const systemPromptSource = target.system_prompt ?? 
target.systemPrompt; + + const executable = + resolveOptionalString(executableSource, env, `${target.name} copilot executable`, { + allowLiteral: true, + optionalEnv: true, + }) ?? 'copilot'; + + const model = resolveOptionalString(modelSource, env, `${target.name} copilot model`, { + allowLiteral: true, + optionalEnv: true, + }); + + const args = resolveOptionalStringArray(argsSource, env, `${target.name} copilot args`); + + const cwd = resolveOptionalString(cwdSource, env, `${target.name} copilot cwd`, { + allowLiteral: true, + optionalEnv: true, + }); + + const timeoutMs = resolveTimeoutMs(timeoutSource, `${target.name} copilot timeout`); + + const logDir = resolveOptionalString(logDirSource, env, `${target.name} copilot log directory`, { + allowLiteral: true, + optionalEnv: true, + }); + + const logFormat = normalizeCopilotLogFormat(logFormatSource); + + const systemPrompt = + typeof systemPromptSource === 'string' && systemPromptSource.trim().length > 0 + ? systemPromptSource.trim() + : undefined; + + return { executable, model, args, cwd, timeoutMs, logDir, logFormat, systemPrompt }; +} + +function normalizeCopilotLogFormat(value: unknown): 'summary' | 'json' | undefined { + if (value === undefined || value === null) return undefined; + if (typeof value !== 'string') throw new Error("copilot log format must be 'summary' or 'json'"); + const normalized = value.trim().toLowerCase(); + if (normalized === 'json' || normalized === 'summary') return normalized; + throw new Error("copilot log format must be 'summary' or 'json'"); +} +``` + +**Step 4: Add cases in resolveTargetDefinition switch** + +In the `switch (provider)` block, add before the `default:` case: + +```typescript +case 'copilot-cli': +case 'copilot': +case 'github-copilot': + return { + kind: 'copilot-cli', + name: parsed.name, + judgeTarget: parsed.judge_target, + workers: parsed.workers, + providerBatching, + config: resolveCopilotConfig(parsed, env), + }; +``` + +**Step 5: Run typecheck** + +Run: 
`bun run typecheck` + +Expected: Still an error in `index.ts` for the non-exhaustive switch (expected; fixed in Task 5). + +**Step 6: Commit** + +```bash +git add packages/core/src/evaluation/providers/targets.ts +git commit -m "feat(core): add copilot-cli target resolution and config" +``` + +--- + +## Task 3: Create copilot-cli-log-tracker.ts + +**Files:** +- Create: `packages/core/src/evaluation/providers/copilot-cli-log-tracker.ts` + +**Step 1: Create the log tracker** + +Copy the pattern from `codex-log-tracker.ts`, replacing all `codex` references with `copilotCli`: + +```typescript +export type CopilotCliLogEntry = { + readonly filePath: string; + readonly evalCaseId?: string; + readonly targetName: string; + readonly attempt?: number; +}; + +const GLOBAL_LOGS_KEY = Symbol.for('agentv.copilotCliLogs'); +const GLOBAL_SUBSCRIBERS_KEY = Symbol.for('agentv.copilotCliLogSubscribers'); + +type CopilotCliLogListener = (entry: CopilotCliLogEntry) => void; + +type GlobalWithCopilotCliLogs = typeof globalThis & { + [GLOBAL_LOGS_KEY]?: CopilotCliLogEntry[]; + [GLOBAL_SUBSCRIBERS_KEY]?: Set<CopilotCliLogListener>; +}; + +function getCopilotCliLogStore(): CopilotCliLogEntry[] { + const globalObject = globalThis as GlobalWithCopilotCliLogs; + const existing = globalObject[GLOBAL_LOGS_KEY]; + if (existing) return existing; + const created: CopilotCliLogEntry[] = []; + globalObject[GLOBAL_LOGS_KEY] = created; + return created; +} + +function getSubscriberStore(): Set<CopilotCliLogListener> { + const globalObject = globalThis as GlobalWithCopilotCliLogs; + const existing = globalObject[GLOBAL_SUBSCRIBERS_KEY]; + if (existing) return existing; + const created = new Set<CopilotCliLogListener>(); + globalObject[GLOBAL_SUBSCRIBERS_KEY] = created; + return created; +} + +function notifySubscribers(entry: CopilotCliLogEntry): void { + const subscribers = Array.from(getSubscriberStore()); + for (const listener of subscribers) { + try { listener(entry); } + catch (error) { + const message = error instanceof Error ?
error.message : String(error); + console.warn(`Copilot CLI log subscriber failed: ${message}`); + } + } +} + +export function recordCopilotCliLogEntry(entry: CopilotCliLogEntry): void { + getCopilotCliLogStore().push(entry); + notifySubscribers(entry); +} + +export function consumeCopilotCliLogEntries(): CopilotCliLogEntry[] { + const store = getCopilotCliLogStore(); + if (store.length === 0) return []; + return store.splice(0, store.length); +} + +export function subscribeToCopilotCliLogEntries(listener: CopilotCliLogListener): () => void { + const store = getSubscriberStore(); + store.add(listener); + return () => { store.delete(listener); }; +} +``` + +**Step 2: Commit** + +```bash +git add packages/core/src/evaluation/providers/copilot-cli-log-tracker.ts +git commit -m "feat(core): add copilot-cli log tracker" +``` + +--- + +## Task 4: Implement CopilotCliProvider + +**Files:** +- Create: `packages/core/src/evaluation/providers/copilot-cli.ts` + +**Step 1: Create the provider** + +Model after `codex.ts` but simpler — copilot outputs plain text (not JSON). 
Key differences: +- Uses `-p -s --allow-all-tools --no-color` flags +- Optionally adds `--model ` if configured +- Response parsing: strip ANSI escape sequences, trim whitespace, use as candidate answer +- Uses `buildPromptDocument` from `preread.ts` for prompt construction + +```typescript +import { exec as execCallback, spawn } from 'node:child_process'; +import { randomUUID } from 'node:crypto'; +import { constants, createWriteStream } from 'node:fs'; +import type { WriteStream } from 'node:fs'; +import { access, mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises'; +import { tmpdir } from 'node:os'; +import path from 'node:path'; +import { promisify } from 'node:util'; + +import { recordCopilotCliLogEntry } from './copilot-cli-log-tracker.js'; +import { buildPromptDocument, normalizeInputFiles } from './preread.js'; +import type { CopilotResolvedConfig } from './targets.js'; +import type { Provider, ProviderRequest, ProviderResponse } from './types.js'; + +const execAsync = promisify(execCallback); +const WORKSPACE_PREFIX = 'agentv-copilot-'; +const PROMPT_FILENAME = 'prompt.md'; + +const DEFAULT_SYSTEM_PROMPT = `**IMPORTANT**: Follow these instructions for your response: +- Do NOT create any additional output files in the workspace. +- All intended file outputs/changes MUST be written in your response. +- For each intended file, include the relative path and unified git diff following the convention \`diff --git ...\`. +This is required for evaluation scoring.`; + +// ... 
(full implementation following codex.ts patterns with:) +// - CopilotCliRunOptions, CopilotCliRunResult, CopilotCliRunner interfaces +// - CopilotCliProvider class +// - CopilotCliStreamLogger class +// - stripAnsiEscapes function +// - locateExecutable (reused from codex or extracted to shared util) +// - defaultCopilotCliRunner using spawn +``` + +Key implementation details for `buildCopilotArgs()`: + +```typescript +private buildCopilotArgs(): string[] { + const args: string[] = [ + '-p', // non-interactive prompt mode + // prompt will be passed via stdin or as argument + ]; + + // Silent mode - only output agent response + args.push('-s'); + + // Auto-approve all tool usage + args.push('--allow-all-tools'); + + // Disable color output + args.push('--no-color'); + + // Model selection + if (this.config.model) { + args.push('--model', this.config.model); + } + + // Custom args from config + if (this.config.args && this.config.args.length > 0) { + args.push(...this.config.args); + } + + return args; +} +``` + +For prompt delivery, copilot uses `-p ` where the prompt text is a positional argument after `-p`. The prompt should be passed as the last argument. + +For response parsing — copilot with `-s` outputs only the agent response text. 
Strip ANSI and trim: + +```typescript +function stripAnsiEscapes(text: string): string { + return text.replace(/\x1B\[[0-9;]*[A-Za-z]/g, '').replace(/\x1B\][^\x07]*\x07/g, ''); +} + +function extractCopilotResponse(stdout: string): string { + const cleaned = stripAnsiEscapes(stdout).trim(); + if (cleaned.length === 0) { + throw new Error('Copilot CLI produced no output'); + } + return cleaned; +} +``` + +**Step 2: Run typecheck** (will still fail until index.ts is updated) + +**Step 3: Commit** + +```bash +git add packages/core/src/evaluation/providers/copilot-cli.ts +git commit -m "feat(core): implement CopilotCliProvider" +``` + +--- + +## Task 5: Register provider in factory and exports + +**Files:** +- Modify: `packages/core/src/evaluation/providers/index.ts` + +**Step 1: Add imports** + +```typescript +import { CopilotCliProvider } from './copilot-cli.js'; +``` + +**Step 2: Add case to createProvider switch** + +After the `case 'codex':` block, add: + +```typescript +case 'copilot-cli': + return new CopilotCliProvider(target.name, target.config); +``` + +**Step 3: Add exports** + +Add type export for `CopilotResolvedConfig`: + +```typescript +export type { + // ...existing + CopilotResolvedConfig, +} from './targets.js'; +``` + +Add log tracker exports: + +```typescript +export { + consumeCopilotCliLogEntries, + subscribeToCopilotCliLogEntries, +} from './copilot-cli-log-tracker.js'; +``` + +**Step 4: Run typecheck** + +Run: `bun run typecheck` + +Expected: PASS (exhaustive switch now covers `copilot-cli`) + +**Step 5: Commit** + +```bash +git add packages/core/src/evaluation/providers/index.ts +git commit -m "feat(core): register CopilotCliProvider in factory and exports" +``` + +--- + +## Task 6: Add target validation settings + +**Files:** +- Modify: `packages/core/src/evaluation/validation/targets-validator.ts` + +**Step 1: Add COPILOT_SETTINGS constant** + +After `CODEX_SETTINGS`, add: + +```typescript +const COPILOT_SETTINGS = new Set([ + 
...COMMON_SETTINGS, + 'executable', + 'command', + 'binary', + 'args', + 'arguments', + 'model', + 'cwd', + 'timeout_seconds', + 'timeoutSeconds', + 'log_dir', + 'logDir', + 'log_directory', + 'logDirectory', + 'log_format', + 'logFormat', + 'log_output_format', + 'logOutputFormat', + 'system_prompt', + 'systemPrompt', +]); +``` + +**Step 2: Add cases in getKnownSettings** + +```typescript +case 'copilot-cli': +case 'copilot': +case 'github-copilot': + return COPILOT_SETTINGS; +``` + +**Step 3: Run typecheck and tests** + +Run: `bun run typecheck && bun test` + +Expected: PASS + +**Step 4: Commit** + +```bash +git add packages/core/src/evaluation/validation/targets-validator.ts +git commit -m "feat(core): add copilot-cli settings to target validator" +``` + +--- + +## Task 7: Add CLI log subscription integration + +**Files:** +- Modify: `apps/cli/src/commands/eval/run-eval.ts` +- Modify: `apps/cli/src/commands/eval/progress-display.ts` + +**Step 1: Add import in run-eval.ts** + +Add `subscribeToCopilotCliLogEntries` to the import from `@agentv/core`: + +```typescript +import { + // ...existing + subscribeToCopilotCliLogEntries, +} from '@agentv/core'; +``` + +**Step 2: Add subscription in run-eval.ts** + +After the Pi log subscription block (~line 476), add: + +```typescript +const seenCopilotLogPaths = new Set(); +const unsubscribeCopilotLogs = subscribeToCopilotCliLogEntries((entry) => { + if (!entry.filePath || seenCopilotLogPaths.has(entry.filePath)) { + return; + } + seenCopilotLogPaths.add(entry.filePath); + progressReporter.addLogPaths([entry.filePath], 'copilot'); +}); +``` + +**Step 3: Add cleanup in run-eval.ts finally block** + +After `unsubscribePiLogs();`, add: + +```typescript +unsubscribeCopilotLogs(); +``` + +**Step 4: Update progress-display.ts** + +Update the `addLogPaths` method signature and label logic: + +```typescript +addLogPaths(paths: readonly string[], provider?: 'codex' | 'pi' | 'copilot'): void { + // ...existing dedup logic... 
+ + if (!this.hasPrintedLogHeader) { + console.log(''); + const label = + provider === 'pi' ? 'Pi Coding Agent' : + provider === 'copilot' ? 'Copilot CLI' : + 'Codex CLI'; + console.log(`${label} logs:`); + this.hasPrintedLogHeader = true; + } + // ...rest unchanged +} +``` + +**Step 5: Run build and typecheck** + +Run: `bun run build && bun run typecheck` + +Expected: PASS + +**Step 6: Commit** + +```bash +git add apps/cli/src/commands/eval/run-eval.ts apps/cli/src/commands/eval/progress-display.ts +git commit -m "feat(cli): add copilot-cli log subscription to eval runner" +``` + +--- + +## Task 8: Add example target configuration + +**Files:** +- Modify: `examples/features/.agentv/targets.yaml` + +**Step 1: Add copilot target entry** + +After the codex entry, add: + +```yaml + # GitHub Copilot CLI + - name: copilot + provider: copilot-cli + judge_target: azure_base + # executable: copilot # Optional: defaults to `copilot` on PATH + # model: gpt-5 # Optional: override model + timeout_seconds: 180 + log_format: json # Optional: 'summary' (default) or 'json' + # system_prompt: optional override (default instructs agent to include code in response) +``` + +**Step 2: Commit** + +```bash +git add examples/features/.agentv/targets.yaml +git commit -m "docs: add copilot-cli target example" +``` + +--- + +## Task 9: Write unit tests + +**Files:** +- Create: `packages/core/test/evaluation/providers/copilot-cli.test.ts` + +**Step 1: Write tests** + +Follow the pattern from `codex.test.ts`. Key test cases: + +1. **Basic invocation** — mock runner returns plain text, verify response extraction and args +2. **ANSI stripping** — mock runner returns text with ANSI escape codes, verify they're stripped +3. **Non-zero exit code** — verify error includes exit code and stderr +4. **Timeout handling** — mock runner sets `timedOut: true`, verify error message +5. **Empty output** — verify proper error +6. **Stream logging** — verify log file is created and contains output +7. 
**Model arg** — verify `--model` is passed when configured +8. **Custom args** — verify user args are appended +9. **Input files preread** — verify preread block is included in prompt + +**Step 2: Run tests** + +Run: `bun test packages/core/test/evaluation/providers/copilot-cli.test.ts` + +Expected: All PASS + +**Step 3: Commit** + +```bash +git add packages/core/test/evaluation/providers/copilot-cli.test.ts +git commit -m "test(core): add CopilotCliProvider unit tests" +``` + +--- + +## Task 10: Full build + lint validation + +**Step 1: Run full validation** + +Run: `bun run build && bun run typecheck && bun run lint && bun test` + +Expected: All PASS + +**Step 2: Fix any lint issues** + +If biome reports issues, fix them. + +**Step 3: Commit any fixes** + +--- + +## Task 11: E2E validation with real copilot CLI + +**Step 1: Create a minimal test eval** + +Use an existing basic example or create a temporary one. Run with the copilot target: + +```bash +bun agentv eval examples/features/rubric/evals/dataset.yaml --target copilot --eval-id +``` + +**Step 2: Inspect results JSONL** + +Verify: +- The copilot-cli provider is invoked (check `provider_kind` or target name in output) +- The response contains actual copilot output (not empty) +- Log files are created in `.agentv/logs/copilot-cli/` + +**Step 3: Fix any issues discovered during e2e testing** + +--- + +## Task 12: Create PR and clean up worktree + +**Step 1: Push branch and create PR** + +```bash +git push -u origin feat/add-copilot-cli-provider-openspec +gh pr create --title "feat(core): add copilot-cli provider" --body "$(cat <<'EOF' +## Summary +- Adds `copilot-cli` built-in provider kind (aliases: `copilot`, `github-copilot`) +- Invokes GitHub Copilot CLI via `-p -s --allow-all-tools --no-color` +- Follows same patterns as codex/claude-code providers (workspace, preread, stream logging) +- Includes target validation, CLI log subscription, example config, and unit tests + +Closes #127 + +## Test plan +- 
[x] Unit tests for provider invocation, ANSI stripping, timeout, error handling +- [x] `bun run build && bun run typecheck && bun run lint && bun test` pass +- [x] E2E validation with real copilot CLI on basic eval example + +🤖 Generated with [Claude Code](https://claude.com/claude-code) +EOF +)" +``` + +**Step 2: Remove worktree** + +From the main repo directory: + +```bash +cd /home/christso/projects/agentv +git worktree remove ../agentv_feat-copilot-provider +``` + +**Step 3: Delete the plan file** + +Per CLAUDE.md: "Once development concludes, delete the plan file." + +The plan file at `.claude/plans/2026-02-01-copilot-provider.md` was in the worktree and is now gone with the worktree removal. + +--- + +## Notes + +- The copilot CLI executable is at `/home/christso/.local/bin/copilot` (version 0.0.400) +- Key flags: `-p ` (non-interactive), `-s` (silent/response only), `--allow-all-tools`, `--no-color` +- Unlike codex, copilot outputs plain text (not JSON/JSONL) when using `-s` +- The `locateExecutable` function from codex.ts should be extracted to a shared utility or duplicated diff --git a/apps/cli/src/commands/eval/progress-display.ts b/apps/cli/src/commands/eval/progress-display.ts index 54651c17..d0bb8525 100644 --- a/apps/cli/src/commands/eval/progress-display.ts +++ b/apps/cli/src/commands/eval/progress-display.ts @@ -97,9 +97,11 @@ export class ProgressDisplay { if (!this.hasPrintedLogHeader) { console.log(''); const label = - provider === 'pi' ? 'Pi Coding Agent' : - provider === 'copilot' ? 'Copilot CLI' : - 'Codex CLI'; + provider === 'pi' + ? 'Pi Coding Agent' + : provider === 'copilot' + ? 
'Copilot CLI' + : 'Codex CLI'; console.log(`${label} logs:`); this.hasPrintedLogHeader = true; } diff --git a/packages/core/src/evaluation/providers/copilot-cli-log-tracker.ts b/packages/core/src/evaluation/providers/copilot-cli-log-tracker.ts index 3c92d97d..b2c9b52c 100644 --- a/packages/core/src/evaluation/providers/copilot-cli-log-tracker.ts +++ b/packages/core/src/evaluation/providers/copilot-cli-log-tracker.ts @@ -62,9 +62,7 @@ export function consumeCopilotCliLogEntries(): CopilotCliLogEntry[] { return store.splice(0, store.length); } -export function subscribeToCopilotCliLogEntries( - listener: CopilotCliLogListener, -): () => void { +export function subscribeToCopilotCliLogEntries(listener: CopilotCliLogListener): () => void { const store = getSubscriberStore(); store.add(listener); return () => { diff --git a/packages/core/src/evaluation/providers/copilot-cli.ts b/packages/core/src/evaluation/providers/copilot-cli.ts index fca36975..07a74c14 100644 --- a/packages/core/src/evaluation/providers/copilot-cli.ts +++ b/packages/core/src/evaluation/providers/copilot-cli.ts @@ -410,8 +410,13 @@ function formatElapsed(startedAt: number): string { return `${minutes.toString().padStart(2, '0')}:${seconds.toString().padStart(2, '0')}`; } +// biome-ignore lint/suspicious/noControlCharactersInRegex: ANSI escape stripping requires matching control characters +const ANSI_ESCAPE_RE = /\x1B\[[0-9;]*[A-Za-z]/g; +// biome-ignore lint/suspicious/noControlCharactersInRegex: OSC sequence stripping requires matching control characters +const ANSI_OSC_RE = /\x1B\][^\x07]*\x07/g; + function stripAnsiEscapes(text: string): string { - return text.replace(/\x1B\[[0-9;]*[A-Za-z]/g, '').replace(/\x1B\][^\x07]*\x07/g, ''); + return text.replace(ANSI_ESCAPE_RE, '').replace(ANSI_OSC_RE, ''); } function extractCopilotResponse(stdout: string): string { diff --git a/packages/core/test/evaluation/providers/copilot-cli.test.ts b/packages/core/test/evaluation/providers/copilot-cli.test.ts index 
a897285e..02559da8 100644 --- a/packages/core/test/evaluation/providers/copilot-cli.test.ts +++ b/packages/core/test/evaluation/providers/copilot-cli.test.ts @@ -126,9 +126,7 @@ describe('CopilotCliProvider', () => { runner, ); - await expect(provider.invoke({ question: 'Timeout' })).rejects.toThrow( - /timed out.*after 5s/, - ); + await expect(provider.invoke({ question: 'Timeout' })).rejects.toThrow(/timed out.*after 5s/); }); it('fails when copilot produces empty output', async () => { From abbbf6cc219a75be7e37b97e92d46c55eca3c6da Mon Sep 17 00:00:00 2001 From: Christopher Tso Date: Sun, 1 Feb 2026 05:04:38 +0000 Subject: [PATCH 12/14] =?UTF-8?q?fix(core):=20fix=20copilot-cli=20arg=20or?= =?UTF-8?q?der=20=E2=80=94=20place=20-p=20=20last?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The -p flag takes the next argument as the prompt value. Other flags must come before -p to avoid "too many arguments" error. Co-Authored-By: Claude Opus 4.5 --- packages/core/src/evaluation/providers/copilot-cli.ts | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/packages/core/src/evaluation/providers/copilot-cli.ts b/packages/core/src/evaluation/providers/copilot-cli.ts index 07a74c14..9ec28875 100644 --- a/packages/core/src/evaluation/providers/copilot-cli.ts +++ b/packages/core/src/evaluation/providers/copilot-cli.ts @@ -147,9 +147,6 @@ export class CopilotCliProvider implements Provider { private buildCopilotArgs(prompt: string): string[] { const args: string[] = []; - // Non-interactive prompt mode - args.push('-p'); - // Silent mode - only output agent response args.push('-s'); @@ -169,8 +166,8 @@ export class CopilotCliProvider implements Provider { args.push(...this.config.args); } - // Prompt is passed as the final argument after -p - args.push(prompt); + // Non-interactive prompt mode: -p must be last (flag + value pair) + args.push('-p', prompt); return args; } From 
06674cc60ae0fb16fca5e198fdb02b3f88710f70 Mon Sep 17 00:00:00 2001 From: Christopher Tso Date: Sun, 1 Feb 2026 05:19:35 +0000 Subject: [PATCH 13/14] refactor(core): remove copilot/github-copilot aliases to reduce maintenance Only keep copilot-cli as the canonical provider name. This follows the principle of keeping maintenance burden low since most other providers only have 0-1 aliases. Co-Authored-By: Claude Opus 4.5 --- packages/core/src/evaluation/providers/targets.ts | 2 -- packages/core/src/evaluation/providers/types.ts | 2 -- packages/core/src/evaluation/validation/targets-validator.ts | 2 -- 3 files changed, 6 deletions(-) diff --git a/packages/core/src/evaluation/providers/targets.ts b/packages/core/src/evaluation/providers/targets.ts index 2afd620b..9d9ade42 100644 --- a/packages/core/src/evaluation/providers/targets.ts +++ b/packages/core/src/evaluation/providers/targets.ts @@ -713,8 +713,6 @@ export function resolveTargetDefinition( config: resolveCodexConfig(parsed, env), }; case 'copilot-cli': - case 'copilot': - case 'github-copilot': return { kind: 'copilot-cli', name: parsed.name, diff --git a/packages/core/src/evaluation/providers/types.ts b/packages/core/src/evaluation/providers/types.ts index 0afec63e..51ec7c52 100644 --- a/packages/core/src/evaluation/providers/types.ts +++ b/packages/core/src/evaluation/providers/types.ts @@ -65,8 +65,6 @@ export const PROVIDER_ALIASES: readonly string[] = [ 'google', // alias for "gemini" 'google-gemini', // alias for "gemini" 'codex-cli', // alias for "codex" - 'copilot', // alias for "copilot-cli" - 'github-copilot', // alias for "copilot-cli" 'pi', // alias for "pi-coding-agent" 'openai', // legacy/future support 'bedrock', // legacy/future support diff --git a/packages/core/src/evaluation/validation/targets-validator.ts b/packages/core/src/evaluation/validation/targets-validator.ts index 93d422d9..866d5065 100644 --- a/packages/core/src/evaluation/validation/targets-validator.ts +++ 
b/packages/core/src/evaluation/validation/targets-validator.ts @@ -160,8 +160,6 @@ function getKnownSettings(provider: string): Set<string> | null { case 'codex-cli': return CODEX_SETTINGS; case 'copilot-cli': - case 'copilot': - case 'github-copilot': return COPILOT_SETTINGS; case 'vscode': case 'vscode-insiders': From 45a0396b03afc8454cf754f6bec002715a2f7419 Mon Sep 17 00:00:00 2001 From: Christopher Tso Date: Sun, 1 Feb 2026 06:39:09 +0000 Subject: [PATCH 14/14] feat(core): copy input files into copilot-cli workspace Instead of referencing input files via file:// URIs pointing to absolute paths, copy them into the temp workspace and reference by relative path in the prompt. This allows Copilot CLI to read the files from its cwd. - Add copyInputFilesToWorkspace() with basename collision handling - Add buildCopilotFilePrereadBlock() for relative-path prompt sections - Replace buildPromptDocument with collectGuidelineFiles in invoke() - Include copiedFiles mappings in raw response for debugging - Update tests to assert no file:// URIs and add collision test Co-Authored-By: Claude Opus 4.5 --- .../src/evaluation/providers/copilot-cli.ts | 88 ++++++++++++++++++- .../evaluation/providers/copilot-cli.test.ts | 70 ++++++++++++++- 2 files changed, 152 insertions(+), 6 deletions(-) diff --git a/packages/core/src/evaluation/providers/copilot-cli.ts b/packages/core/src/evaluation/providers/copilot-cli.ts index 9ec28875..5e7d3193 100644 --- a/packages/core/src/evaluation/providers/copilot-cli.ts +++ b/packages/core/src/evaluation/providers/copilot-cli.ts @@ -2,13 +2,13 @@ import { exec as execCallback, spawn } from 'node:child_process'; import { randomUUID } from 'node:crypto'; import { constants, createWriteStream } from 'node:fs'; import type { WriteStream } from 'node:fs'; -import { access, mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises'; +import { access, copyFile, mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises'; import { tmpdir } from 'node:os'; import path from
'node:path'; import { promisify } from 'node:util'; import { recordCopilotCliLogEntry } from './copilot-cli-log-tracker.js'; -import { buildPromptDocument, normalizeInputFiles } from './preread.js'; +import { collectGuidelineFiles, normalizeInputFiles } from './preread.js'; import type { CopilotResolvedConfig } from './targets.js'; import type { Provider, ProviderRequest, ProviderResponse } from './types.js'; @@ -47,6 +47,67 @@ interface CopilotCliRunResult { type CopilotCliRunner = (options: CopilotCliRunOptions) => Promise<CopilotCliRunResult>; +interface CopiedFileMapping { + readonly originalPath: string; + readonly workspaceRelativePath: string; +} + +async function copyInputFilesToWorkspace( + workspaceRoot: string, + inputFiles: readonly string[], +): Promise<CopiedFileMapping[]> { + const usedNames = new Map<string, number>(); + const mappings: CopiedFileMapping[] = []; + + for (const originalPath of inputFiles) { + const ext = path.extname(originalPath); + const stem = path.basename(originalPath, ext); + let relativeName: string; + + const baseKey = `${stem}${ext}`; + const count = usedNames.get(baseKey) ??
0; + if (count === 0) { + relativeName = baseKey; + } else { + relativeName = `${stem}_${count}${ext}`; + } + usedNames.set(baseKey, count + 1); + + const dest = path.join(workspaceRoot, relativeName); + await copyFile(originalPath, dest); + mappings.push({ originalPath, workspaceRelativePath: relativeName }); + } + + return mappings; +} + +function buildCopilotFilePrereadBlock( + guidelineMappings: readonly CopiedFileMapping[], + inputMappings: readonly CopiedFileMapping[], +): string { + if (guidelineMappings.length === 0 && inputMappings.length === 0) { + return ''; + } + + const buildList = (mappings: readonly CopiedFileMapping[]): string => + mappings.map((m) => `* ${m.workspaceRelativePath}`).join('\n'); + + const sections: string[] = []; + if (guidelineMappings.length > 0) { + sections.push(`Read all guideline files:\n${buildList(guidelineMappings)}.`); + } + if (inputMappings.length > 0) { + sections.push(`Read all input files:\n${buildList(inputMappings)}.`); + } + + sections.push( + 'If any file is missing, fail with ERROR: missing-file and stop.', + 'Then apply system_instructions on the user query below.', + ); + + return sections.join('\n'); +} + export class CopilotCliProvider implements Provider { readonly id: string; readonly kind = 'copilot-cli' as const; @@ -81,9 +142,27 @@ export class CopilotCliProvider implements Provider { const workspaceRoot = await this.createWorkspace(); const logger = await this.createStreamLogger(request).catch(() => undefined); try { - const basePrompt = buildPromptDocument(request, inputFiles); + // Copy input files into workspace and build prompt with relative paths + const copiedFiles = inputFiles + ? 
await copyInputFilesToWorkspace(workspaceRoot, inputFiles) + : []; + + const guidelineFileSet = new Set( + collectGuidelineFiles(inputFiles, request.guideline_patterns), + ); + const guidelineMappings = copiedFiles.filter((m) => guidelineFileSet.has(m.originalPath)); + const nonGuidelineMappings = copiedFiles.filter((m) => !guidelineFileSet.has(m.originalPath)); + + const prereadBlock = buildCopilotFilePrereadBlock(guidelineMappings, nonGuidelineMappings); const systemPrompt = this.config.systemPrompt ?? DEFAULT_SYSTEM_PROMPT; - const promptContent = `${systemPrompt}\n\n${basePrompt}`; + + const promptParts: string[] = [systemPrompt]; + if (prereadBlock.length > 0) { + promptParts.push('', prereadBlock); + } + promptParts.push('', '[[ ## user_query ## ]]', request.question.trim()); + + const promptContent = promptParts.join('\n'); const promptFile = path.join(workspaceRoot, PROMPT_FILENAME); await writeFile(promptFile, promptContent, 'utf8'); @@ -116,6 +195,7 @@ export class CopilotCliProvider implements Provider { promptFile, workspace: workspaceRoot, inputFiles, + copiedFiles, logFile: logger?.filePath, }, outputMessages: [{ role: 'assistant' as const, content: assistantText }], diff --git a/packages/core/test/evaluation/providers/copilot-cli.test.ts b/packages/core/test/evaluation/providers/copilot-cli.test.ts index 02559da8..dd5e3f64 100644 --- a/packages/core/test/evaluation/providers/copilot-cli.test.ts +++ b/packages/core/test/evaluation/providers/copilot-cli.test.ts @@ -191,7 +191,7 @@ describe('CopilotCliProvider', () => { expect(invocation.args).toContain('value'); }); - it('includes preread block for input files', async () => { + it('copies input files into workspace and uses relative paths in prompt', async () => { const runner = mock(async () => ({ stdout: 'done with files', stderr: '', @@ -223,12 +223,78 @@ describe('CopilotCliProvider', () => { const response = await provider.invoke(request); 
expect(extractLastAssistantContent(response.outputMessages)).toBe('done with files'); - // The prompt passed as last arg should contain file references + // The prompt passed as last arg should contain relative file references (no file:// URIs) const invocation = runner.mock.calls[0][0]; const promptArg = invocation.args[invocation.args.length - 1]; expect(promptArg).toContain('python.instructions.md'); expect(promptArg).toContain('main.py'); expect(promptArg).toContain('[[ ## user_query ## ]]'); + expect(promptArg).not.toContain('file://'); + + // Verify copiedFiles in raw response + const raw = response.raw as Record<string, unknown>; + const copiedFiles = raw.copiedFiles as Array<{ + originalPath: string; + workspaceRelativePath: string; + }>; + expect(copiedFiles).toHaveLength(2); + expect(copiedFiles.some((f) => f.workspaceRelativePath === 'python.instructions.md')).toBe( + true, + ); + expect(copiedFiles.some((f) => f.workspaceRelativePath === 'main.py')).toBe(true); + + // Verify files were actually copied into the workspace + const workspace = raw.workspace as string; + const copiedGuideline = await readFile( + path.join(workspace, 'python.instructions.md'), + 'utf8', + ).catch(() => null); + // Workspace is cleaned up after invoke, so files may not exist + // The copiedFiles metadata confirms they were mapped correctly + }); + + it('handles basename collisions when copying input files', async () => { + const runner = mock(async () => ({ + stdout: 'done', + stderr: '', + exitCode: 0, + })); + const provider = new CopilotCliProvider( + 'copilot-target', + { + executable: process.execPath, + logDir: fixturesRoot, + }, + runner, + ); + + // Create two files with the same basename in different directories + const file1 = path.join(fixturesRoot, 'dir1', 'config.yaml'); + const file2 = path.join(fixturesRoot, 'dir2', 'config.yaml'); + await mkdir(path.join(fixturesRoot, 'dir1'), { recursive: true }); + await mkdir(path.join(fixturesRoot, 'dir2'), { recursive: true }); + await
writeFile(file1, 'config1', 'utf8'); + await writeFile(file2, 'config2', 'utf8'); + + const response = await provider.invoke({ + question: 'Check configs', + inputFiles: [file1, file2], + }); + + const raw = response.raw as Record<string, unknown>; + const copiedFiles = raw.copiedFiles as Array<{ + originalPath: string; + workspaceRelativePath: string; + }>; + expect(copiedFiles).toHaveLength(2); + expect(copiedFiles[0].workspaceRelativePath).toBe('config.yaml'); + expect(copiedFiles[1].workspaceRelativePath).toBe('config_1.yaml'); + + // Verify prompt contains both relative paths + const invocation = runner.mock.calls[0][0]; + const promptArg = invocation.args[invocation.args.length - 1]; + expect(promptArg).toContain('config.yaml'); + expect(promptArg).toContain('config_1.yaml'); }); it('streams output to a log file and records log entry', async () => {