feat(appkit): agents() plugin, createAgent(def), and markdown-driven agents by MarioCadenas · Pull Request #304 · databricks/appkit

MarioCadenas · 2026-04-21T17:58:43Z

The main product layer. Turns an AppKit app into an AI-agent host with
markdown-driven agent discovery, code-defined agents, sub-agents,
human-in-the-loop approval, zero-trust MCP, and a standalone
run-without-HTTP executor.

`createAgent(def)` — pure factory

packages/appkit/src/core/create-agent-def.ts. Returns the passed-in
definition after cycle-detecting the sub-agent graph. No adapter
construction, no side effects — safe at module top-level. The returned
AgentDefinition is plain data, consumable by either agents({ agents })
or runAgent(def, input).
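
As a rough illustration of the cycle check described above, the sketch below runs a depth-first search over a sub-agent graph and throws if a definition turns out to be its own ancestor. The `SketchAgentDef` shape and the `findSubAgentCycle` / `createAgentSketch` helpers are assumptions made for this example; they are not the real `AgentDefinition` type or the code in create-agent-def.ts.

```typescript
// Hypothetical, simplified shape: only the fields needed for the check.
interface SketchAgentDef {
  instructions: string;
  agents?: Record<string, SketchAgentDef>;
}

// Returns the key path that closes a cycle, or null if the graph is acyclic.
function findSubAgentCycle(root: SketchAgentDef): string[] | null {
  const onPath = new Set<SketchAgentDef>(); // definitions on the current DFS path
  const done = new Set<SketchAgentDef>(); // definitions already fully explored

  const visit = (def: SketchAgentDef, path: string[]): string[] | null => {
    if (onPath.has(def)) return path; // back edge: def is its own ancestor
    if (done.has(def)) return null; // shared sub-agent, already known to be safe
    onPath.add(def);
    for (const [key, child] of Object.entries(def.agents ?? {})) {
      const cycle = visit(child, [...path, key]);
      if (cycle) return cycle;
    }
    onPath.delete(def);
    done.add(def);
    return null;
  };

  return visit(root, ["<root>"]);
}

// A pure factory in this style validates and hands back the same object.
function createAgentSketch(def: SketchAgentDef): SketchAgentDef {
  const cycle = findSubAgentCycle(def);
  if (cycle) throw new Error(`Sub-agent cycle: ${cycle.join(" -> ")}`);
  return def;
}
```

Because the sketch only reads the definition and returns it unchanged, calling it at module top level has no side effects, which is the property the description claims for the real factory.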

`agents()` plugin

packages/appkit/src/plugins/agents/agents.ts. AgentsPlugin class:

- Loads markdown agents from config/agents/*.md (configurable dir)
  via real YAML frontmatter parsing (js-yaml). Frontmatter schema:
  endpoint, model, toolkits, tools, agents, default,
  maxSteps, maxTokens, baseSystemPrompt, ephemeral. Unknown
  keys logged, invalid YAML throws at boot.
- Merges code-defined agents passed via agents({ agents: { name: def } }).
  Code wins on key collision.
- For each agent, builds a per-agent tool index from:
  1. Sub-agents (agents: frontmatter or agents: code field),
     synthesized as agent-<key> tools on the parent. Markdown
     agents: [...] resolves against both markdown siblings and
     code-defined agents passed via LoadContext.codeAgents, so a
     markdown orchestrator can delegate to a code-defined specialist.
  2. Explicit tool record entries: ToolkitEntrys, inline
     FunctionTools, or HostedTools.
  3. Auto-inherit (if nothing explicit): pulls every registered
     ToolProvider plugin's tools whose author marked
     autoInheritable: true. Asymmetric default: markdown agents
     inherit (file: true), code-defined agents don't (code: false).
- Mounts POST /api/agents/invocations (OpenAI Responses compatible) +
  POST /api/agents/chat, POST /api/agents/cancel,
  POST /api/agents/approve, GET /api/agents/threads/:id,
  DELETE /api/agents/threads/:id, GET /api/agents/info.
- SSE streaming via executeStream. Tool calls dispatch through
  PluginContext.executeTool(req, pluginName, localName, args, signal)
  for OBO, telemetry, and timeout.
- Exposes appkit.agents.{register, list, get, reload, getDefault, getThreads}
  runtime helpers.
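
The three-source tool index can be pictured with a small pure function. This is a sketch only: the `ProviderToolSketch` and `ToolIndexInputSketch` shapes are invented for the example, and it assumes that "nothing explicit" means an empty explicit tool record, which the description does not spell out.

```typescript
// Hypothetical shapes for illustration; not the real AppKit types.
interface ProviderToolSketch {
  name: string;
  autoInheritable?: boolean; // set by the plugin author
}

interface ToolIndexInputSketch {
  kind: "file" | "code"; // markdown file vs. code-defined agent
  subAgents: string[]; // keys from the agents: field
  explicitTools: string[]; // names from the explicit tool record
  providerTools: ProviderToolSketch[]; // everything ToolProvider plugins offer
  autoInherit: { file: boolean; code: boolean };
}

function buildToolIndexSketch(input: ToolIndexInputSketch): string[] {
  const index = new Set<string>();
  // 1. Sub-agents become agent-<key> tools on the parent.
  for (const key of input.subAgents) index.add(`agent-${key}`);
  // 2. Explicit entries are taken as given.
  for (const toolName of input.explicitTools) index.add(toolName);
  // 3. Auto-inherit applies only when no explicit tools were declared
  //    (assumption) and the agent kind has inheritance switched on.
  if (input.explicitTools.length === 0 && input.autoInherit[input.kind]) {
    for (const tool of input.providerTools) {
      if (tool.autoInheritable === true) index.add(tool.name);
    }
  }
  return Array.from(index);
}
```

With `autoInherit: { file: true, code: false }`, a markdown agent with no explicit tools would pick up every tool marked `autoInheritable: true`, while a code-defined agent with the same inputs would only see its sub-agent tools.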

Human-in-the-loop approval gate

Any tool annotated destructive: true pauses the stream, emits an
appkit.approval_pending SSE event, and waits for a
POST /api/agents/approve decision from the same user who initiated
the run. A missing decision after approval.timeoutMs auto-denies.
Enabled by default (approval.requireForDestructive: true); opt out
for dev. Per-user ownership enforced (x-forwarded-user).
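
The ownership and timeout rules above can be sketched as a small state holder. This is not the plugin's tool-approval-gate implementation: `ApprovalGateSketch` and its method names are hypothetical, and the sketch takes explicit timestamps instead of real timers so that it stays deterministic.

```typescript
type ApprovalOutcomeSketch = "approved" | "denied";

interface PendingApprovalSketch {
  ownerUserId: string; // user who initiated the run (e.g. from x-forwarded-user)
  deadlineMs: number; // absolute time after which the call is auto-denied
  outcome?: ApprovalOutcomeSketch;
}

class ApprovalGateSketch {
  private pending = new Map<string, PendingApprovalSketch>();
  private timeoutMs: number;

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  // Called when a destructive tool call pauses the stream.
  open(approvalId: string, ownerUserId: string, nowMs: number): void {
    this.pending.set(approvalId, { ownerUserId, deadlineMs: nowMs + this.timeoutMs });
  }

  // Called by the approve route. Only the run's owner may decide, and a
  // decision that arrives after the deadline is treated as a denial.
  decide(
    approvalId: string,
    userId: string,
    approve: boolean,
    nowMs: number,
  ): ApprovalOutcomeSketch | "forbidden" | "unknown" {
    const entry = this.pending.get(approvalId);
    if (entry === undefined) return "unknown";
    if (entry.ownerUserId !== userId) return "forbidden";
    if (entry.outcome !== undefined) return entry.outcome; // already settled
    const expired = nowMs > entry.deadlineMs;
    entry.outcome = !expired && approve ? "approved" : "denied";
    return entry.outcome;
  }
}
```

A real gate would also resolve a promise that the paused stream is awaiting; that part is omitted here to keep the ownership and deadline logic visible.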

Zero-trust MCP host policy

tools/mcp-host-policy.ts enforces an allowlist on every MCP URL
before the first byte is sent. Same-origin Databricks workspace URLs
are admitted by default; any other host must be explicitly trusted
via agents({ mcp: { trustedHosts: [...] } }). Blocks link-local
(cloud metadata at 169.254/16), RFC1918, CGNAT, loopback,
ULA, multicast, and IPv4-mapped IPv6 equivalents at DNS-resolve
time. Workspace credentials (service-principal on initialize /
tools/list; caller OBO on tools/call) are never attached to
non-workspace hosts.
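
To make the IP blocklist concrete, here is a minimal sketch of the IPv4 part of such a check, applied to an address that DNS has already resolved. It is an illustration under assumptions, not the code in tools/mcp-host-policy.ts: the function names are invented, only IPv4 and the dotted form of IPv4-mapped IPv6 are handled, and anything else (including ULA and colon-hex mapped addresses) is simply rejected.

```typescript
// Parse a dotted-quad IPv4 address into an unsigned integer, or null if malformed.
function parseIPv4Sketch(addr: string): number | null {
  const parts = addr.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// [network, prefix length] pairs for the IPv4 ranges named above.
const BLOCKED_V4_CIDRS_SKETCH: Array<[string, number]> = [
  ["127.0.0.0", 8], // loopback
  ["10.0.0.0", 8], // RFC 1918
  ["172.16.0.0", 12], // RFC 1918
  ["192.168.0.0", 16], // RFC 1918
  ["100.64.0.0", 10], // CGNAT
  ["169.254.0.0", 16], // link-local, including cloud metadata endpoints
  ["224.0.0.0", 4], // multicast
];

function inCidrSketch(ip: number, network: string, prefix: number): boolean {
  const base = parseIPv4Sketch(network);
  if (base === null) return false;
  const blockSize = Math.pow(2, 32 - prefix); // addresses per block
  return Math.floor(ip / blockSize) === Math.floor(base / blockSize);
}

function isBlockedAddressSketch(addr: string): boolean {
  // Treat ::ffff:a.b.c.d as its IPv4 equivalent.
  const mapped = /^::ffff:(\d+\.\d+\.\d+\.\d+)$/i.exec(addr);
  const ip = parseIPv4Sketch(mapped?.[1] ?? addr);
  if (ip === null) return true; // fail closed on anything this sketch cannot parse
  return BLOCKED_V4_CIDRS_SKETCH.some(([network, prefix]) => inCidrSketch(ip, network, prefix));
}
```

Running the check on the resolved address rather than on the hostname is what lets a policy like this catch a public-looking name that resolves to a private or metadata IP.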

DoS caps

limits: { maxConcurrentStreamsPerUser, maxToolCalls, maxSubAgentDepth }
(defaults 5 / 50 / 3). Chat bodies are capped at 64k characters via
Zod schema; 6th concurrent stream for the same user returns 429; tool
budget exhaustion aborts the run with a clear error.
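
The per-user concurrency cap amounts to a counter keyed by user. The sketch below shows one way such a limiter could behave with the default of 5; `StreamLimiterSketch` and its methods are hypothetical names, not the plugin's internals.

```typescript
class StreamLimiterSketch {
  private active = new Map<string, number>();
  private maxPerUser: number;

  constructor(maxPerUser: number = 5) {
    this.maxPerUser = maxPerUser;
  }

  // Returns 200 and reserves a slot, or 429 when the user is already at the cap.
  tryAcquire(userId: string): 200 | 429 {
    const current = this.active.get(userId) ?? 0;
    if (current >= this.maxPerUser) return 429;
    this.active.set(userId, current + 1);
    return 200;
  }

  // Intended to run in the stream's finally block so aborted runs free their slot.
  release(userId: string): void {
    const current = this.active.get(userId) ?? 0;
    if (current <= 1) this.active.delete(userId);
    else this.active.set(userId, current - 1);
  }
}
```

Releasing in a `finally` path matters: if a cancelled or failed stream never gives its slot back, the user would eventually be locked out at the cap.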

`runAgent(def, input)` — standalone executor

packages/appkit/src/core/run-agent.ts. Runs an AgentDefinition
without createApp or HTTP. Drives the adapter's event stream to
completion, executing inline tools + sub-agents along the way.
Aggregates events into { text, events }. Useful for tests, CLI
scripts, and offline pipelines.

Event translation and thread storage

- AgentEventTranslator: stateful converter from internal
  AgentEvents to OpenAI Responses API ResponseStreamEvents with
  strictly monotonic sequence_number and output_index.
- InMemoryThreadStore: per-user conversation persistence with
  explicit ephemeral: true opt-in on AgentDefinition for
  stateless agents (autocomplete, one-shot tools).
- buildBaseSystemPrompt + composeSystemPrompt: formats the
  AppKit base prompt (with plugin names and tool names) and layers
  the agent's instructions on top.
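
A stateful translator of this kind mostly comes down to two counters. The sketch below is an assumption-heavy illustration rather than the real AgentEventTranslator: the internal event shapes are invented, only three Responses-style event type names are emitted, and payload fields other than `sequence_number` and `output_index` are left out.

```typescript
// Hypothetical internal events; the real AgentEvents carry more variants and fields.
type InternalEventSketch =
  | { type: "message_delta"; content: string }
  | { type: "tool_call"; name: string };

interface OutEventSketch {
  type: string;
  sequence_number: number;
  output_index: number;
}

class TranslatorSketch {
  private seq = 0; // increases by one for every emitted event
  private nextOutputIndex = 0; // increases by one for every new output item
  private openMessageIndex: number | null = null;

  translate(ev: InternalEventSketch): OutEventSketch[] {
    const out: OutEventSketch[] = [];
    const emit = (type: string, outputIndex: number): void => {
      out.push({ type, sequence_number: this.seq++, output_index: outputIndex });
    };

    if (ev.type === "message_delta") {
      let idx = this.openMessageIndex;
      if (idx === null) {
        // First delta of a message opens a new output item.
        idx = this.nextOutputIndex++;
        this.openMessageIndex = idx;
        emit("response.output_item.added", idx);
      }
      emit("response.output_text.delta", idx);
    } else {
      // A tool call interrupts any open message, so close that item first.
      if (this.openMessageIndex !== null) {
        emit("response.output_item.done", this.openMessageIndex);
        this.openMessageIndex = null;
      }
      const idx = this.nextOutputIndex++;
      emit("response.output_item.added", idx);
      emit("response.output_item.done", idx);
    }
    return out;
  }
}
```

Keeping both counters on the translator instance, rather than deriving them per event, is what keeps the numbering consistent when a message is split by a tool call.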

Frontmatter loader

load-agents.ts — reads *.md files, parses YAML frontmatter with
js-yaml, resolves toolkits: [...] entries against the plugin
provider index at load time, wraps ambient tools (from agents({ tools: {...} })) for tools: [...] frontmatter references.
loadAgentsFromDir runs a two-pass resolver so agents: references
can be resolved regardless of file-system iteration order; supports
markdown siblings + code-defined agents (via LoadContext.codeAgents)
with code precedence on collision.
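
The two-pass idea can be sketched as follows: register every id first, then resolve references once all targets are known. The types and the `linkAgentsSketch` function are hypothetical, and the error behaviour for missing and self references is an assumption based on the test plan, not a copy of load-agents.ts.

```typescript
interface ParsedMarkdownAgentSketch {
  id: string;
  agentRefs: string[]; // raw entries from the agents: frontmatter key
}

interface LinkedAgentSketch {
  id: string;
  source: "markdown" | "code";
  subAgentIds: string[];
}

function linkAgentsSketch(
  parsed: ParsedMarkdownAgentSketch[],
  codeAgentIds: string[],
): Map<string, LinkedAgentSketch> {
  const registry = new Map<string, LinkedAgentSketch>();

  // Pass 1: register every id so later lookups do not depend on file order.
  for (const p of parsed) {
    registry.set(p.id, { id: p.id, source: "markdown", subAgentIds: [] });
  }
  // Code-defined agents are written last, so they win on key collision.
  for (const id of codeAgentIds) {
    registry.set(id, { id, source: "code", subAgentIds: [] });
  }

  // Pass 2: resolve references now that every target is known.
  for (const p of parsed) {
    const entry = registry.get(p.id);
    if (entry === undefined || entry.source === "code") continue; // shadowed by code
    for (const ref of Array.from(new Set(p.agentRefs))) {
      // The Set above removes duplicate references.
      if (ref === p.id) throw new Error(`Agent "${p.id}" references itself`);
      if (!registry.has(ref)) {
        throw new Error(`Agent "${p.id}" references unknown agent "${ref}"`);
      }
      entry.subAgentIds.push(ref);
    }
  }
  return registry;
}
```

A single pass would fail whenever a file that references a sibling happens to be read before that sibling; splitting registration from linking removes that ordering dependency.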

Plumbing

- Adds js-yaml + @types/js-yaml deps.
- Manifest mounts routes at /api/agents/* (plural, matching the
  appkit.agents.* runtime handle).
- Exports from the main barrel: agents, createAgent, runAgent,
  AgentDefinition, AgentsPluginConfig, AgentTool, ToolkitEntry,
  ToolkitOptions, BaseSystemPromptOption, PromptContext,
  isToolkitEntry, loadAgentFromFile, loadAgentsFromDir.

Test plan

- Loader tests (25): parse errors, toolkits/tools resolution,
  agents: sibling resolution regardless of order, mutual delegation,
  missing/self/non-array refs, deduplication, loadAgentFromFile
  rejection of agents:, markdown → code references, code precedence
  on collision.
- Approval gate tests: HITL wait + decision, timeout auto-deny,
  stream-owner enforcement.
- DoS caps tests: body-size rejection, per-user concurrency 429,
  tool-call budget, sub-agent depth cap.
- Event translator tests: output-index monotonicity,
  message-interruption-by-tool-call, undefined-result coalescing.
- MCP host policy tests: URL allowlist, IP blocklist, IPv6 link-local
  full range, IPv4-mapped colon-hex normalization.
- Full appkit vitest suite: 1552 tests passing at stack tip.
- Typecheck clean across all 8 workspace projects.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

PR Stack

Shared agent types + LLM adapters — feat(appkit): shared agent types and LLM adapter implementations #301
Tool primitives + ToolProvider surfaces — feat(appkit): tool primitives and ToolProvider surfaces on core plugins #302
Plugin infrastructure (attachContext + PluginContext) — feat(appkit): plugin infrastructure — attachContext + PluginContext mediator #303
agents() plugin + createAgent(def) + markdown-driven agents (this PR)
fromPlugin() DX + runAgent plugins arg + toolkit-resolver — feat(appkit): tools(plugins) DX, runAgent plugins arg, shared toolkit-resolver #305
Reference app + dev-playground + docs — feat(appkit): reference agent-app, dev-playground chat UI, docs, and template #306

Demo

agent-demo.mp4

…template

Final layer of the agents feature stack. Everything needed to exercise, demonstrate, and learn the feature.

### Reference application: agent-app

`apps/agent-app/`: a standalone app purpose-built around the agents feature. Ships with:

- `server.ts`: full example of code-defined agents via `fromPlugin`:

  ```ts
  const support = createAgent({
    instructions: "…",
    tools: {
      ...fromPlugin(analytics),
      ...fromPlugin(files),
      get_weather,
      "mcp.vector-search": mcpServer("vector-search", "https://…"),
    },
  });

  await createApp({
    plugins: [server({ port }), analytics(), files(), agents({ agents: { support } })],
  });
  ```

- `config/agents/assistant.md`: markdown-driven agent alongside the code-defined one, showing the asymmetric auto-inherit default.
- Vite + React 19 + TailwindCSS frontend with a chat UI.
- Databricks deployment config (`databricks.yml`, `app.yaml`) and deploy scripts.

### dev-playground chat UI + demo agent

`apps/dev-playground/client/src/routes/agent.route.tsx`: chat UI with inline autocomplete (hits the `autocomplete` markdown agent) and a full threaded conversation panel (hits the default agent).

`apps/dev-playground/server/index.ts`: adds a code-defined `helper` agent using `fromPlugin(analytics)` alongside the markdown-driven `autocomplete` agent in `config/agents/`. Exercises the mixed-style setup (markdown + code) against the same plugin list.

`apps/dev-playground/config/agents/*.md`: both agents defined with valid YAML frontmatter.

### Docs

`docs/docs/plugins/agents.md`: progressive five-level guide:

1. Drop a markdown file → it just works.
2. Scope tools via `toolkits:` / `tools:` frontmatter.
3. Code-defined agents with `fromPlugin()`.
4. Sub-agents.
5. Standalone `runAgent()` (no `createApp` or HTTP).

Plus a configuration reference, runtime API reference, and frontmatter schema table.

`docs/docs/api/appkit/`: regenerated typedoc for the new public surface (fromPlugin, runAgent, AgentDefinition, AgentsPluginConfig, ToolkitEntry, ToolkitOptions, all adapter types, and the agents plugin factory).

### Template

`template/appkit.plugins.json`: adds the `agent` plugin entry so `npx @databricks/appkit init --features agent` scaffolds the plugin correctly.

### Test plan

- Full appkit vitest suite: 1311 tests passing
- Typecheck clean across all 8 workspace projects
- `pnpm docs:build` clean (no broken links)
- `pnpm --filter=@databricks/appkit build:package` clean, publint clean

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

### Zero-trust MCP host policy documentation (S1 security)

Documents the new `mcp` configuration block and the rules it enforces: same-origin-only by default, explicit `trustedHosts` for external MCP servers, plaintext `http://` refused outside localhost-in-dev, and DNS-level blocking of private / link-local IP ranges (covers cloud metadata services). See PR #302 for the policy implementation and PR #304 for the `AgentsPluginConfig.mcp` wiring.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

…template

### HITL approval UI + SQL safety docs (S2 security, Layer 1-3)

- `docs/docs/plugins/agents.md`: new "SQL agent tools" subsection covering `analytics.query` readOnly enforcement, `lakebase.query` opt-in via `exposeAsAgentTool`, and the approval flow. New "Human-in-the-loop approval for destructive tools" subsection documents the config, SSE event shape, and `POST /chat/approve` contract.
- `apps/agent-app`: approval-card component rendered inline in the chat stream whenever an `appkit.approval_pending` event arrives. Destructive badge + Approve/Deny buttons POST to `/api/agent/approve` with the carried `streamId`/`approvalId`.
- `apps/dev-playground/client`: matching approval-card on the agent route, using the existing appkit-ui `Button` component and Tailwind utility classes.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

…template

### Auto-inherit posture documentation (S3 security, Layer 3)

Updates `docs/docs/plugins/agents.md` to document the new two-key auto-inherit model introduced in PR #302 (per-tool `autoInheritable` flag) and PR #304 (safe-by-default `autoInheritTools: { file: false, code: false }`). Adds an "Auto-inherit posture" subsection explaining that the developer must opt into `autoInheritTools` AND the plugin author must mark each tool `autoInheritable: true` for a tool to spread without explicit wiring. Includes a table documenting the `autoInheritable` marking on each core plugin tool, plus an example of the setup-time audit log so operators can see exactly what's inherited vs. skipped.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

…template

- **Reference app no longer ships hardcoded dogfood URLs.** The three `https://e2-dogfood.staging.cloud.databricks.com/...` and `https://mario-mcp-hello-*.staging.aws.databricksapps.com/...` MCP URLs in `apps/agent-app/server.ts` are replaced with optional env-driven `VECTOR_SEARCH_MCP_URL` / `CUSTOM_MCP_URL` config. When set, their hostnames are auto-added to `agents({ mcp: { trustedHosts } })`. `.env.example` uses placeholder values the reader can replace instead of another team's workspace.
- **`appkit.agent` → `appkit.agents` in the reference app.** The prior `appkit.agent as { list, getDefault }` cast papered over the plugin-name mismatch fixed in PR #304. The runtime key now matches the docs, the manifest, and the factory name; the cast is gone.
- **Auto-inherit opt-in added to the reference config.** Since the defaults flipped to `{ file: false, code: false }` (PR #304, S-3), the reference now explicitly enables `autoInheritTools: { file: true }` so the markdown agents that ship alongside the code-defined one still pick up the analytics / files read-only tools. This is the pattern a real deployment should follow: opt in deliberately.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

- `apps/dev-playground/config/agents/autocomplete.md` sets `ephemeral: true`. Each debounced autocomplete keystroke no longer leaves an orphan thread in `InMemoryThreadStore`; the server now deletes the thread in the stream's `finally` (PR #304). Closes R1 from the MVP re-review.
- `docs/docs/plugins/agents.md` documents the new `ephemeral` frontmatter key alongside the other AgentDefinition knobs.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Documents the MVP resource caps landed in PR #304: the static request-body caps (enforced by the Zod schemas) and the three configurable runtime limits (`maxConcurrentStreamsPerUser`, `maxToolCalls`, `maxSubAgentDepth`). Includes the config-block shape in the main reference and a new "Resource limits" subsection under the Configuration section explaining the intent and per-user semantics of each cap.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

…agents

The main product layer. Turns an AppKit app into an AI-agent host with markdown-driven agent discovery, code-defined agents, sub-agents, and a standalone run-without-HTTP executor.

Agent runtime files land in core/agent/ from day one:

- core/agent/create-agent.ts: createAgent() definition factory
- core/agent/run-agent.ts: standalone adapter loop (no HTTP)
- core/agent/load-agents.ts: markdown agent discovery
- core/agent/system-prompt.ts: base system prompt + composition
- core/agent/types.ts: updated with AgentDefinition, AgentsPluginConfig, RegisteredAgent, etc.

HTTP-facing concerns stay in plugins/agents/: agents.ts, thread-store.ts, tool-approval-gate.ts, event-channel.ts, event-translator.ts, schemas.ts, defaults.ts, manifest.json

Tool-agnostic guidelines instead of SQL/files-specific defaults; accept full PromptContext in buildBaseSystemPrompt for parity with custom callbacks. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Register DATABRICKS_SERVING_ENDPOINT_NAME as optional CAN_QUERY so apps using Databricks-hosted agent models get resource wiring; optional when agents use only external adapters. Sync template/appkit.plugins.json. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Align optional serving resource with `DatabricksAdapter.fromModelServing()`, which reads `DATABRICKS_AGENT_ENDPOINT` — not `DATABRICKS_SERVING_ENDPOINT_NAME` (serving plugin). Sync template. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Top-level config/agents/*.md is no longer loaded. Use <agentId>/agent.md. The skills directory name is reserved and skipped. Orphan top-level .md files error at load; subdirs without agent.md error. Export agentIdFromMarkdownPath for path-based id resolution. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
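
The directory rules in this commit can be summarised in a path-classification sketch. The helper below is hypothetical: the real `agentIdFromMarkdownPath` signature is not shown in this PR text, and the sketch assumes paths relative to the agents directory with "/" separators. The "subdirs without agent.md" error needs a directory listing and is not covered here.

```typescript
const RESERVED_DIR_SKETCH = "skills";

// Returns the agent id for <agentId>/agent.md, null for paths that should be
// ignored, and throws for orphan top-level markdown files.
function agentIdFromRelPathSketch(relPath: string): string | null {
  const segments = relPath.split("/").filter((s) => s.length > 0);
  const [dir, file] = segments;
  if (segments.length === 1 && dir !== undefined && dir.endsWith(".md")) {
    throw new Error(`Orphan top-level markdown file "${relPath}"; use <agentId>/agent.md`);
  }
  if (segments.length !== 2 || dir === undefined || file !== "agent.md") {
    return null; // not an agent entry point
  }
  if (dir === RESERVED_DIR_SKETCH) return null; // reserved directory name, skipped
  return dir;
}
```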

The MCP transport client and host policy aren't agents-specific; they are HTTP + JSON-RPC transport with URL/DNS allowlisting. Move them under packages/appkit/src/connectors/mcp/ so they sit alongside the other transport-layer modules (serving, genie, sql-warehouse, lakebase, …) and stop being reachable only through the agents plugin.

- Move mcp-client.ts -> connectors/mcp/client.ts
- Move mcp-host-policy.ts -> connectors/mcp/host-policy.ts
- Move McpEndpointConfig type -> connectors/mcp/types.ts
- Add connectors/mcp/index.ts barrel; re-export from connectors/index.ts
- Move mcp-client / mcp-host-policy tests to connectors/mcp/tests/
- Agents plugin keeps hosted-tools.ts (HostedTool sugar + resolve) and imports connector types from ../../connectors/mcp.
- tools/ barrel no longer re-exports AppKitMcpClient (never was public).

No behaviour change. All existing tests pass against the new paths.

…dispatchToolCall

Three small helpers pulled out of the AgentsPlugin streaming path to cut duplication and shrink the two large methods.

- normalize-result.ts: void->"", JSON-stringify, 50K truncation with a human-readable marker. Unit-testable (previously covered only via the HTTP path).
- consume-adapter-stream.ts: the 'message_delta' + 'message' accumulation loop shared between _streamAgent and runSubAgent. Accepts an optional signal and per-event side-effect callback (for SSE translation).
- tool-dispatch.ts: one place that fans out toolkit/function/mcp/subagent entries. 'never'-typed default forces exhaustiveness: adding a fifth source is now a compile error at every call site.

_streamAgent: executeTool closure shrinks from ~60 lines of dispatch + normalize to a single dispatchToolCall + normalizeToolResult call. Stream consumption collapses to consumeAdapterStream.

runSubAgent: childExecute shrinks from ~30 lines of if/else dispatch to one dispatchToolCall call. Adapter loop collapses to consumeAdapterStream.

Behaviour change (minor): childExecute previously silently fell through to 'Unsupported sub-agent tool source' when mcpClient or PluginContext was missing; now it throws the same specific error as the main stream. Matches the main-path behaviour.

Tests: 15 new unit tests for normalizeToolResult + consumeAdapterStream. dispatchToolCall is exercised transitively through the full agent suite (288 existing tests still pass, 303 total on this branch).
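
For readers who want a picture of the normalization step, here is a minimal sketch of the three rules listed for normalize-result.ts. It is not the actual helper: the function name, the marker wording, and the choice to pass strings through unchanged and always return a string are assumptions made for this example.

```typescript
const MAX_RESULT_CHARS_SKETCH = 50000;

function normalizeToolResultSketch(
  result: unknown,
  maxChars: number = MAX_RESULT_CHARS_SKETCH,
): string {
  // Rule 1: a tool that returns nothing yields an empty string.
  if (result === undefined) return "";
  // Rule 2: non-string values are JSON-stringified. JSON.stringify can yield
  // undefined at runtime (e.g. for functions), so coalesce to "".
  const text = typeof result === "string" ? result : JSON.stringify(result) ?? "";
  // Rule 3: long results are cut, with a marker a human can read.
  if (text.length <= maxChars) return text;
  const dropped = text.length - maxChars;
  return `${text.slice(0, maxChars)}\n[truncated ${dropped} characters]`;
}
```

Values that `JSON.stringify` cannot serialize (circular structures, BigInt) would throw in this sketch; a production helper would likely need to decide how to handle them.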

… → def

The `annotations` field (notably `destructive: true`) was silently dropped as tools flowed from `tool({...})` into the resolved `AgentToolDefinition`, so user-defined destructive tools never triggered the approval gate.

- `ToolConfig` now accepts `annotations?: ToolAnnotations`.
- `tool()` forwards it to the returned `FunctionTool`.
- `FunctionTool` exposes `annotations` and `functionToolToDefinition` preserves it on the definition it builds.
- `AgentsPlugin` reads the flag via `isDestructiveToolEntry()` (falls back to `functionTool.annotations` so a future divergence between def and function cannot re-introduce the bug) and emits the merged annotations via `combinedToolAnnotations()` on the `approval_pending` SSE payload.

Covered by `tests/tool-approval-gate.test.ts` and `tests/function-tool.test.ts`.

ToolAnnotations.destructive is binary and has started to mislead: "save_view" captures a screenshot and creates a new file, which is nothing like deleting a dashboard, yet both trip the same red "destructive" approval card. This adds a semantic `effect` enum with four tiers — `read`, `write`, `update`, `destructive` — so tool authors can tell the UI what blast radius they actually have. The approval gate fires for any mutating effect (`write`/`update`/ `destructive`) and continues to honour the legacy `destructive: true` flag so existing tools keep their current red treatment without migration. Callers consuming `annotations` over the wire (MCP clients, approval UIs) can now differentiate; the playground will ship a tiered approval card as a follow-up.
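
The gating rule described here (any mutating effect, plus the legacy flag) fits in a few lines. The sketch below uses invented type and function names; it mirrors the stated contract, not the actual `requiresApproval()` in agents.ts.

```typescript
type ToolEffectSketch = "read" | "write" | "update" | "destructive";

interface ToolAnnotationsSketch {
  effect?: ToolEffectSketch;
  destructive?: boolean; // legacy binary flag
}

function requiresApprovalSketch(annotations?: ToolAnnotationsSketch): boolean {
  if (annotations === undefined) return false;
  // The legacy flag keeps working so existing tools need no migration.
  if (annotations.destructive === true) return true;
  // Any mutating tier triggers the gate; "read" and unset do not.
  return annotations.effect !== undefined && annotations.effect !== "read";
}
```

For example, a tool annotated `{ effect: "write" }` would pause for approval under this rule, while `{ effect: "read" }` or a tool with no annotations would run straight through.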

Biome import collapsing on agent loader, run-agent, and tests after rebasing onto main. Lockfile and synced plugin manifest reflect the current main state (including get-port from #349 already on main). Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

PR #304 agentic review applied (P1 + cheap P2):

- Approval gate now honours the modern `effect` field (write/update/destructive), matching the documented contract on ToolAnnotations. Previously a tool authored with `effect: "destructive"` and no legacy `destructive: true` boolean bypassed the gate.
- Sub-agent tool calls share the parent's RunState, so the per-run tool-call budget and the destructive-tool approval gate apply to nested sub-agent calls, not only the top-level adapter.
- /invocations enforces `maxConcurrentStreamsPerUser`. Without this a client could bypass the cap by switching from /chat to /invocations.
- /cancel uses a Zod schema instead of a raw `as` cast, matching the validation pattern of the sibling routes.
- agents.reload() builds the registry into a fresh Map and only swaps on success. A malformed markdown file no longer wipes the live registry and breaks in-flight requests.
- consumeAdapterStream and normalizeToolResult are wired into the four call sites that previously inlined the same accumulation / serialization logic (top-level _streamAgent, runSubAgent, dispatchToolCall, runAgent).
- load-agents.ts uses fs.promises so reload() does not block the event loop.
- Pin js-yaml to 4.1.1 (drop the caret), matching the rest of appkit's pinned deps.

Tests cover the approval-gate `effect` matrix (destructive/write/update trigger; read/undefined skip), the deny path, the shared budget across top-level + sub-agent dispatches, and the /invocations rate-limit gate.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
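
The "build into a fresh Map, swap on success" pattern from the reload fix looks roughly like this. The class and its synchronous loader callback are hypothetical simplifications; the real loader reads files asynchronously via fs.promises.

```typescript
class AgentRegistrySketch {
  private live = new Map<string, string>(); // agent id -> instructions

  // `load` stands in for reading and parsing the agent files; it may throw.
  reload(load: () => Array<[string, string]>): void {
    const next = new Map<string, string>();
    for (const [id, instructions] of load()) {
      next.set(id, instructions);
    }
    // Reached only if loading succeeded, so a failure leaves `live` untouched.
    this.live = next;
  }

  list(): string[] {
    return Array.from(this.live.keys());
  }
}
```

The alternative of clearing the live map first and refilling it in place would leave the registry empty, or half-populated, whenever a single file fails to parse.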

After dispatchToolCall consolidated the sub-agent path, runSubAgent called consumeAdapterStream without forwarding the child's adapter events. The parent stream therefore only emitted the outer agent-<name> function call; nested tool_call / tool_result events from the sub-agent never reached the client.

The smart-dashboard query agent delegates to dashboard_pilot whose UI-action tools (apply_filter, highlight_period, focus_chart, etc.) rely on the SSE stream to apply React state mutations. Without the forwarding, the user asks for a highlight and nothing visible happens.

Forward every sub-agent event except metadata. Sub-agents have their own threadId; emitting it would overwrite the parent's thread state on the client and break multi-turn continuity.

Test exercises the metadata-skipping rule and asserts tool_call, tool_result, and message_delta all reach outboundEvents.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
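The forwarding rule in this commit can be illustrated with a small sketch. The event type names (`metadata`, `tool_call`, `tool_result`, `message_delta`) come from the commit message; the payload fields and the `forwardSubAgentEvents` helper are assumptions, and a synchronous iterable stands in for the real adapter stream.

```typescript
// Assumed event payloads; only the `type` values are taken from the commit message.
type AgentEvent =
  | { type: "metadata"; threadId: string }
  | { type: "tool_call"; name: string; args: unknown }
  | { type: "tool_result"; name: string; result: string }
  | { type: "message_delta"; content: string };

// Hypothetical helper: relays a sub-agent's events to the parent stream.
function forwardSubAgentEvents(
  childEvents: Iterable<AgentEvent>,
  emit: (event: AgentEvent) => void,
): void {
  for (const event of childEvents) {
    // The child's metadata carries its own threadId; forwarding it would
    // overwrite the parent's thread state on the client, so it is dropped.
    if (event.type === "metadata") continue;
    emit(event);
  }
}
```

With this filter in place, UI-action tool calls made by a sub-agent reach the client, while the parent's threadId stays the only one the client ever sees.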

MarioCadenas · 2026-05-07T16:01:32Z

Addressed P1 + cheap P2 findings in 233b388 and the cascaded sub-agent SSE forwarding in e365bda:

1 approval gate honours effect (write/update/destructive) → requiresApproval() in agents.ts:82
2 / 3 sub-agent budget + approval → shared RunState + dispatchToolCall (called by both _streamAgent and runSubAgent)
4 /invocations rate-limit guard → mirrors the /chat check in _handleInvocations
5 consumeAdapterStream / normalizeToolResult wired into all 4 call sites (top-level _streamAgent, dispatchToolCall, runSubAgent, standalone runAgent)
6 atomic reload() via buildAgentRegistry() (builds into a fresh Map, swaps on success)
7 Zod cancelRequestSchema in _handleCancel, drops the as cast
10 fs/promises in load-agents.ts so reload() doesn't block the event loop
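Items 2 / 3 above hinge on the parent and the sub-agent mutating one and the same RunState. The sketch below shows why that closes the budget loophole. The field names on `RunState` and the simplified signatures of `dispatchToolCall` and `runSubAgent` are assumptions made for illustration; the real functions take more context and are asynchronous.

```typescript
// Hypothetical RunState shape; field names are assumptions.
interface RunState {
  toolCallsUsed: number;
  readonly maxToolCalls: number;
}

// Simplified stand-in: charges one call against the shared budget.
function dispatchToolCall(state: RunState, toolName: string): string {
  if (state.toolCallsUsed >= state.maxToolCalls) {
    throw new Error(`tool-call budget exhausted before "${toolName}"`);
  }
  state.toolCallsUsed += 1;
  return `dispatched ${toolName}`;
}

// Simplified stand-in: the sub-agent receives the parent's RunState by
// reference, so its nested calls draw from the same budget.
function runSubAgent(parentState: RunState, toolNames: string[]): string[] {
  return toolNames.map((name) => dispatchToolCall(parentState, name));
}
```

Because the state object is passed rather than copied, a parent that has one call left cannot regain headroom by delegating to a sub-agent.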

Deferred (advisory):

8 hardcoded executeTool timeout — lives in core/plugin-context.ts, separate PR
9 tool result type inconsistency (short vs long) — intentional, preserves structured data for short results
11 printRegistry() uses console.log — intentional picocolors-styled startup banner

Happy to file follow-up issues for any of the deferred items if useful.

Inline replies on 1 (manifest hide) and 2 (beta rename) — left rationale on the threads; let me know if you'd prefer different framing. 5 (event-channel external lib) — kept the zero-dep 70-line implementation per the "defensible" note in your comment; happy to swap to @repeaterjs/repeater if you'd rather.

Tests added: dispatch-tool-call.test.ts covers the approval-gate effect matrix, the deny path, the shared budget across top-level + sub-agent dispatches, and the sub-agent event-forwarding contract. Plus /invocations rate-limit test in dos-limits.test.ts.

… string)

Address the two PR-304 P2 findings deferred from the first round:

#8: Hardcoded 30s timeout in PluginContext.executeTool truncated legitimate cold SQL Warehouse and long Genie tool calls. Add agents({ limits: { toolCallTimeoutMs } }) (default 5 minutes), plumb through RunState into PluginContext.executeTool which now accepts an optional timeoutMs (also defaulting to 300_000 for non-agents callers).

#9: normalizeToolResult returned the raw object for short non-string results and a string for truncated results. Every downstream consumer stringified at the wire boundary anyway, so the asymmetry just complicated the type signature. Always return string and JSON-stringify null/objects; the existing defensive typeof === "string" ? : JSON.stringify(...) at adapter and translator sites still handles non-AppKit-mediated results.

#11 (printRegistry uses console.log) is left as-is: intentional picocolors-styled startup banner; logger prefix would break the column alignment and bypass log-level filtering.

Tests cover the new toolCallTimeoutMs default (300_000), the override path, the runState → context plumbing, and the always-string contract on normalizeToolResult including the null → "null" change.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
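As a rough illustration of the configurable timeout, the sketch below resolves `toolCallTimeoutMs` with the 300_000 ms default and races a tool call against a timer. The `AgentLimits` shape mirrors the `limits: { toolCallTimeoutMs }` option named above; `resolveToolCallTimeout` and `executeWithTimeout` are hypothetical helpers, not the actual PluginContext.executeTool implementation.

```typescript
const DEFAULT_TOOL_CALL_TIMEOUT_MS = 300_000; // 5 minutes, per the commit message

// Assumed shape of the `limits` option.
interface AgentLimits {
  toolCallTimeoutMs?: number;
}

// Hypothetical helper: falls back to the default when no override is set.
function resolveToolCallTimeout(limits?: AgentLimits): number {
  return limits?.toolCallTimeoutMs ?? DEFAULT_TOOL_CALL_TIMEOUT_MS;
}

// Hypothetical helper: rejects if `run` does not settle within `timeoutMs`.
async function executeWithTimeout<T>(
  run: () => Promise<T>,
  timeoutMs: number = DEFAULT_TOOL_CALL_TIMEOUT_MS,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`tool call timed out after ${timeoutMs}ms`));
    }, timeoutMs);
  });
  try {
    // Whichever settles first wins; a slow tool loses to the timer.
    return await Promise.race([run(), timeout]);
  } finally {
    // Always clear the timer so a fast tool does not leave it pending.
    clearTimeout(timer);
  }
}
```

Note that racing against a timer only stops waiting for the tool; it does not cancel the underlying work, which would need cooperative cancellation on the tool's side.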

MarioCadenas · 2026-05-07T16:15:24Z

Follow-up: re-reviewed the deferred items and landed two of them in 4898f8cc.

8 Hardcoded 30s executeTool timeout → on closer look this was actually a latent bug, not just a maintainability concern: analytics SQL warehouse cold-starts and long Genie conversations routinely exceed 30s. Added agents({ limits: { toolCallTimeoutMs } }) (default 300_000 = 5 min), plumbed through RunState to PluginContext.executeTool (which now accepts an optional timeoutMs defaulting to the same 5 min). Tests cover the default, the override, and the runState → context plumbing.
9 Tool result type inconsistency → after tracing the consumers, every wire-boundary site (adapter, translator) was already stringifying via typeof === "string" ? : JSON.stringify(...), so the "preserves structured data for short results" rationale didn't hold. normalizeToolResult now returns string always; null becomes the literal "null" (matches JSON.stringify(null)). Adapter/translator defensive code stays for non-AppKit-mediated executeTool callbacks.
11 printRegistry() console.log → kept as-is, intentional picocolors-styled startup banner. Logger prefix would break column alignment and bypass log-level filtering. Could add a JSDoc note explaining the intent if you want it codified.
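The always-string contract from item 9 could be sketched like this. The `null` to `"null"` behaviour is taken from the comment above; the `maxChars` parameter, its default, and the truncation suffix are hypothetical, since the thread does not state the real limit or marker.

```typescript
// Hypothetical limit and suffix; the real values are not given in this thread.
const DEFAULT_MAX_RESULT_CHARS = 10_000;
const TRUNCATION_SUFFIX = "...[truncated]";

// Always returns a string, whatever the tool handed back.
function normalizeToolResult(
  result: unknown,
  maxChars: number = DEFAULT_MAX_RESULT_CHARS,
): string {
  let text: string;
  if (typeof result === "string") {
    text = result;
  } else {
    // JSON.stringify(null) yields "null"; it yields undefined for values
    // such as `undefined` or functions, hence the String() fallback.
    const json: string | undefined = JSON.stringify(result);
    text = json ?? String(result);
  }
  return text.length > maxChars
    ? text.slice(0, maxChars) + TRUNCATION_SUFFIX
    : text;
}
```

With a single return type, callers no longer need to branch on whether a result was short enough to stay structured.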

MarioCadenas force-pushed the agent/v2/3-plugin-infra branch from a5642df to e26795b Compare April 21, 2026 20:41

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from 3c7c35e to cb7fe2b Compare April 21, 2026 20:41

MarioCadenas force-pushed the agent/v2/3-plugin-infra branch from e26795b to d73e138 Compare April 22, 2026 08:45

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from cb7fe2b to 0afea5e Compare April 22, 2026 08:45

MarioCadenas mentioned this pull request Apr 22, 2026

feat(appkit): zero-trust MCP host policy with URL allowlist and scoped auth #307

Closed

7 tasks

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from 0afea5e to 983461c Compare April 22, 2026 09:24

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from 983461c to a7b0444 Compare April 22, 2026 09:46

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from a7b0444 to 623792d Compare April 22, 2026 09:59

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch 2 times, most recently from 1b72080 to a6567bc Compare April 29, 2026 18:36

MarioCadenas force-pushed the agent/v2/3-plugin-infra branch from 5bf6b22 to 863439e Compare April 29, 2026 18:36

MarioCadenas force-pushed the agent/v2/3-plugin-infra branch from 863439e to cca914f Compare April 29, 2026 20:18

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from a6567bc to 5d0fae2 Compare April 29, 2026 20:18

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from 5d0fae2 to af9b6ee Compare May 4, 2026 09:22

MarioCadenas force-pushed the agent/v2/3-plugin-infra branch from cca914f to a3d2cc6 Compare May 4, 2026 09:22

MarioCadenas force-pushed the agent/v2/3-plugin-infra branch from a3d2cc6 to f495962 Compare May 4, 2026 09:41

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from af9b6ee to e4b1322 Compare May 4, 2026 09:41

pkosiec reviewed May 6, 2026


MarioCadenas added 12 commits May 7, 2026 11:45

refactor(appkit): generalize default base system prompt

3107741

Tool-agnostic guidelines instead of SQL/files-specific defaults; accept full PromptContext in buildBaseSystemPrompt for parity with custom callbacks. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

pkosiec approved these changes May 8, 2026


MarioCadenas merged 13 commits into main from agent/v2/4-agents-plugin
