feat(agents): add /goal slash command with token-budget enforcement by kevin-dp · Pull Request #4552 · electric-sql/electric

kevin-dp · 2026-06-10T10:11:12Z

Summary

Adds a /goal slash command to Horton sessions. The user sets an objective with an optional token cap; the agent works autonomously toward it and stops when mark_goal_complete is called or when the run exceeds the budget. The budget is enforced mid-run (via an onStepEnd hook on the outbound bridge), not at run boundaries, so a goal with a small budget halts the agent within one step of crossing the cap rather than letting it finish whatever large tool loop is in flight.

/goal set "ship feature X" --tokens 50k   # default 50k tokens
/goal set "explore" --unlimited           # opt out of the cap
/goal show                                 # current state
/goal complete                             # mark done manually
/goal clear                                # remove the goal
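
To illustrate the `--tokens` values shown above, here is a minimal sketch of how such a value could be normalised. `parseTokenBudget` is a hypothetical helper, not the PR's goal-command.ts parser; the `k`/`m` suffix handling and the use of `null` for an unlimited budget are assumptions.

```typescript
// Hypothetical helper (not the PR's parser): turns a --tokens value such as
// "50000", "50k", "1.2m" or "unlimited" into a token count.
// Returns null for "unlimited" and throws on malformed input.
function parseTokenBudget(raw: string): number | null {
  const value = raw.trim().toLowerCase();
  if (value === "unlimited") return null;

  // Digits with an optional decimal part, followed by an optional k/m suffix.
  const match = /^(\d+(?:\.\d+)?)([km]?)$/.exec(value);
  if (!match) throw new Error(`Invalid token budget: ${raw}`);

  const multiplier = match[2] === "k" ? 1000 : match[2] === "m" ? 1000000 : 1;
  const tokens = Math.round(Number(match[1]) * multiplier);
  if (tokens <= 0) throw new Error(`Token budget must be positive: ${raw}`);
  return tokens;
}
```

Rounding after the multiplication keeps fractional inputs such as `1.2m` on whole token counts.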

Behaviour

One goal per session, persisted as a kind: 'goal' entry on the manifests collection — resumes across desktop restarts via the existing Electric sync, no schema migration.
Mid-run token enforcement: an onStepEnd hook on the outbound bridge surfaces per-step input/output tokens; Horton accumulates them and aborts ctx.agent.run() via an AbortController once tokensUsed >= tokenBudget. The cap covers the sum of input + output across all steps since the goal was set.
Live progress: the goal banner ticks up after each step. The manifest update is written via writeEvent directly (not the wake-session's staged manifest transaction, which only commits at end-of-wake — too late for a long-running run).
mark_goal_complete tool: registered on Horton's tool list; flips status to complete. The chat reply renders via the new ctx.replyText helper, which synthesizes a complete runs + texts + textDeltas sequence.
State-changing /goal commands interrupt the active run — /goal complete, /goal clear, and /goal set typed while a run is in flight signal SIGINT alongside sending the message, so the prior run aborts instead of finishing its old work first. /goal show is read-only and never interrupts.
Budget-limited stop message: when the cap is hit mid-run, the agent posts a synthetic reply explaining what happened (with a suggested larger budget) instead of just falling silent.
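
The mid-run enforcement described above can be sketched as follows. This is an illustration under assumptions, not Horton's code: `StepUsage`, `createBudgetEnforcer`, and the `null` encoding of an unlimited budget are hypothetical names and choices.

```typescript
// Assumed shape of the per-step usage surfaced by an onStepEnd-style hook.
interface StepUsage {
  inputTokens: number;
  outputTokens: number;
}

// Builds a step-end callback that accumulates usage and aborts once the
// running total reaches the cap. tokenBudget === null models "unlimited".
function createBudgetEnforcer(tokenBudget: number | null, initialTokensUsed = 0) {
  const controller = new AbortController();
  let tokensUsed = initialTokensUsed;

  const onStepEnd = (usage: StepUsage): void => {
    tokensUsed += usage.inputTokens + usage.outputTokens;
    const overBudget = tokenBudget !== null && tokensUsed >= tokenBudget;
    if (overBudget && !controller.signal.aborted) {
      controller.abort(); // the run observing this signal stops after this step
    }
  };

  return { onStepEnd, signal: controller.signal, getTokensUsed: () => tokensUsed };
}
```

The returned `signal` would be passed to the run, and `getTokensUsed()` supplies the figure for the banner and the stop message.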

Plumbing

entity-schema.ts — new ManifestGoalEntryValue (objective, status, tokenBudget, tokensUsed, tokensAtCreation, createdAt, updatedAt) added to the manifest discriminated union.
goal-api.ts (new) — setGoal / clearGoal / getGoal / markGoalComplete / markGoalBudgetLimited / updateGoalUsage / refreshGoalUsage. updateGoalUsage writes the manifest update directly through writeEvent for live UI; refreshGoalUsage never decreases tokensUsed so a stale collection sum can't clobber an authoritative in-memory value.
goal-command.ts (new) — /goal parser (--tokens N|50k|1.2m|unlimited, --unlimited flag, subcommand aliases done/status) and dispatcher.
tools/goal-tools.ts (new) — createMarkGoalCompleteTool exposes the completion signal to the LLM.
outbound-bridge.ts — new optional OutboundBridgeHooks.onStepEnd callback, threaded through pi-adapter and the AgentConfig passed to useAgent.
context-factory.ts — AgentHandle.run now accepts an optional abortSignal and combines it with the runtime's runSignal. New ctx.replyText(text) writes a complete runs+texts+textDeltas sequence so synthetic replies render in the chat. New goal-related methods exposed on HandlerContext.
horton.ts — tryHandleSlashCommand intercepts /goal * before the LLM; /goal set enqueues a one-shot kickoff so the agent starts immediately; assistantHandler wires the budget-enforcing onStepEnd, aborts on overflow, and posts the explanation reply.
agents-server-ui — new GoalBanner component above the timeline (objective + budget bar + status badge). MessageInput aborts the active run when a state-changing /goal command is submitted. EntityTimeline / EntityContextDrawer handle the new goal manifest kind.
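
For the context-factory.ts change, combining a caller-supplied abortSignal with the runtime's runSignal could look roughly like the sketch below. `combineAbortSignals` is a hypothetical name; how the PR actually merges the two signals is not shown in this description.

```typescript
// Returns a signal that aborts as soon as any of the given signals aborts.
// Undefined entries are skipped so an optional abortSignal can be passed as-is.
function combineAbortSignals(...signals: Array<AbortSignal | undefined>): AbortSignal {
  const controller = new AbortController();
  for (const signal of signals) {
    if (!signal) continue;
    if (signal.aborted) {
      controller.abort(); // already aborted: propagate immediately
      break;
    }
    signal.addEventListener("abort", () => controller.abort(), { once: true });
  }
  return controller.signal;
}
```

On runtimes that provide it, the built-in `AbortSignal.any([...])` gives the same behaviour without the manual listeners.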

Stacking

Branched off kevin/agent-token-usage (#4502) since this depends on the persisted per-step input_tokens / output_tokens columns from that PR. Base will retarget to main when #4502 merges.

Test plan

Followups (deferred)

The /goal set kickoff message (Start working toward the active goal now...) is currently visible in the chat as a self-sent inbox message. Could be filtered from the LLM's view or styled differently.
Dollar-cost budgets (instead of raw tokens) would be more intuitive but require provider-specific pricing tables.
Goose-style "verify the goal is met before stopping" nudge is not implemented; the model just calls mark_goal_complete when it decides it's done.

🤖 Generated with Claude Code

github-actions · 2026-06-10T10:12:00Z

Electric Agents Desktop Builds

Build artifacts for commit 0c9d86c.

| Platform | Status | Artifact |
| --- | --- | --- |
| macOS Apple Silicon | Passed | DMG |
| macOS Intel | Passed | DMG |
| Windows x64 | Passed | Installer |
| Linux x64 | Passed | AppImage / deb |

Workflow run

codecov · 2026-06-10T10:12:49Z

Codecov Report

❌ Patch coverage is 72.15909% with 196 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.31%. Comparing base (b8875a2) to head (0c9d86c).
⚠️ Report is 1 commit behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| packages/agents-runtime/src/context-factory.ts | 30.68% | 61 Missing ⚠️ |
| packages/agents-runtime/src/tools/goal-tools.ts | 4.34% | 44 Missing ⚠️ |
| ...ges/agents-server-ui/src/components/GoalBanner.tsx | 0.00% | 39 Missing ⚠️ |
| packages/agents/src/agents/horton.ts | 77.94% | 15 Missing ⚠️ |
| ...agents-server-ui/src/components/EntityTimeline.tsx | 0.00% | 12 Missing ⚠️ |
| packages/agents-runtime/src/goal-command.ts | 96.69% | 7 Missing ⚠️ |
| ...s/agents-server-ui/src/components/MessageInput.tsx | 0.00% | 6 Missing ⚠️ |
| packages/agents-runtime/src/outbound-bridge.ts | 86.20% | 4 Missing ⚠️ |
| ...s-server-ui/src/components/EntityContextDrawer.tsx | 0.00% | 4 Missing ⚠️ |
| packages/agents-runtime/src/goal-api.ts | 98.22% | 3 Missing ⚠️ |
| ... and 1 more | | |

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #4552       +/-   ##
===========================================
- Coverage   72.77%   58.31%   -14.47%     
===========================================
  Files          86      373      +287     
  Lines        9779    41150    +31371     
  Branches     2982    11681     +8699     
===========================================
+ Hits         7117    23997    +16880     
- Misses       2608    17078    +14470     
- Partials       54       75       +21

| Flag | Coverage Δ | |
| --- | --- | --- |
| packages/agents | `71.92% <77.94%> (?)` | |
| packages/agents-mcp | `77.54% <ø> (?)` | |
| packages/agents-mobile | `75.49% <ø> (?)` | |
| packages/agents-runtime | `82.01% <79.13%> (?)` | |
| packages/agents-server | `74.81% <ø> (-0.03%)` | ⬇️ |
| packages/agents-server-ui | `6.22% <0.00%> (?)` | |
| packages/electric-ax | `46.42% <ø> (ø)` | |
| packages/experimental | `87.73% <ø> (ø)` | |
| packages/react-hooks | `86.48% <ø> (ø)` | |
| packages/start | `82.83% <ø> (?)` | |
| packages/typescript-client | `91.71% <ø> (ø)` | |
| packages/y-electric | `56.05% <ø> (ø)` | |
| typescript | `58.31% <72.15%> (-14.47%)` | ⬇️ |
| unit-tests | `58.31% <72.15%> (-14.47%)` | ⬇️ |


github-actions · 2026-06-10T10:15:36Z

Electric Agents Mobile Build

Local mobile checks ran for commit 0c9d86c.

The EAS Android preview build was skipped because the mobile-eas-build label is not present.
Add the mobile-eas-build label to this PR to produce an installable preview build.

Workflow run

kevin-dp · 2026-06-10T11:09:11Z

CI failures investigation

Three test jobs went red on the first push. After investigation:

Fixed in this PR (commits fc052121a, a046b12be):

Test packages/agents — two existing tests (horton-tool-composition.test.ts, horton-model-selection.test.ts) build a fake ctx and pass it to assistantHandler. The handler now calls ctx.getGoal() at the top, so the stubs needed those goal-related methods present. Fixed by adding getGoal: vi.fn(() => undefined) (and matching updateGoalUsage / markGoalBudgetLimited no-ops) to the test stubs. ✅ Now passing.

Not introduced by this PR (pre-existing on the base branch):

Test packages/agents-runtime — 92/92 spawn failures with Principal is not allowed to spawn X errors against runtime-dsl.test.ts.
Test packages/agents-server — 15 failures including this.registry.replaceSharedStateLink is not a function, ctx.entityManager.registry.hasEntityPermission is not a function, and various 500s in dispatch-policy-routing.test.ts.

These reproduce on the base branch (kevin/agent-token-usage) locally — same exact 15 failures in agents-server. The base's CI run on 2026-06-08 happened to pass, but the failures appear to be test-infra flake (the failing tests use mocked registries that are missing methods the code now calls, or rely on a clean test backend that may have drifted).

My diff to packages/agents-server/ is empty (git diff origin/kevin/agent-token-usage...HEAD -- packages/agents-server/ returns nothing), so this PR can't be the cause of those failures. They likely need a separate fix at the test-infra or test-stubs level on the base branch / main.

msfstef · 2026-06-10T13:25:35Z

Claude Code Review

Summary

Reviewed the /goal slash command + token-budget enforcement end to end (parser, goal API, enforcement wiring, UI). Solid feature overall: the parser/dispatcher split is clean and well-tested, the never-decrease guards on usage writes are thoughtful, and mid-run abort via a combined AbortSignal is the right mechanism. There's one logic bug that looks blocking, one probable race, and a handful of smaller items. All findings below were adversarially re-verified against the branch before posting.

Critical

1. Budget enforcement is gated on goal existence, not `status === 'active'`

horton.ts:768–798 — activeGoalPromptInfo correctly checks activeGoal.status === 'active', but the enforcement hook is gated on mere truthiness: const onStepEnd = activeGoal ? ... : undefined, with runTokensUsed seeded from activeGoal.tokensUsed. The only other status guard in the flow is on kickoffGoalRun (horton.ts:554). Consequences:

After a goal hits budget_limited, it stays in the manifest with tokensUsed >= tokenBudget. Every subsequent normal chat turn wires onStepEnd, starts already over budget, trips on the first step, aborts the run, and re-posts the "⚠️ Stopped" reply — chat is effectively dead until /goal clear.
A goal marked complete keeps accumulating tokensUsed from unrelated runs (updateGoalUsage at horton.ts:789, 836 has no status guard either — the banner keeps ticking on a "complete" goal), and once it crosses the budget, the post-run updateGoalUsage(..., { status: 'budget_limited' }) flips a completed goal back to budget_limited and starts aborting runs.

Fix is small: derive one enforcedGoal = activeGoal?.status === 'active' ? activeGoal : undefined and use it for both prompt info and enforcement. A "goal complete → next run unaffected" test would have caught this (see test note below).
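
A minimal sketch of the suggested derivation, with simplified hypothetical types (the review-time schema also listed paused/blocked, omitted here):

```typescript
// Simplified, assumed goal shape for illustration only.
type GoalStatus = "active" | "complete" | "budget_limited";

interface Goal {
  objective: string;
  status: GoalStatus;
  tokenBudget: number | null;
  tokensUsed: number;
}

// One derived value gates prompt info, the onStepEnd hook, and usage writes:
// anything other than an active goal is treated as "no goal to enforce".
function selectEnforcedGoal(goal: Goal | undefined): Goal | undefined {
  return goal?.status === "active" ? goal : undefined;
}

// Only wire enforcement when there is an enforced goal.
function shouldWireStepHook(goal: Goal | undefined): boolean {
  return selectEnforcedGoal(goal) !== undefined;
}
```

With this gate, a budget_limited or complete goal left in the manifest no longer wires the hook, so later chat turns run unaffected.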

2. `replyText` can collide with the outbound bridge on `run-N` keys

context-factory.ts:482–496 — replyText allocates keys via nextRunKey(), which lazily seeds recordRunCounter from db.collections.runs.toArray. But writeEvent is producer.append (process-wake.ts:480) with no synchronous local apply — collections only update when events round-trip via SSE (process-wake.ts:1043); the comment at horton.ts:833–835 notes this for steps. The budget-stop writeSlashCommandReply (horton.ts:844) fires immediately after the aborted ctx.agent.run() with no drain in between, so the run the bridge just created may not be in the collection yet and replyText can emit a runs.insert reusing the same run-N key. Timing-dependent, but real. The bridge already solves this exact problem with outboundIdSeed — nextRunKey should consult the same seed (or the two should share a counter).
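
The shared-counter idea can be sketched like this. The names (`runSeedByCollection`, `allocateRunKey`) and the seed-cache shape are assumptions for illustration; the bridge's real outboundIdSeed may be structured differently.

```typescript
// Hypothetical shared seed cache: collection id -> highest run number handed out.
// Both the bridge and synthetic replies would allocate through this one map.
const runSeedByCollection = new Map<string, number>();

// persistedMax is the highest run-N visible in the local collection, which may
// lag behind writes that have not yet round-tripped.
function allocateRunKey(collectionId: string, persistedMax: number): string {
  const seeded = runSeedByCollection.get(collectionId) ?? 0;
  // Never go below either source, so a lagging collection cannot cause reuse.
  const next = Math.max(seeded, persistedMax) + 1;
  runSeedByCollection.set(collectionId, next);
  return `run-${next}`;
}
```

Because every allocation advances the shared seed, a second caller that still sees a stale collection gets a fresh key rather than the one just handed out.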

Important

3. Budget burns at cache-read weight — the 50k default is likely unusable

The pi-adapter change (correct for display) sums input + cacheRead + cacheWrite into input_tokens, and the budget accumulates that per step. On warm turns cacheRead includes the full prompt, so every step re-counts the entire conversation — a session with ~20k of context exhausts DEFAULT_TOKEN_BUDGET = 50_000 in 2–3 steps. Display and enforcement share one number but want different things; consider counting only uncached input + output toward the budget (closer to actual cost), or a much larger default.
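
The arithmetic behind the "2–3 steps" estimate can be checked with a toy model. All numbers and the growth model below are assumptions chosen for illustration (20k context, 200 new input tokens and 500 output tokens per step), not measurements from the PR.

```typescript
// Toy model: counts agent steps until the budget is exhausted.
// countCacheReads = true  -> every step is charged the whole prompt (cache reads included)
// countCacheReads = false -> every step is charged only its new, uncached input
function stepsUntilBudgetExhausted(
  budget: number,
  contextTokens: number,
  newInputPerStep: number,
  outputPerStep: number,
  countCacheReads: boolean,
): number {
  if (outputPerStep <= 0) throw new Error("outputPerStep must be positive");
  let used = 0;
  let context = contextTokens;
  let steps = 0;
  while (used < budget) {
    const countedInput = countCacheReads ? context : Math.min(newInputPerStep, context);
    used += countedInput + outputPerStep;
    context += newInputPerStep + outputPerStep; // the conversation grows each step
    steps += 1;
  }
  return steps;
}
```

Under these assumed numbers, a 50k budget lasts 3 steps when cache reads are counted and 72 steps when only uncached input and output are counted.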

4. Two unreconciled write paths for the goal manifest key

Per-step updateGoalUsage writes via raw writeEvent (goal-api.ts:208–216), while end-of-wake refreshGoalUsage (process-wake.ts:2156) reads possibly-stale collections and writes via the staged wake-session transaction — which commits last. The never-decrease guards defend tokensUsed, but nothing reconciles the two channels for the other fields. Same round-trip-lag fragility class as #1/#2; worth at least a comment explaining the ordering, or funneling both through one path.

5. Dead API: `markGoalBudgetLimited`

Defined (goal-api.ts:164), exposed on HandlerContext (types.ts:1056, context-factory.ts:994), stubbed in two tests — never called. Horton uses updateGoalUsage(..., { status: 'budget_limited' }) instead. Use it or drop it.

6. `/goal` bypasses the existing slash-command registry

The machinery already exists: SlashCommandDefinition (composer-input.ts:69), published via slash_commands (create-handler.ts:506), and horton already registers skill commands this way (horton.ts:950). /goal isn't registered, so no autocomplete/discovery, and serializeComposerInput flags it unknown (text still arrives intact, so it works — but it's invisible to users).

7. UI re-implements the command parser

MessageInput.isAbortingGoalCommand hand-rolls the /goal prefix check + subcommand parsing (hardcoding the done alias). Root cause: goal-command.ts is only exported from the main index.ts, not the /client entry the UI imports from. The parser is pure — export it from client.ts and reuse it, otherwise runtime dispatch and UI abort behavior drift the next time an alias is added.
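
A sketch of what sharing one grammar could look like. The alias mapping (done → complete, status → show) and both function names are assumptions based on the aliases named in the PR description, not the actual goal-command.ts exports.

```typescript
type GoalSubcommand = "set" | "show" | "complete" | "clear";

// Single alias table used by both runtime dispatch and the UI.
const GOAL_SUBCOMMANDS = new Map<string, GoalSubcommand>([
  ["set", "set"],
  ["show", "show"],
  ["status", "show"], // assumed alias
  ["complete", "complete"],
  ["done", "complete"], // assumed alias
  ["clear", "clear"],
]);

// Pure parser: returns the canonical subcommand, or undefined if the text
// is not a recognised /goal command.
function parseGoalSubcommand(text: string): GoalSubcommand | undefined {
  const parts = text.trim().split(/\s+/);
  if (parts[0] !== "/goal") return undefined;
  return GOAL_SUBCOMMANDS.get((parts[1] ?? "").toLowerCase());
}

// UI side: state-changing commands interrupt the active run, /goal show never does.
function isAbortingGoalCommand(text: string): boolean {
  const subcommand = parseGoalSubcommand(text);
  return subcommand !== undefined && subcommand !== "show";
}
```

Adding an alias then means touching only the table, and the UI's abort behaviour follows automatically.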

8. `mark_goal_complete`'s `summary` is documented as recorded but discarded

goal-tools.ts — execute ignores _params entirely, markGoalComplete() only flips status, and ManifestGoalEntryValue has no summary field. The system prompt also tells the model to use a summary when blocked. Either persist it or fix both descriptions — as-is the tool description lies to the model.

9. Numeric epoch-ms timestamps widen the shared `Manifest` type

Every other manifest kind uses ISO strings (attachment createdAt, futureSend sentAt/failedAt); the goal entry alone uses number, forcing createdAt?: string | number on the flattened Manifest type (entity-schema.ts:964). Use ISO strings and the widening disappears.

Suggestions

Speculative statuses: paused/blocked exist in the schema enum, GoalStatus, and GoalBanner CSS, but no code path sets either (paused isn't even handled in statusClass). Trim to the three real states.
Token formatter ×4: identical hand-rolled formatTokenCount in goal-command.ts:162, horton.ts:859, GoalBanner.tsx:103, plus a divergent Intl.NumberFormat version in TokenUsage.tsx:54. Concretely: 12,500 tokens renders as 13k in GoalBanner and 12.5k in TokenUsage, in the same UI. Share one helper.
mark_goal_complete is registered unconditionally (horton.ts:381) while the sibling optional tools use conditional spread (docsSearchTool). Tools are rebuilt per wake and ctx.getGoal() is available there, so gating on an active goal is easy — with no goal, the model currently sees a tool that can only return "No active goal to mark complete."
No re-kickoff on abandonment: if the agent stops without calling mark_goal_complete (model just stops, or budget trip), the goal stalls silently — no further autonomous progress until the user re-issues a command. May be intentional for v1, but worth stating in the PR description given the "work autonomously toward it" framing.
Test gap: the 21 tests cover the parser/dispatcher (with mocked goal API), but createGoalApi itself has zero coverage (setGoal same-objective carryover, never-decrease, status flips), and nothing exercises the horton enforcement wiring, which is exactly where finding #1 lives.
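
To make the formatter divergence from the suggestions above concrete, here is a sketch of the two styles side by side. Both functions are illustrative reconstructions, not the code in the branch; the lowercasing of the Intl output is an assumption made to match the "12.5k" rendering quoted above.

```typescript
// Hand-rolled style: rounds to whole thousands.
function formatRoundedK(tokens: number): string {
  return tokens >= 1000 ? `${Math.round(tokens / 1000)}k` : String(tokens);
}

// Intl-based style: compact notation with at most one fractional digit.
const compactTokenFormatter = new Intl.NumberFormat("en-US", {
  notation: "compact",
  maximumFractionDigits: 1,
});

function formatTokenCount(tokens: number): string {
  return compactTokenFormatter.format(tokens).toLowerCase();
}
```

For 12,500 tokens the first yields "13k" and the second "12.5k", which is the mismatch a single shared helper removes.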

Housekeeping

Base is already main, but the diff still contains #4502's token-usage changes plus .changeset/agent-token-usage.md (which patches @electric-ax/agents-desktop — no desktop files in this diff). Needs a rebase once #4502 lands or the changeset/code double-land.

The two to prioritize: #1 (small fix, big blast radius) and #3 (the default budget semantics make the headline feature hard to use as shipped). Everything else is mechanical. Line references are against the branch head at review time, so they may drift with new pushes.

Review iteration: 1 | 2026-06-10 | 🤖 Generated with Claude Code, requested by @msfstef

kevin-dp · 2026-06-10T15:13:34Z

Review follow-up — all findings addressed

One commit per finding, on top of the rebased branch:

| # | Finding | Commit | Resolution |
| --- | --- | --- | --- |
| 1 | Enforcement gated on goal existence, not `status === 'active'` | `b92dafd48` | Single `enforcedGoal` drives prompt info, the `onStepEnd` hook, and the post-run usage write. New `horton-goal-enforcement.test.ts` covers all four states (active / budget_limited / complete / none). |
| 2 | `replyText` run-key collision with the bridge | `9cba9585c` | New `allocateRunKey` consults and advances the bridge's shared id-seed cache (keyed by runs collection id) with a caller-local monotonicity floor. |
| 3 | Budget burns at cache-read weight | `56bc8e866` | Step-end hook now carries `uncachedInput`; the budget accumulates `uncachedInput + output`. Display semantics unchanged. 50k default kept; it's plausible under the new accounting. |
| 4 | Two unreconciled write paths | `8f2fe27cf` | `refreshGoalUsage`, `tokensAtCreation`, and `sumStepTokens` removed entirely; the handler's in-memory accumulator via `updateGoalUsage` is the single usage writer. |
| 5 | Dead `markGoalBudgetLimited` | `dc2e12f20` | Dropped from API, HandlerContext, wiring, and test stubs. |
| 6 | `/goal` bypasses the slash-command registry | `25633d2d4` | `GOAL_SLASH_COMMAND` defined next to the parser and registered as a static command on the horton type. |
| 7 | UI re-implements the parser | `183c64c61` | Parser exported from `/client`; `MessageInput.isAbortingGoalCommand` delegates to it. |
| 8 | `mark_goal_complete` summary discarded | `db74c8f2c` | `summary` persisted on the goal entry, echoed in the tool result, surfaced by `/goal show`. Also preserved across subsequent usage writes (`89c344c50`). |
| 9 | Numeric timestamps widen `Manifest` | `89c344c50` | ISO strings; the `string \| number` widening is gone. |
| - | Speculative `paused`/`blocked` statuses | `0c496a7f8` | Trimmed to active / complete / budget_limited; `statusClass` is now exhaustive. |
| - | Token formatter ×4 | `ede662fdd` | One Intl-compact helper in `token-budget.ts` (exported from index and `/client`), used by goal-command, horton, GoalBanner, and TokenUsage. |
| - | `mark_goal_complete` registered unconditionally | `6c3447970` | Gated on `ctx.getGoal()?.status === 'active'`; gating tests assert presence/absence. |
| - | `createGoalApi` test gap | `642777819` | 14 new unit tests: budget defaults, same-objective carryover, never-decrease, status flips, live writeEvent path, summary handling. |
| - | No re-kickoff on abandonment | PR description | Documented as an intentional v1 limitation. |

Housekeeping note (base still containing #4502's commits) is inherent to the stacked-PR setup; GitHub will retarget when #4502 merges.

🤖 Generated with Claude Code

Lets the user set a session-scoped objective with an optional token cap. Horton works autonomously toward the goal and stops when it calls `mark_goal_complete` or when the run exceeds the budget. Cap is enforced mid-run via an `onStepEnd` hook on the outbound bridge so the abort fires within a step, not after the agent decides to stop.

- `/goal set "..." [--tokens N|unlimited]` (default 50k)
- `/goal show | clear | complete`
- One goal per session, persisted as a `kind: 'goal'` manifest entry; resumes across desktop restarts via the existing Electric sync.
- New `ctx.replyText` helper synthesizes a complete runs+texts sequence so slash-command responses and the budget-limit notice render as ordinary agent replies.
- `AgentHandle.run` gains an optional `abortSignal`, combined with the runtime's `runSignal` so a budget abort can co-exist with SIGINT-style user aborts.
- State-changing `/goal` commands typed mid-run also signal SIGINT so the prior run interrupts instead of finishing old work first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The assistantHandler now reads the active goal up front and wires budget enforcement via ctx.updateGoalUsage, so the tool-composition test stub needs the goal-related methods present. Stubs to a no-goal state so the captured agent config reflects the tool path only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Same root cause as the previous commit — the assistantHandler now calls ctx.getGoal at the top, so any test that exercises the handler through a stubbed context needs those methods present. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Review #1 (critical): enforcement was gated on goal *existence*, so a budget_limited goal kept tripping on every subsequent chat turn (abort + stop message until /goal clear), and a complete goal kept accumulating tokensUsed from unrelated runs — eventually flipping back to budget_limited. Derive one `enforcedGoal` (status === 'active') and use it for the prompt info, the onStepEnd hook, and the post-run usage write. Adds gating regression tests for all four states. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

… seed Review #2 (critical): ctx.replyText/ctx.recordRun seeded their run-N counter from runs.toArray, but writeEvent has no synchronous local apply — events only land in the collection after a round-trip. A synthetic reply written right after agent.run (e.g. the budget-stop notice) could reuse a run-N key the bridge had just allocated. New allocateRunKey consults and advances the bridge's shared id-seed cache (keyed by the runs collection id) with a caller-local floor for monotonicity when the collection lags. Test harness now uses unique collection ids per case, matching production where every entity has its own runs collection. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Review #3: the persisted input_tokens (correctly, for display) sums input + cacheRead + cacheWrite, but budgeting on that re-counts the entire conversation on every warm-cache step — a session with ~20k of context would exhaust the 50k default budget in 2-3 steps regardless of new work. The step-end hook now also carries `uncachedInput` (raw `usage.input`, falling back to the flat legacy counter for providers without cache columns), and Horton accumulates `uncachedInput + output` toward the budget. Display semantics are unchanged. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Review #4: per-step updates went through writeEvent (live) while the end-of-wake refreshGoalUsage recomputed from possibly-stale collections and wrote via the staged wake-session transaction — which commits last and could clobber the live value. Horton's in-memory accumulator is the authority, so drop the fallback recompute entirely: refreshGoalUsage, the tokensAtCreation baseline it depended on, and sumStepTokens are all removed. updateGoalUsage (never-decrease, writeEvent-direct) is now the only usage writer. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
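
The never-decrease rule for the single usage writer might look like the sketch below. `GoalEntry` and `applyUsageUpdate` are hypothetical; the real updateGoalUsage also performs the writeEvent, which is left out here.

```typescript
type GoalStatus = "active" | "complete" | "budget_limited";

// Simplified, assumed goal entry shape.
interface GoalEntry {
  objective: string;
  status: GoalStatus;
  tokensUsed: number;
  updatedAt: string; // ISO string
}

// Pure update step: tokensUsed can only grow, status changes only when asked,
// and an update that changes nothing returns the same object (a no-op write).
function applyUsageUpdate(
  current: GoalEntry,
  tokensUsed: number,
  patch: { status?: GoalStatus } = {},
  now: Date = new Date(),
): GoalEntry {
  const nextTokensUsed = Math.max(current.tokensUsed, tokensUsed);
  const nextStatus = patch.status ?? current.status;
  if (nextTokensUsed === current.tokensUsed && nextStatus === current.status) {
    return current;
  }
  return { ...current, tokensUsed: nextTokensUsed, status: nextStatus, updatedAt: now.toISOString() };
}
```

Returning the unchanged object lets the caller skip emitting an event when a stale or repeated value arrives.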

Review #5: defined, exposed on HandlerContext, stubbed in tests — never called. Horton flips status via updateGoalUsage(..., { status: 'budget_limited' }). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Review #6: /goal worked but was invisible — no composer autocomplete, and serializeComposerInput flagged it unknown. Define GOAL_SLASH_COMMAND next to the parser and register it alongside the skill commands on the horton entity type. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Review #7: MessageInput hand-rolled the /goal prefix + subcommand parsing (hardcoding the `done` alias) because goal-command.ts wasn't exported from the /client entry. The parser is pure — export it and delegate, so UI abort behavior and runtime dispatch share one grammar. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Review #8: the tool's description promised the summary was recorded but execute() discarded it, and the system prompt told the model to use a summary when blocked. Add an optional summary field to the goal entry, thread it through markGoalComplete(summary?), echo it in the tool result, and surface it from /goal show. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Review #9: the goal kind alone used epoch-ms numbers, forcing `createdAt?: string | number` onto the flattened Manifest type while every other manifest kind uses ISO strings. Switch to ISO strings — the widening disappears. Also carries the completion summary through updateGoalUsage's rebuilt entry, which previously dropped it. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Review suggestion: paused/blocked existed in the enum and CSS but no code path ever set them (and statusClass didn't even handle paused). Three real states remain: active, complete, budget_limited — which also makes statusClass exhaustive without a default arm. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Review suggestion: three hand-rolled formatTokenCount copies (goal-command, horton, GoalBanner) diverged from TokenUsage's Intl-compact version — 12,500 rendered as "13k" in the banner and "12.5k" in the meta row of the same UI. One Intl-compact-based helper now lives in token-budget.ts (exported from index and /client) and all four call sites use it. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Review suggestion: the tool was registered unconditionally, so with no goal the model saw a tool whose only possible answer was "No active goal to mark complete." Tools are rebuilt per wake, so gate on ctx.getGoal()?.status === 'active' like the other conditional tools. Gating tests now also assert tool presence/absence. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Review test-gap note: the parser/dispatcher had coverage but the goal API itself had none. Covers setGoal default/unlimited budgets, same-objective carryover vs new-objective reset, updateGoalUsage never-decrease + status flip + no-op writes + the writeEvent live path + summary preservation, markGoalComplete summary trimming, and clear/get round-trips. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Rebase integration: main added observe-pg-sync-tool.test.ts, which calls createHortonTools with a minimal ctx stub. Tool composition is now goal-aware (mark_goal_complete only registers for an active goal), so the stub needs getGoal present. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Rebase integration bug: the native composer (#4533) sends a WireComposerInputPayload — raw text in `source` plus parsed `nodes` — instead of `{ text }`. extractWakeText only read `text`, so /goal messages fell through to the LLM, whose skills guidance made it reply "I don't have a /goal skill in my catalog". Read `source` as the fallback; regression tests cover both payload shapes. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
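
A sketch of the fallback described in this commit. The function body is an assumption about how such an extractor could be written; only the `text` and `source` field names come from the commit message.

```typescript
// Accepts either the legacy { text } payload or the native composer's
// payload, which carries the raw text in `source` (plus parsed `nodes`).
function extractWakeText(payload: unknown): string | undefined {
  if (typeof payload !== "object" || payload === null) return undefined;
  const { text, source } = payload as { text?: unknown; source?: unknown };
  if (typeof text === "string") return text;
  if (typeof source === "string") return source; // fallback for the new shape
  return undefined;
}
```

With both shapes handled, a `/goal` message reaches the slash-command interceptor instead of falling through to the LLM.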

A goal mutation firing mid-run (e.g. the mark_goal_complete tool) read the local manifests collection — which lags live writeEvent writes by a stream round-trip — and persisted through the wake-session's staged transaction, which replays at end-of-wake. The stale snapshot landed after, and overwrote, the fresher per-step tokensUsed written live (observed: bar showed 3.3k after the counter had reached 5.7k). Route every goal mutation through a single ordered channel (direct writeEvent upserts when wired; the staged path remains only as a test fallback) and add an in-wake read-your-writes cache so same-wake reads always observe the latest write. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
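
The read-your-writes idea can be sketched as a thin wrapper around the lagging collection. `createReadYourWritesStore`, `readSynced`, and the `writeEvent` signature used here are hypothetical stand-ins for illustration.

```typescript
// Wraps a lagging synced read with an in-wake cache of values written so far,
// so reads within the same wake always observe the latest write.
function createReadYourWritesStore<T>(
  readSynced: (key: string) => T | undefined,
  writeEvent: (key: string, value: T) => void,
) {
  const writtenThisWake = new Map<string, T>();
  return {
    get(key: string): T | undefined {
      return writtenThisWake.has(key) ? writtenThisWake.get(key) : readSynced(key);
    },
    set(key: string, value: T): void {
      writtenThisWake.set(key, value); // visible to same-wake reads immediately
      writeEvent(key, value); // single ordered channel to the stream
    },
  };
}
```

In the scenario from the commit message, a read after the live write returns 5.7k even while the synced collection still holds 3.3k.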

On cache-enabled providers the newly appended prompt tokens of a warm turn are reported as cacheWrite, with usage.input collapsing to ~0 — so the budget's "uncached input" side was effectively zero and the cap tracked output only. Include cacheWrite in the uncached-input figure the onStepEnd hook reports; cache reads stay excluded. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
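
The resulting accounting can be written down as two small functions. The `ProviderUsage` field names follow the ones mentioned in the commit messages (`input`, `cacheRead`, `cacheWrite`); the functions themselves are an illustrative sketch, not the adapter's code.

```typescript
// Assumed per-step usage shape; the cache fields are optional for providers
// that do not report them.
interface ProviderUsage {
  input: number;
  output: number;
  cacheRead?: number;
  cacheWrite?: number;
}

// Budget side: newly written prompt tokens count, cache reads do not.
function budgetCostForStep(usage: ProviderUsage): number {
  const uncachedInput = usage.input + (usage.cacheWrite ?? 0);
  return uncachedInput + usage.output;
}

// Display side: the full prompt size, cache reads included.
function displayedInputTokens(usage: ProviderUsage): number {
  return usage.input + (usage.cacheRead ?? 0) + (usage.cacheWrite ?? 0);
}
```

For a warm step such as `{ input: 3, output: 400, cacheRead: 20000, cacheWrite: 600 }`, the budget is charged 1,003 tokens while the display shows 20,603 input tokens.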

kevin-dp force-pushed the kevin/agent-token-usage branch from 36ccc20 to 2802169 Compare June 10, 2026 11:40

kevin-dp force-pushed the kevin/agent-goal-tracking branch from a046b12 to 6ae1334 Compare June 10, 2026 11:48

Base automatically changed from kevin/agent-token-usage to main June 10, 2026 12:33

kevin-dp and others added 19 commits June 11, 2026 10:16

chore: bump changeset to patch for agent-goal-tracking

44f855b

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docs(agents-runtime): explain ctx.replyText's five-write sequence

ffe88e3

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

refactor(agents-runtime): drop dead markGoalBudgetLimited API

7da7102

Review #5: defined, exposed on HandlerContext, stubbed in tests — never called. Horton flips status via updateGoalUsage(..., { status: 'budget_limited' }). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

kevin-dp force-pushed the kevin/agent-goal-tracking branch from 6427778 to 655cc46 Compare June 11, 2026 08:22

kevin-dp and others added 2 commits June 11, 2026 11:53

kevin-dp mentioned this pull request Jun 11, 2026

fix(agents-server-ui): show uncached input tokens in the meta row #4567

Open

kevin-dp wants to merge 22 commits into main from kevin/agent-goal-tracking


2 participants
