
feat(agents): add /goal slash command with token-budget enforcement#4552

Open
kevin-dp wants to merge 22 commits into main from kevin/agent-goal-tracking


@kevin-dp

Contributor

Summary

Adds a /goal slash command to Horton sessions. The user sets an objective with an optional token cap; the agent works autonomously toward it and stops when mark_goal_complete is called or when the run exceeds the budget. Budget enforcement is mid-run (via an onStepEnd hook on the outbound bridge), not at run boundaries — so a goal with a small budget actually halts the agent within a step, rather than letting it finish whatever giant tool-loop is in flight.

/goal set "ship feature X" --tokens 50k   # default 50k tokens
/goal set "explore" --unlimited           # opt out of the cap
/goal show                                 # current state
/goal complete                             # mark done manually
/goal clear                                # remove the goal

Behaviour

  • One goal per session, persisted as a kind: 'goal' entry on the manifests collection — resumes across desktop restarts via the existing Electric sync, no schema migration.
  • Mid-run token enforcement: an onStepEnd hook on the outbound bridge surfaces per-step input/output tokens; Horton accumulates them and aborts ctx.agent.run() via an AbortController once tokensUsed >= tokenBudget. The cap covers the sum of input + output across all steps since the goal was set.
  • Live progress: the goal banner ticks up after each step. The manifest update is written via writeEvent directly (not the wake-session's staged manifest transaction, which only commits at end-of-wake — too late for a long-running run).
  • mark_goal_complete tool: registered on Horton's tool list; flips status to complete. The chat reply renders via the new ctx.replyText helper, which synthesizes a complete runs + texts + textDeltas sequence.
  • State-changing /goal commands interrupt the active run: /goal complete, /goal clear, and /goal set, when typed while a run is in flight, signal SIGINT alongside sending the message, so the prior run aborts instead of finishing its old work first. /goal show is read-only and never interrupts.
  • Budget-limited stop message: when the cap is hit mid-run, the agent posts a synthetic reply explaining what happened (with a suggested larger budget) instead of just falling silent.
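
A minimal sketch of the mid-run enforcement described above, under simplifying assumptions. `StepUsage`, `createBudgetEnforcer`, and `simulateRun` are hypothetical names for illustration only; Horton's actual hook payload and run loop are not shown in this description.

```typescript
// Hypothetical per-step usage shape; the real onStepEnd payload may differ.
interface StepUsage {
  inputTokens: number
  outputTokens: number
}

interface BudgetEnforcer {
  signal: AbortSignal
  onStepEnd: (usage: StepUsage) => void
  tokensUsed: () => number
}

// Accumulates input + output tokens per step and aborts once the running
// total reaches the budget. A `null` budget stands in for --unlimited.
function createBudgetEnforcer(
  tokenBudget: number | null,
  alreadyUsed = 0
): BudgetEnforcer {
  const controller = new AbortController()
  let used = alreadyUsed
  return {
    signal: controller.signal,
    tokensUsed: () => used,
    onStepEnd(usage) {
      used += usage.inputTokens + usage.outputTokens
      if (tokenBudget !== null && used >= tokenBudget) {
        controller.abort()
      }
    },
  }
}

// Stand-in for the agent loop: it stops before the next step once aborted.
function simulateRun(enforcer: BudgetEnforcer, steps: StepUsage[]): number {
  let completed = 0
  for (const step of steps) {
    if (enforcer.signal.aborted) break
    enforcer.onStepEnd(step)
    completed++
  }
  return completed
}
```

With a 5,000-token budget and steps costing 2,500 tokens each, the second step trips the cap and the third never starts.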

Plumbing

  • entity-schema.ts — new ManifestGoalEntryValue (objective, status, tokenBudget, tokensUsed, tokensAtCreation, createdAt, updatedAt) added to the manifest discriminated union.
  • goal-api.ts (new) — setGoal / clearGoal / getGoal / markGoalComplete / markGoalBudgetLimited / updateGoalUsage / refreshGoalUsage. updateGoalUsage writes the manifest update directly through writeEvent for live UI; refreshGoalUsage never decreases tokensUsed so a stale collection sum can't clobber an authoritative in-memory value.
  • goal-command.ts (new) — /goal parser (--tokens N|50k|1.2m|unlimited, --unlimited flag, subcommand aliases done/status) and dispatcher.
  • tools/goal-tools.ts (new) — createMarkGoalCompleteTool exposes the completion signal to the LLM.
  • outbound-bridge.ts — new optional OutboundBridgeHooks.onStepEnd callback, threaded through pi-adapter and the AgentConfig passed to useAgent.
  • context-factory.ts — AgentHandle.run now accepts an optional abortSignal and combines it with the runtime's runSignal. New ctx.replyText(text) writes a complete runs+texts+textDeltas sequence so synthetic replies render in the chat. New goal-related methods exposed on HandlerContext.
  • horton.ts — tryHandleSlashCommand intercepts /goal * before the LLM; /goal set enqueues a one-shot kickoff so the agent starts immediately; assistantHandler wires the budget-enforcing onStepEnd, aborts on overflow, and posts the explanation reply.
  • agents-server-ui — new GoalBanner component above the timeline (objective + budget bar + status badge). MessageInput aborts the active run when a state-changing /goal command is submitted. EntityTimeline / EntityContextDrawer handle the new goal manifest kind.
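
The `--tokens` value formats listed for goal-command.ts can be illustrated with a small parser. This is a sketch only: `parseTokenBudget` and the `TokenBudget` type are hypothetical, and the real parser's error handling and rounding rules may differ.

```typescript
// Hypothetical result type: either a concrete cap or an explicit opt-out.
type TokenBudget = { kind: 'limited'; tokens: number } | { kind: 'unlimited' }

// Accepts a plain number, a k/m suffix (50k, 1.2m), or "unlimited".
// Returns undefined for anything else so the caller can report an error.
function parseTokenBudget(raw: string): TokenBudget | undefined {
  const value = raw.trim().toLowerCase()
  if (value === 'unlimited') return { kind: 'unlimited' }
  const match = /^(\d+(?:\.\d+)?)([km])?$/.exec(value)
  if (!match) return undefined
  const multiplier =
    match[2] === 'k' ? 1_000 : match[2] === 'm' ? 1_000_000 : 1
  const tokens = Math.round(Number(match[1]) * multiplier)
  // A zero budget would stop the agent before it starts, so reject it here.
  return tokens > 0 ? { kind: 'limited', tokens } : undefined
}
```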

Stacking

Branched off kevin/agent-token-usage (#4502) since this depends on the persisted per-step input_tokens / output_tokens columns from that PR. Base will retarget to main when #4502 merges.

Test plan

  • npx tsc --noEmit clean in agents-runtime, agents, and agents-server-ui
  • 21/21 unit tests pass in packages/agents-runtime/test/goal-command.test.ts (parser + dispatcher, including --tokens formats, unlimited, error paths, and all subcommands)
  • Manual: /goal set "..." --tokens 100k — banner appears, agent kicks off, tokens tick up live during the run, goal completes when model calls mark_goal_complete
  • Manual: /goal set "..." --tokens 5k on a large task — agent halts mid-run with budget_limited status and the explanation reply
  • Manual: /goal complete typed while a run is active — prior run aborts via SIGINT, goal flips to complete
  • Manual: /goal set "B" typed while goal A is running — prior run aborts, goal A replaced, agent kicks off on B
  • Manual: /goal clear removes the banner
  • Manual: /goal show reports current state (no abort)
  • Manual: persistence across desktop restart — goal entry resumes correctly
  • Manual: a no-goal regression check (normal chat behaves unchanged)

Followups (deferred)

  • The /goal set kickoff message (Start working toward the active goal now...) is currently visible in the chat as a self-sent inbox message. Could be filtered from the LLM's view or styled differently.
  • Dollar-cost budgets (instead of raw tokens) would be more intuitive but require provider-specific pricing tables.
  • Goose-style "verify the goal is met before stopping" nudge is not implemented; the model just calls mark_goal_complete when it decides it's done.

🤖 Generated with Claude Code

@github-actions

github-actions Bot commented Jun 10, 2026

Contributor

Electric Agents Desktop Builds

Build artifacts for commit 0c9d86c.

| Platform | Status | Artifact |
| --- | --- | --- |
| macOS Apple Silicon | Passed | DMG |
| macOS Intel | Passed | DMG |
| Windows x64 | Passed | Installer |
| Linux x64 | Passed | AppImage / deb |

Workflow run

@codecov

codecov Bot commented Jun 10, 2026


Codecov Report

❌ Patch coverage is 72.15909% with 196 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.31%. Comparing base (b8875a2) to head (0c9d86c).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| packages/agents-runtime/src/context-factory.ts | 30.68% | 61 Missing ⚠️ |
| packages/agents-runtime/src/tools/goal-tools.ts | 4.34% | 44 Missing ⚠️ |
| ...ges/agents-server-ui/src/components/GoalBanner.tsx | 0.00% | 39 Missing ⚠️ |
| packages/agents/src/agents/horton.ts | 77.94% | 15 Missing ⚠️ |
| ...agents-server-ui/src/components/EntityTimeline.tsx | 0.00% | 12 Missing ⚠️ |
| packages/agents-runtime/src/goal-command.ts | 96.69% | 7 Missing ⚠️ |
| ...s/agents-server-ui/src/components/MessageInput.tsx | 0.00% | 6 Missing ⚠️ |
| packages/agents-runtime/src/outbound-bridge.ts | 86.20% | 4 Missing ⚠️ |
| ...s-server-ui/src/components/EntityContextDrawer.tsx | 0.00% | 4 Missing ⚠️ |
| packages/agents-runtime/src/goal-api.ts | 98.22% | 3 Missing ⚠️ |
... and 1 more
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #4552       +/-   ##
===========================================
- Coverage   72.77%   58.31%   -14.47%     
===========================================
  Files          86      373      +287     
  Lines        9779    41150    +31371     
  Branches     2982    11681     +8699     
===========================================
+ Hits         7117    23997    +16880     
- Misses       2608    17078    +14470     
- Partials       54       75       +21     
| Flag | Coverage Δ |
| --- | --- |
| packages/agents | 71.92% <77.94%> (?) |
| packages/agents-mcp | 77.54% <ø> (?) |
| packages/agents-mobile | 75.49% <ø> (?) |
| packages/agents-runtime | 82.01% <79.13%> (?) |
| packages/agents-server | 74.81% <ø> (-0.03%) ⬇️ |
| packages/agents-server-ui | 6.22% <0.00%> (?) |
| packages/electric-ax | 46.42% <ø> (ø) |
| packages/experimental | 87.73% <ø> (ø) |
| packages/react-hooks | 86.48% <ø> (ø) |
| packages/start | 82.83% <ø> (?) |
| packages/typescript-client | 91.71% <ø> (ø) |
| packages/y-electric | 56.05% <ø> (ø) |
| typescript | 58.31% <72.15%> (-14.47%) ⬇️ |
| unit-tests | 58.31% <72.15%> (-14.47%) ⬇️ |


@github-actions

github-actions Bot commented Jun 10, 2026

Contributor

Electric Agents Mobile Build

Local mobile checks ran for commit 0c9d86c.

The EAS Android preview build was skipped because the mobile-eas-build label is not present.
Add the mobile-eas-build label to this PR to produce an installable preview build.

Workflow run

@kevin-dp

Contributor Author

CI failures investigation

Three test jobs went red on the first push. After investigation:

Fixed in this PR (commits fc052121a, a046b12be):

  • Test packages/agents — two existing tests (horton-tool-composition.test.ts, horton-model-selection.test.ts) build a fake ctx and pass it to assistantHandler. The handler now calls ctx.getGoal() at the top, so the stubs needed those goal-related methods present. Fixed by adding getGoal: vi.fn(() => undefined) (and matching updateGoalUsage / markGoalBudgetLimited no-ops) to the test stubs. ✅ Now passing.

Not introduced by this PR (pre-existing on the base branch):

  • Test packages/agents-runtime — 92/92 spawn failures with Principal is not allowed to spawn X errors against runtime-dsl.test.ts.
  • Test packages/agents-server — 15 failures including this.registry.replaceSharedStateLink is not a function, ctx.entityManager.registry.hasEntityPermission is not a function, and various 500s in dispatch-policy-routing.test.ts.

These reproduce on the base branch (kevin/agent-token-usage) locally — same exact 15 failures in agents-server. The base's CI run on 2026-06-08 happened to pass, but the failures appear to be test-infra flake (the failing tests use mocked registries that are missing methods the code now calls, or rely on a clean test backend that may have drifted).

My diff to packages/agents-server/ is empty (git diff origin/kevin/agent-token-usage...HEAD -- packages/agents-server/ returns nothing), so this PR can't be the cause of those failures. They likely need a separate fix at the test-infra or test-stubs level on the base branch / main.

@kevin-dp kevin-dp force-pushed the kevin/agent-token-usage branch from 36ccc20 to 2802169 Compare June 10, 2026 11:40
@kevin-dp kevin-dp force-pushed the kevin/agent-goal-tracking branch from a046b12 to 6ae1334 Compare June 10, 2026 11:48
Base automatically changed from kevin/agent-token-usage to main June 10, 2026 12:33
@msfstef

msfstef commented Jun 10, 2026

Contributor

Claude Code Review

Summary

Reviewed the /goal slash command + token-budget enforcement end to end (parser, goal API, enforcement wiring, UI). Solid feature overall: the parser/dispatcher split is clean and well-tested, the never-decrease guards on usage writes are thoughtful, and mid-run abort via a combined AbortSignal is the right mechanism. There's one logic bug that looks blocking, one probable race, and a handful of smaller items. All findings below were adversarially re-verified against the branch before posting.

Critical

1. Budget enforcement is gated on goal existence, not status === 'active'

horton.ts:768–798 — activeGoalPromptInfo correctly checks activeGoal.status === 'active', but the enforcement hook is gated on mere truthiness: const onStepEnd = activeGoal ? ... : undefined, with runTokensUsed seeded from activeGoal.tokensUsed. The only other status guard in the flow is on kickoffGoalRun (horton.ts:554). Consequences:

  • After a goal hits budget_limited, it stays in the manifest with tokensUsed >= tokenBudget. Every subsequent normal chat turn wires onStepEnd, starts already over budget, trips on the first step, aborts the run, and re-posts the "⚠️ Stopped" reply — chat is effectively dead until /goal clear.
  • A goal marked complete keeps accumulating tokensUsed from unrelated runs (updateGoalUsage at horton.ts:789, 836 has no status guard either — the banner keeps ticking on a "complete" goal), and once it crosses the budget, the post-run updateGoalUsage(..., { status: 'budget_limited' }) flips a completed goal back to budget_limited and starts aborting runs.

Fix is small: derive one enforcedGoal = activeGoal?.status === 'active' ? activeGoal : undefined and use it for both prompt info and enforcement. A "goal complete → next run unaffected" test would have caught this (see test note below).
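
A minimal sketch of the suggested gating, assuming a simplified goal shape. `Goal` and `selectEnforcedGoal` are hypothetical names, and the status set is reduced to the three states that matter for this finding.

```typescript
// Simplified status set for illustration only.
type GoalStatus = 'active' | 'complete' | 'budget_limited'

interface Goal {
  objective: string
  status: GoalStatus
  tokenBudget: number | null
  tokensUsed: number
}

// Only an active goal should drive prompt info and budget enforcement.
// Any other state (or no goal at all) leaves the run unenforced.
function selectEnforcedGoal(goal: Goal | undefined): Goal | undefined {
  return goal?.status === 'active' ? goal : undefined
}
```

Deriving the prompt info and the onStepEnd hook from this one value means a budget_limited or complete goal can no longer abort an unrelated chat turn.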

2. replyText can collide with the outbound bridge on run-N keys

context-factory.ts:482–496 — replyText allocates keys via nextRunKey(), which lazily seeds recordRunCounter from db.collections.runs.toArray. But writeEvent is producer.append (process-wake.ts:480) with no synchronous local apply — collections only update when events round-trip via SSE (process-wake.ts:1043); the comment at horton.ts:833–835 notes this for steps. The budget-stop writeSlashCommandReply (horton.ts:844) fires immediately after the aborted ctx.agent.run() with no drain in between, so the run the bridge just created may not be in the collection yet and replyText can emit a runs.insert reusing the same run-N key. Timing-dependent, but real. The bridge already solves this exact problem with outboundIdSeed — nextRunKey should consult the same seed (or the two should share a counter).
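
The shared-counter idea can be sketched as follows. Everything here is hypothetical (`runSeedByCollection`, `allocateRunKey`, the `visibleKeys` parameter); it only illustrates why consulting a shared seed avoids the collision when the local collection lags behind recent writes.

```typescript
// Hypothetical shared seed cache, keyed by runs-collection id. Both the
// bridge and replyText would allocate through it in this sketch.
const runSeedByCollection = new Map<string, number>()

// Highest N among keys of the form run-N that are currently visible locally.
function highestRunIndex(keys: string[]): number {
  let max = 0
  for (const key of keys) {
    const match = /^run-(\d+)$/.exec(key)
    if (match) max = Math.max(max, Number(match[1]))
  }
  return max
}

// Allocates the next key from whichever is further ahead: the shared seed or
// the (possibly stale) collection snapshot. The seed is advanced on every
// call, so a lagging snapshot cannot cause a key to be reused.
function allocateRunKey(collectionId: string, visibleKeys: string[]): string {
  const seed = runSeedByCollection.get(collectionId) ?? 0
  const next = Math.max(seed, highestRunIndex(visibleKeys)) + 1
  runSeedByCollection.set(collectionId, next)
  return `run-${next}`
}
```

If the bridge allocates while only run-1 is visible, it gets run-2. A synthetic reply allocated immediately afterwards, still seeing only run-1, gets run-3 rather than a second run-2.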

Important

3. Budget burns at cache-read weight — the 50k default is likely unusable

The pi-adapter change (correct for display) sums input + cacheRead + cacheWrite into input_tokens, and the budget accumulates that per step. On warm turns cacheRead includes the full prompt, so every step re-counts the entire conversation — a session with ~20k of context exhausts DEFAULT_TOKEN_BUDGET = 50_000 in 2–3 steps. Display and enforcement share one number but want different things; consider counting only uncached input + output toward the budget (closer to actual cost), or a much larger default.
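
The arithmetic behind the "2–3 steps" estimate can be checked with a short calculation. The per-step figures of 500 new input tokens and 500 output tokens are assumptions chosen for illustration; only the ~20k context and the 50k default come from the review.

```typescript
// Number of steps needed before the accumulated cost reaches the budget.
function stepsToExhaust(budget: number, costPerStep: number): number {
  return Math.ceil(budget / costPerStep)
}

const DEFAULT_BUDGET = 50_000
const cachedContext = 20_000 // re-read from cache on every warm step
const newInput = 500 // assumed uncached input per step
const output = 500 // assumed output per step

// Counting cache reads: each step costs the whole conversation again.
const withCacheReads = stepsToExhaust(
  DEFAULT_BUDGET,
  cachedContext + newInput + output
)

// Counting only uncached input + output: each step costs the new work.
const withoutCacheReads = stepsToExhaust(DEFAULT_BUDGET, newInput + output)
```

Under these assumptions `withCacheReads` is 3 and `withoutCacheReads` is 50, which is the gap between the two accounting schemes that the finding describes.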

4. Two unreconciled write paths for the goal manifest key

Per-step updateGoalUsage writes via raw writeEvent (goal-api.ts:208–216), while end-of-wake refreshGoalUsage (process-wake.ts:2156) reads possibly-stale collections and writes via the staged wake-session transaction — which commits last. The never-decrease guards defend tokensUsed, but nothing reconciles the two channels for the other fields. Same round-trip-lag fragility class as #1/#2; worth at least a comment explaining the ordering, or funneling both through one path.

5. Dead API: markGoalBudgetLimited

Defined (goal-api.ts:164), exposed on HandlerContext (types.ts:1056, context-factory.ts:994), stubbed in two tests — never called. Horton uses updateGoalUsage(..., { status: 'budget_limited' }) instead. Use it or drop it.

6. /goal bypasses the existing slash-command registry

The machinery already exists: SlashCommandDefinition (composer-input.ts:69), published via slash_commands (create-handler.ts:506), and horton already registers skill commands this way (horton.ts:950). /goal isn't registered, so no autocomplete/discovery, and serializeComposerInput flags it unknown (text still arrives intact, so it works — but it's invisible to users).

7. UI re-implements the command parser

MessageInput.isAbortingGoalCommand hand-rolls the /goal prefix check + subcommand parsing (hardcoding the done alias). Root cause: goal-command.ts is only exported from the main index.ts, not the /client entry the UI imports from. The parser is pure — export it from client.ts and reuse it, otherwise runtime dispatch and UI abort behavior drift the next time an alias is added.

8. mark_goal_complete's summary is documented as recorded but discarded

goal-tools.ts — execute ignores _params entirely, markGoalComplete() only flips status, and ManifestGoalEntryValue has no summary field. The system prompt also tells the model to use a summary when blocked. Either persist it or fix both descriptions — as-is the tool description lies to the model.

9. Numeric epoch-ms timestamps widen the shared Manifest type

Every other manifest kind uses ISO strings (attachment createdAt, futureSend sentAt/failedAt); the goal entry alone uses number, forcing createdAt?: string | number on the flattened Manifest type (entity-schema.ts:964). Use ISO strings and the widening disappears.

Suggestions

  • Speculative statuses: paused/blocked exist in the schema enum, GoalStatus, and GoalBanner CSS, but no code path sets either (paused isn't even handled in statusClass). Trim to the three real states.
  • Token formatter ×4: identical hand-rolled formatTokenCount in goal-command.ts:162, horton.ts:859, GoalBanner.tsx:103, plus a divergent Intl.NumberFormat version in TokenUsage.tsx:54. Concretely: 12,500 tokens renders as 13k in GoalBanner and 12.5k in TokenUsage, in the same UI. Share one helper.
  • mark_goal_complete is registered unconditionally (horton.ts:381) while the sibling optional tools use conditional spread (docsSearchTool). Tools are rebuilt per wake and ctx.getGoal() is available there, so gating on an active goal is easy — with no goal, the model currently sees a tool that can only return "No active goal to mark complete."
  • No re-kickoff on abandonment: if the agent stops without calling mark_goal_complete (model just stops, or budget trip), the goal stalls silently — no further autonomous progress until the user re-issues a command. May be intentional for v1, but worth stating in the PR description given the "work autonomously toward it" framing.
  • Test gap: the 21 tests cover the parser/dispatcher (with mocked goal API), but createGoalApi itself has zero coverage (setGoal same-objective carryover, never-decrease, status flips), and nothing exercises the horton enforcement wiring — which is exactly where finding #1 lives.

Housekeeping

Base is already main, but the diff still contains #4502's token-usage changes plus .changeset/agent-token-usage.md (which patches @electric-ax/agents-desktop — no desktop files in this diff). Needs a rebase once #4502 lands or the changeset/code double-land.


The two to prioritize: #1 (small fix, big blast radius) and #3 (the default budget semantics make the headline feature hard to use as shipped). Everything else is mechanical. Line references are against the branch head at review time, so they may drift with new pushes.


Review iteration: 1 | 2026-06-10 | 🤖 Generated with Claude Code, requested by @msfstef

@kevin-dp

Contributor Author

Review follow-up — all findings addressed

One commit per finding, on top of the rebased branch:

| # | Finding | Commit | Resolution |
| --- | --- | --- | --- |
| 1 | Enforcement gated on goal existence, not status === 'active' | b92dafd48 | Single enforcedGoal drives prompt info, the onStepEnd hook, and the post-run usage write. New horton-goal-enforcement.test.ts covers all four states (active / budget_limited / complete / none). |
| 2 | replyText run-key collision with the bridge | 9cba9585c | New allocateRunKey consults and advances the bridge's shared id-seed cache (keyed by runs collection id) with a caller-local monotonicity floor. |
| 3 | Budget burns at cache-read weight | 56bc8e866 | Step-end hook now carries uncachedInput; the budget accumulates uncachedInput + output. Display semantics unchanged. 50k default kept — it's plausible under the new accounting. |
| 4 | Two unreconciled write paths | 8f2fe27cf | refreshGoalUsage, tokensAtCreation, and sumStepTokens removed entirely — the handler's in-memory accumulator via updateGoalUsage is the single usage writer. |
| 5 | Dead markGoalBudgetLimited | dc2e12f20 | Dropped from API, HandlerContext, wiring, and test stubs. |
| 6 | /goal bypasses the slash-command registry | 25633d2d4 | GOAL_SLASH_COMMAND defined next to the parser and registered as a static command on the horton type. |
| 7 | UI re-implements the parser | 183c64c61 | Parser exported from /client; MessageInput.isAbortingGoalCommand delegates to it. |
| 8 | mark_goal_complete summary discarded | db74c8f2c | summary persisted on the goal entry, echoed in the tool result, surfaced by /goal show. Also preserved across subsequent usage writes (89c344c50). |
| 9 | Numeric timestamps widen Manifest | 89c344c50 | ISO strings; the string \| number widening is gone. |
| | Speculative paused/blocked statuses | 0c496a7f8 | Trimmed to active / complete / budget_limited; statusClass is now exhaustive. |
| | Token formatter ×4 | ede662fdd | One Intl-compact helper in token-budget.ts (exported from index and /client), used by goal-command, horton, GoalBanner, and TokenUsage. |
| | mark_goal_complete registered unconditionally | 6c3447970 | Gated on ctx.getGoal()?.status === 'active'; gating tests assert presence/absence. |
| | createGoalApi test gap | 642777819 | 14 new unit tests: budget defaults, same-objective carryover, never-decrease, status flips, live writeEvent path, summary handling. |
| | No re-kickoff on abandonment | PR description | Documented as an intentional v1 limitation. |

Housekeeping note (base still containing #4502's commits) is inherent to the stacked-PR setup; GitHub will retarget when #4502 merges.

🤖 Generated with Claude Code

kevin-dp and others added 19 commits June 11, 2026 10:16
Lets the user set a session-scoped objective with an optional token
cap. Horton works autonomously toward the goal and stops when it calls
`mark_goal_complete` or when the run exceeds the budget. Cap is
enforced mid-run via an `onStepEnd` hook on the outbound bridge so the
abort fires within a step, not after the agent decides to stop.

- `/goal set "..." [--tokens N|unlimited]` (default 50k)
- `/goal show | clear | complete`
- One goal per session, persisted as a `kind: 'goal'` manifest entry —
  resumes across desktop restarts via the existing Electric sync.
- New `ctx.replyText` helper synthesizes a complete runs+texts
  sequence so slash-command responses and the budget-limit notice
  render as ordinary agent replies.
- `AgentHandle.run` gains an optional `abortSignal`, combined with
  the runtime's `runSignal` so a budget abort can co-exist with
  SIGINT-style user aborts.
- State-changing `/goal` commands typed mid-run also signal SIGINT so
  the prior run interrupts instead of finishing old work first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
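
One way the budget abort and the runtime's own abort could be combined into a single signal is sketched below. `combineSignals` is a hypothetical helper, not the code in this commit; newer runtimes also offer the built-in `AbortSignal.any`, which serves the same purpose.

```typescript
// Returns a signal that aborts as soon as any of the inputs aborts.
function combineSignals(...signals: AbortSignal[]): AbortSignal {
  const controller = new AbortController()
  for (const signal of signals) {
    if (signal.aborted) {
      // An input was already aborted, so the combined signal starts aborted.
      controller.abort()
      break
    }
    signal.addEventListener('abort', () => controller.abort(), { once: true })
  }
  return controller.signal
}

// Example wiring: one controller for user interrupts, one for the budget.
const userAbort = new AbortController()
const budgetAbort = new AbortController()
const runAbortSignal = combineSignals(userAbort.signal, budgetAbort.signal)
```

Aborting either controller flips `runAbortSignal`, so the run loop only needs to watch one signal.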
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The assistantHandler now reads the active goal up front and wires
budget enforcement via ctx.updateGoalUsage, so the tool-composition
test stub needs the goal-related methods present. Stubs to a no-goal
state so the captured agent config reflects the tool path only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same root cause as the previous commit — the assistantHandler now
calls ctx.getGoal at the top, so any test that exercises the handler
through a stubbed context needs those methods present.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review #1 (critical): enforcement was gated on goal *existence*, so a
budget_limited goal kept tripping on every subsequent chat turn (abort
+ stop message until /goal clear), and a complete goal kept
accumulating tokensUsed from unrelated runs — eventually flipping back
to budget_limited. Derive one `enforcedGoal` (status === 'active') and
use it for the prompt info, the onStepEnd hook, and the post-run usage
write. Adds gating regression tests for all four states.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… seed

Review #2 (critical): ctx.replyText/ctx.recordRun seeded their run-N
counter from runs.toArray, but writeEvent has no synchronous local
apply — events only land in the collection after a round-trip. A
synthetic reply written right after agent.run (e.g. the budget-stop
notice) could reuse a run-N key the bridge had just allocated. New
allocateRunKey consults and advances the bridge's shared id-seed cache
(keyed by the runs collection id) with a caller-local floor for
monotonicity when the collection lags. Test harness now uses unique
collection ids per case, matching production where every entity has
its own runs collection.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Review #3: the persisted input_tokens (correctly, for display) sums
input + cacheRead + cacheWrite, but budgeting on that re-counts the
entire conversation on every warm-cache step — a session with ~20k of
context would exhaust the 50k default budget in 2-3 steps regardless
of new work. The step-end hook now also carries `uncachedInput` (raw
`usage.input`, falling back to the flat legacy counter for providers
without cache columns), and Horton accumulates `uncachedInput +
output` toward the budget. Display semantics are unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Review #4: per-step updates went through writeEvent (live) while the
end-of-wake refreshGoalUsage recomputed from possibly-stale
collections and wrote via the staged wake-session transaction — which
commits last and could clobber the live value. Horton's in-memory
accumulator is the authority, so drop the fallback recompute entirely:
refreshGoalUsage, the tokensAtCreation baseline it depended on, and
sumStepTokens are all removed. updateGoalUsage (never-decrease,
writeEvent-direct) is now the only usage writer.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Review #5: defined, exposed on HandlerContext, stubbed in tests —
never called. Horton flips status via
updateGoalUsage(..., { status: 'budget_limited' }).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Review #6: /goal worked but was invisible — no composer autocomplete,
and serializeComposerInput flagged it unknown. Define
GOAL_SLASH_COMMAND next to the parser and register it alongside the
skill commands on the horton entity type.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Review #7: MessageInput hand-rolled the /goal prefix + subcommand
parsing (hardcoding the `done` alias) because goal-command.ts wasn't
exported from the /client entry. The parser is pure — export it and
delegate, so UI abort behavior and runtime dispatch share one grammar.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Review #8: the tool's description promised the summary was recorded
but execute() discarded it, and the system prompt told the model to
use a summary when blocked. Add an optional summary field to the goal
entry, thread it through markGoalComplete(summary?), echo it in the
tool result, and surface it from /goal show.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Review #9: the goal kind alone used epoch-ms numbers, forcing
`createdAt?: string | number` onto the flattened Manifest type while
every other manifest kind uses ISO strings. Switch to ISO strings —
the widening disappears. Also carries the completion summary through
updateGoalUsage's rebuilt entry, which previously dropped it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Review suggestion: paused/blocked existed in the enum and CSS but no
code path ever set them (and statusClass didn't even handle paused).
Three real states remain: active, complete, budget_limited — which
also makes statusClass exhaustive without a default arm.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Review suggestion: three hand-rolled formatTokenCount copies
(goal-command, horton, GoalBanner) diverged from TokenUsage's
Intl-compact version — 12,500 rendered as "13k" in the banner and
"12.5k" in the meta row of the same UI. One Intl-compact-based helper
now lives in token-budget.ts (exported from index and /client) and
all four call sites use it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Review suggestion: the tool was registered unconditionally, so with no
goal the model saw a tool whose only possible answer was "No active
goal to mark complete." Tools are rebuilt per wake, so gate on
ctx.getGoal()?.status === 'active' like the other conditional tools.
Gating tests now also assert tool presence/absence.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Review test-gap note: the parser/dispatcher had coverage but the goal
API itself had none. Covers setGoal default/unlimited budgets,
same-objective carryover vs new-objective reset, updateGoalUsage
never-decrease + status flip + no-op writes + the writeEvent live
path + summary preservation, markGoalComplete summary trimming, and
clear/get round-trips.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Rebase integration: main added observe-pg-sync-tool.test.ts, which
calls createHortonTools with a minimal ctx stub. Tool composition is
now goal-aware (mark_goal_complete only registers for an active goal),
so the stub needs getGoal present.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@kevin-dp kevin-dp force-pushed the kevin/agent-goal-tracking branch from 6427778 to 655cc46 Compare June 11, 2026 08:22
Rebase integration bug: the native composer (#4533) sends a
WireComposerInputPayload — raw text in `source` plus parsed `nodes` —
instead of `{ text }`. extractWakeText only read `text`, so /goal
messages fell through to the LLM, whose skills guidance made it reply
"I don't have a /goal skill in my catalog". Read `source` as the
fallback; regression tests cover both payload shapes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
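
A minimal sketch of the fallback described in the commit message above. Only the `text` and `source` field names come from the message; the `WakePayload` type and this function body are assumptions, not the actual extractWakeText implementation.

```typescript
// Hypothetical union of the two payload shapes: the legacy { text } form and
// the composer form that carries raw text in `source`.
type WakePayload = { text?: unknown; source?: unknown }

// Prefers `text`, then falls back to `source`, so slash commands are
// recognized regardless of which composer produced the message.
function extractWakeText(payload: unknown): string | undefined {
  if (typeof payload !== 'object' || payload === null) return undefined
  const { text, source } = payload as WakePayload
  if (typeof text === 'string') return text
  if (typeof source === 'string') return source
  return undefined
}
```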
kevin-dp and others added 2 commits June 11, 2026 11:53
A goal mutation firing mid-run (e.g. the mark_goal_complete tool) read
the local manifests collection — which lags live writeEvent writes by a
stream round-trip — and persisted through the wake-session's staged
transaction, which replays at end-of-wake. The stale snapshot landed
after, and overwrote, the fresher per-step tokensUsed written live
(observed: bar showed 3.3k after the counter had reached 5.7k).

Route every goal mutation through a single ordered channel (direct
writeEvent upserts when wired; the staged path remains only as a test
fallback) and add an in-wake read-your-writes cache so same-wake reads
always observe the latest write.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
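
The read-your-writes idea from the commit message above can be sketched with a small in-memory cache in front of a lagging collection. `createGoalStore`, `readCollection`, and this `GoalEntry` shape are hypothetical; the real goal API has more fields and a test fallback path that is not modeled here.

```typescript
// Simplified goal entry for illustration.
interface GoalEntry {
  objective: string
  status: string
  tokensUsed: number
}

// Wraps a possibly-stale collection read with an in-wake cache of the last
// written entry, so a later mutation builds on the freshest value instead of
// the lagging snapshot.
function createGoalStore(
  readCollection: () => GoalEntry | undefined,
  writeEvent: (entry: GoalEntry) => void
) {
  let lastWritten: GoalEntry | undefined
  return {
    get(): GoalEntry | undefined {
      return lastWritten ?? readCollection()
    },
    update(patch: Partial<GoalEntry>): void {
      const current = lastWritten ?? readCollection()
      if (!current) return
      lastWritten = { ...current, ...patch }
      writeEvent(lastWritten)
    },
  }
}
```

In the scenario from the commit message, the collection still reports 3.3k tokens while the live counter has reached 5.7k; a status flip made through this store keeps the 5.7k value because it starts from the cached write.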
On cache-enabled providers the newly appended prompt tokens of a warm
turn are reported as cacheWrite, with usage.input collapsing to ~0 —
so the budget's "uncached input" side was effectively zero and the cap
tracked output only. Include cacheWrite in the uncached-input figure
the onStepEnd hook reports; cache reads stay excluded.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
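
The resulting accounting can be sketched as a single function. The `input`, `cacheRead`, and `cacheWrite` names follow the commit messages; the `ProviderUsage` type, the `output` field name, and `budgetTokensForStep` are assumptions for illustration.

```typescript
// Hypothetical per-step usage as reported by a provider. Cache fields are
// optional because some providers expose only flat counters.
interface ProviderUsage {
  input: number
  output: number
  cacheRead?: number
  cacheWrite?: number
}

// Tokens charged against the goal budget for one step: newly processed
// prompt tokens (raw input plus cache writes) and output. Cache reads are
// excluded so the existing conversation is not re-counted on every step.
function budgetTokensForStep(usage: ProviderUsage): number {
  const uncachedInput = usage.input + (usage.cacheWrite ?? 0)
  return uncachedInput + usage.output
}
```

For a warm turn such as `{ input: 0, cacheRead: 20000, cacheWrite: 800, output: 300 }` this charges 1,100 tokens, whereas counting raw input alone would have charged only the 300 output tokens.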
2 participants