fix(agents-server): stop orphaned cron wakes#4559

Merged
KyleAMathews merged 4 commits into main from fix-cron-wake-leak on Jun 11, 2026

KyleAMathews commented Jun 10, 2026

Fixes orphaned cron wakes in Agents Server after a manifest-backed schedule is deleted. Users should no longer see agents repeatedly woken by a cron schedule that list_schedules correctly reports as absent.

Root Cause

Cron schedules are represented by two linked pieces of state:

  1. a manifest-backed schedule row on the agent stream, used by tools like list_schedules
  2. wake-registry + scheduler side effects, used to deliver cron ticks

For /horton/RbYyjlX3ey, the manifest schedule had been deleted, so list_schedules returned []. However, cron wake delivery kept going because stale wake/scheduler side effects could outlive the manifest deletion:

  • WakeRegistry removed cached registrations on Electric shape deletes by deriving the DB id from the shape message key. Shape keys are protocol-level identifiers and are not guaranteed to be the table primary key, so the DB row could be gone while the in-memory registration remained.
  • Cron ticks rescheduled themselves indefinitely, even when no wake_registrations rows still subscribed to that cron stream.
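
To make the first failure mode concrete, here is a minimal sketch of key-derived eviction. The pre-fix parsing code is not shown in this description, so `dbIdFromShapeKey`, the key strings, and the `/horton/example` path are hypothetical; the sketch only models the class of bug, not the actual implementation.

```typescript
// Hypothetical model of the fragile approach: treat the last key segment as the DB id.
function dbIdFromShapeKey(key: string): number {
  const lastSegment = key.split("/").pop() ?? ""
  return Number(lastSegment.replace(/"/g, ""))
}

// In-memory registrations keyed by DB id (illustrative data only).
const staleProneCache = new Map<number, string>([[42, "/horton/example"]])

// Returns true only if an entry was actually evicted.
function applyDeleteByKey(key: string): boolean {
  return staleProneCache.delete(dbIdFromShapeKey(key))
}

// A key that does not encode the primary key parses to NaN and evicts nothing,
// so the cached registration would keep receiving wakes after its row is gone.
const evicted = applyDeleteByKey('"public"."wake_registrations"/"reg-abc"')
```

In this model the delete message is processed without error, which is why the stale registration can go unnoticed.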

Approach

This PR makes the cron side effects self-healing in both places.

Wake registry delete handling

WakeRegistry.applyShapeMessage now treats old_value.id as authoritative for delete messages. The shape uses replica: full, so deletes should carry the deleted row; if the id is missing, the registry resets its cache to fail closed instead of keeping a potentially stale wake registration alive.

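// The shape uses replica: full, so a delete message should carry the deleted row in old_value.
const oldValue = (message as unknown as {
  old_value?: { id?: unknown }
}).old_value
const oldId = Number(oldValue?.id)
if (Number.isFinite(oldId)) {
  // Remove exactly the registration whose DB row was deleted.
  this.removeCachedRegistrationByDbId(oldId)
} else {
  // No usable id: fail closed by dropping the whole cache.
  this.resetCachedRegistrations()
}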
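
The fail-closed behavior can be exercised in isolation with a small stand-in cache. This is a sketch under stated assumptions: `RegistrationCache`, `ShapeDelete`, and `parseDbId` are hypothetical names, not the real `WakeRegistry` API. The id parsing here is also deliberately stricter than the snippet above, because `Number(null)` and `Number("")` both evaluate to `0`, which passes `Number.isFinite`.

```typescript
// Hypothetical, simplified types; the real WakeRegistry is not reproduced here.
type CachedRegistration = { dbId: number; sourceUrl: string }
type ShapeDelete = { key: string; old_value?: { id?: unknown } }

// Stricter than Number(x): rejects null, "", and other non-numeric input.
function parseDbId(raw: unknown): number | undefined {
  if (typeof raw === "number") return Number.isFinite(raw) ? raw : undefined
  if (typeof raw === "string" && raw.trim() !== "") {
    const parsed = Number(raw)
    return Number.isFinite(parsed) ? parsed : undefined
  }
  return undefined
}

class RegistrationCache {
  private readonly byDbId = new Map<number, CachedRegistration>()

  add(reg: CachedRegistration): void {
    this.byDbId.set(reg.dbId, reg)
  }

  has(dbId: number): boolean {
    return this.byDbId.has(dbId)
  }

  count(): number {
    return this.byDbId.size
  }

  // Evict by old_value.id, never by message.key; reset when the id is unusable.
  applyDelete(message: ShapeDelete): "removed" | "reset" {
    const oldId = parseDbId(message.old_value?.id)
    if (oldId === undefined) {
      this.byDbId.clear()
      return "reset"
    }
    this.byDbId.delete(oldId)
    return "removed"
  }
}
```

Resetting the cache costs a re-sync, but it cannot leave a registration alive for a row that no longer exists.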

Cron chain rescheduling

Before inserting the next cron tick, the scheduler checks whether any wake registration still points at that cron stream:

select 1 as exists
from wake_registrations
where tenant_id = ${tenantId}
  and source_url = ${streamPath}
limit 1

If there are no subscribers, it completes the current task and stops the chain. A future schedule registration will seed a fresh tick through getOrCreateCronStream / rehydration.
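
The reschedule decision can be sketched as a pure function over in-memory rows. The types and the `nextTickOrStop` helper are hypothetical, and a fixed `intervalMs` stands in for real cron-expression evaluation; the actual scheduler runs the SQL above against the database.

```typescript
// Hypothetical in-memory stand-ins for wake_registrations rows and a cron tick task.
type WakeRegistrationRow = { tenantId: string; sourceUrl: string }
type CronTick = { tenantId: string; streamPath: string; fireAt: number }

// Mirrors the SQL guard: does any registration still point at this cron stream?
function hasSubscribers(
  rows: readonly WakeRegistrationRow[],
  tenantId: string,
  streamPath: string
): boolean {
  return rows.some(
    (row) => row.tenantId === tenantId && row.sourceUrl === streamPath
  )
}

// Returns the next tick to insert, or null when the chain should stop.
function nextTickOrStop(
  current: CronTick,
  rows: readonly WakeRegistrationRow[],
  intervalMs: number
): CronTick | null {
  if (!hasSubscribers(rows, current.tenantId, current.streamPath)) {
    return null
  }
  return { ...current, fireAt: current.fireAt + intervalMs }
}
```

Scoping the lookup by both tenant and stream path matters: a registration in another tenant must not keep this tenant's chain alive.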

Key Invariants

  • The manifest remains the user-visible source of truth for schedule tools.
  • A cron stream should only keep producing future ticks while at least one wake registration subscribes to it.
  • Deleting a manifest-backed cron schedule must remove both durable wake registrations and any effective in-memory wake registrations.
  • Shared cron streams remain reusable: deleting the last subscriber stops the chain; adding a subscriber later recreates it.
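
The shared-stream lifecycle in the last invariant can be modeled with a small state machine. `SharedCronStream` is a hypothetical illustration, not the server's API: it assumes the chain stops at the first tick that finds no subscribers and that a later registration seeds a new tick.

```typescript
// Hypothetical model of one shared cron stream; names are illustrative only.
class SharedCronStream {
  private readonly subscribers = new Set<string>()
  private chainActive = false

  // Registering a subscriber seeds a tick if no chain is running.
  subscribe(subscriberId: string): void {
    this.subscribers.add(subscriberId)
    this.chainActive = true
  }

  unsubscribe(subscriberId: string): void {
    this.subscribers.delete(subscriberId)
  }

  // Runs one tick and returns the subscribers it wakes.
  // With no subscribers left, the chain stops instead of rescheduling.
  tick(): string[] {
    if (!this.chainActive) return []
    if (this.subscribers.size === 0) {
      this.chainActive = false
      return []
    }
    return Array.from(this.subscribers)
  }

  isChainActive(): boolean {
    return this.chainActive
  }
}
```

In this model the chain outlives the last unsubscribe by at most one tick, and that tick wakes nobody.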

Non-goals

  • This does not change the public schedule tool API.
  • This does not delete historical cron stream events.
  • This does not introduce per-schedule cron streams; cron streams remain shared by expression/timezone.
  • This does not attempt to clean up already-delivered wake events from agent timelines.

Trade-offs

An alternative would be to always reschedule cron ticks forever for every cron stream ever created. That is simpler, but it allows orphaned cron streams to keep waking stale in-memory subscribers and creates unnecessary scheduler churn. Checking for subscribers before rescheduling keeps cron streams lazy and aligns their lifecycle with active wake registrations.

Another option would be to rely only on wake-registry cache invalidation. The scheduler guard is still useful as a second line of defense and prevents orphaned cron task chains even if a process restarts or a wake cache bug reappears.

Verification

Commands run:

pnpm install
pnpm --filter @electric-ax/agents-runtime build
pnpm --filter @electric-ax/agents-mcp build
pnpm --filter @electric-sql/client build
pnpm --filter @electric-ax/agents-server typecheck
cd packages/agents-server && pnpm test test/scheduler.test.ts
cd packages/agents-server && pnpm test test/wake-registry.test.ts -t "removes cached registrations"
GITHUB_BASE_REF=main node scripts/check-changeset.mjs
git diff --check

Notes:

  • pnpm install completed dependency linking but reported warnings, including examples/.shared prepare failing due to SST.
  • The focused wake-registry unit test passes. Running all of wake-registry.test.ts locally reached the integration section and timed out with a "webhook delivery fetch failed" warning, so the new unit path was verified with -t.
  • check-changeset passes.

Files changed

  • packages/agents-server/src/wake-registry.ts — remove cached registrations using old_value.id from shape delete messages; reset the cache if a delete id is unavailable.
  • packages/agents-server/src/scheduler.ts — stop rescheduling cron ticks when no registrations subscribe to the cron stream; clarify cron payload stream-path extraction and subscriber lookup.
  • packages/agents-server/test/scheduler.test.ts — cover subscriber-present and no-subscriber cron reschedule paths.
  • packages/agents-server/test/wake-registry.test.ts — cover non-id shape keys with delete ids coming from old_value.id.
  • .changeset/fix-orphaned-cron-wakes.md — patch changeset for @electric-ax/agents-server.

codecov Bot commented Jun 10, 2026

Codecov Report

❌ Patch coverage is 86.66667% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.55%. Comparing base (671a38f) to head (7b300ea).
⚠️ Report is 1 commit behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines                   Patch %   Lines
packages/agents-server/src/scheduler.ts    77.77%    2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #4559       +/-   ##
===========================================
- Coverage   74.35%   55.55%   -18.80%     
===========================================
  Files          52      318      +266     
  Lines        6897    36894    +29997     
  Branches     2190    10569     +8379     
===========================================
+ Hits         5128    20497    +15369     
- Misses       1753    16364    +14611     
- Partials       16       33       +17     
Flag                          Coverage Δ
packages/agents               70.53% <ø> (?)
packages/agents-mobile        71.42% <ø> (?)
packages/agents-runtime       81.75% <ø> (?)
packages/agents-server        74.12% <86.66%> (-0.23%) ⬇️
packages/agents-server-ui     5.65% <ø> (?)
packages/electric-ax          46.42% <ø> (?)
typescript                    55.55% <86.66%> (-18.80%) ⬇️
unit-tests                    55.55% <86.66%> (-18.80%) ⬇️

KyleAMathews merged commit 618810c into main Jun 11, 2026
33 checks passed
KyleAMathews deleted the fix-cron-wake-leak branch June 11, 2026 13:32