Skip to content

feat(sandbox): handle destroyed sandbox, add pause/resume, keep-alive#4437

Open
NathanFlurry wants to merge 3 commits intomainfrom
sandbox-actor-resilience
Open

feat(sandbox): handle destroyed sandbox, add pause/resume, keep-alive#4437
NathanFlurry wants to merge 3 commits intomainfrom
sandbox-actor-resilience

Conversation

@NathanFlurry
Copy link
Member

Description

Upgrades sandbox-agent to 0.4.1-rc.1 and adds resilience features to the sandbox actor:

  • Handle SandboxDestroyedError: When the provider reports the sandbox has expired (e.g. E2B timeout), ensureAgent() now catches SandboxDestroyedError and either marks the sandbox as destroyed (onSandboxExpired: "destroy") or transparently provisions a new one (onSandboxExpired: "recreate"). Broadcasts events to connected clients in both cases.
  • Pause/resume actions: New pause and resume actor actions delegate to the provider's pause() and reconnect() hooks for user-initiated sandbox lifecycle control.
  • Keep-alive while sessions active: Configurable keepAliveIntervalMs option periodically calls provider.reconnect() while sessions are subscribed, preventing sandbox timeout during active use.
  • New SDK methods: Proxies pauseSandbox and killSandbox from the updated sandbox-agent SDK.
  • Config additions: onSandboxExpired callback, onSandboxExpired option ("destroy"/"recreate"), keepAliveIntervalMs option.

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

  • Type checking passes with no new errors in sandbox source files
  • Full build:publish build succeeds

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • My changes generate no new warnings

@railway-app
Copy link

railway-app bot commented Mar 16, 2026

🚅 Deployed to the rivet-pr-4437 environment in rivet-frontend

Service Status Web Updated (UTC)
frontend-inspector 😴 Sleeping (View Logs) Web Mar 19, 2026 at 2:01 pm
frontend-cloud 😴 Sleeping (View Logs) Web Mar 17, 2026 at 12:37 am
ladle ❌ Build Failed (View Logs) Web Mar 17, 2026 at 12:27 am
website ❌ Build Failed (View Logs) Web Mar 16, 2026 at 11:58 pm
mcp-hub ✅ Success (View Logs) Web Mar 16, 2026 at 11:56 pm

@claude
Copy link

claude bot commented Mar 16, 2026

PR Review: feat(sandbox): handle destroyed sandbox, add pause/resume, keep-alive

Good overall implementation. The resilience features are well-scoped and the code is readable. A few issues to address:

Bug: onSandboxExpired callback not awaited
File: src/sandbox/actor/index.ts
The callback is typed as void|Promise but the result is never awaited. If the user provides an async callback, it runs fire-and-forget and errors are silently discarded. Fix: await config.onSandboxExpired(c, error).

Bug: destroySandbox proxy does not update actor state
File: src/sandbox/actor/index.ts, buildProxyActions
killSandbox correctly sets c.state.sandboxDestroyed = true and clears sandboxId, but destroySandbox has no equivalent post-action handling. After calling destroySandbox through the proxy, the actor state still considers the sandbox live, and subsequent calls to ensureAgent will attempt to reconnect to a destroyed sandbox.

Bug: Potential state inconsistency in pause action
File: src/sandbox/actor/index.ts, pause action
clearAllActiveSessions(c) is called before provider.pause(). If the provider throws, sessions have already been cleared from actor state but the sandbox is still running. Consider clearing sessions only after the pause completes successfully.

Minor: Captured sandboxId in syncKeepAlive is inconsistently used
File: src/sandbox/actor/index.ts, syncKeepAlive
The closure captures sandboxId but uses c.state.sandboxId for the actual reconnect call. Only the error log uses the captured value. This is either a bug or the capture is unnecessary. Pick one and be consistent.

Minor: Double reconnect in resume
File: src/sandbox/actor/index.ts, resume action
resume calls provider.reconnect() and then ensureAgent(). If ensureAgent or SandboxAgent.start also calls reconnect internally for existing sandboxes, this results in two reconnect calls. Verify this is intentional and idempotent, or let ensureAgent handle the reconnect entirely.

Test coverage gaps
New behaviors with no coverage: pause and resume actions (new user-facing actions with no test), keepAliveIntervalMs (the interval callback path is untested), onSandboxExpired recreate option (the recreate branch in ensureAgent has no test). The E2B tests are a good addition but require an API key and only cover the happy path. The new resilience features deserve unit tests with a mock provider so they can run unconditionally.

@pkg-pr-new
Copy link

pkg-pr-new bot commented Mar 16, 2026

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@4437

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@4437

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@4437

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@4437

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@4437

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@4437

@rivetkit/sqlite-vfs

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sqlite-vfs@4437

@rivetkit/traces

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/traces@4437

@rivetkit/workflow-engine

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/workflow-engine@4437

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@4437

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@4437

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@4437

commit: f1b2b04

@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4437 March 17, 2026 00:17 Destroyed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant