feat: workflow replay by NathanFlurry · Pull Request #4411 · rivet-dev/rivet

NathanFlurry · 2026-03-12T19:58:11Z

Description

Add workflow rerun controls to RivetKit workflows through the inspector by introducing a v4 workflow rerun message, HTTP endpoint, and workflow-engine reset helper. Update the standalone Inspector UI with a current-step rerun button, previous-step right-click rerun, and helper text, and make the HTTP inspector route usable with actor inspector tokens so the standalone Inspector can trigger reruns without engine credentials. Also preserve workflow metadata in storage and document the new inspector API.

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

pnpm --dir rivetkit-typescript/packages/workflow-engine exec vitest run tests/rerun.test.ts
pnpm --dir rivetkit-typescript/packages/rivetkit test driver-memory -t "POST /inspector/workflow/rerun reruns a workflow from the beginning|inspector endpoints require auth in non-dev mode|failed workflow steps sleep instead of surfacing as run errors"
Verified in the standalone Inspector frontend against a local serve-test-suite server, including the current-step rerun button and right-click rerun from a previous step.

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

railway-app · 2026-03-12T19:58:21Z

🚅 Deployed to the rivet-pr-4411 environment in rivet-frontend

Service	Status	Web	Updated (UTC)
frontend-cloud	😴 Sleeping (View Logs)	Web	Mar 23, 2026 at 5:37 am
frontend-inspector	😴 Sleeping (View Logs)	Web	Mar 22, 2026 at 8:50 pm
website	😴 Sleeping (View Logs)	Web	Mar 21, 2026 at 12:33 am
mcp-hub	✅ Success (View Logs)	Web	Mar 21, 2026 at 12:22 am
ladle	❌ Build Failed (View Logs)	Web	Mar 21, 2026 at 12:21 am

NathanFlurry · 2026-03-13T04:03:57Z

Follow-up Inspector UI verification after the replay rename:

Hidden running-step case now disables "Replay from this step" and shows the tooltip "Step currently in progress".
Failed-step case still leaves "Replay from this step" enabled, so operators can bypass the pending retry immediately.
Verified in the standalone Inspector frontend against the local serve-test-suite, not the engine UI.

Screenshots captured in the workspace:

/Users/nathan/conductor/workspaces/rivet/kathmandu/.context/disabled-replay-tooltip.png
/Users/nathan/conductor/workspaces/rivet/kathmandu/.context/enabled-replay-button.png

NathanFlurry · 2026-03-13T04:13:28Z

Standalone Inspector screenshots for the final replay flows:

Failed-step case still leaves "Replay from this step" enabled, so operators can bypass the pending retry immediately.
Replay is blocked when any workflow step is currently running, with the tooltip "Step currently in progress".

pkg-pr-new · 2026-03-13T05:35:35Z

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@4411

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@4411

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@4411

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@4411

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@4411

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@4411

@rivetkit/sqlite-vfs

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sqlite-vfs@4411

@rivetkit/traces

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/traces@4411

@rivetkit/workflow-engine

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/workflow-engine@4411

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@4411

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@4411

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@4411

commit: 481dd2d

claude · 2026-03-13T05:43:02Z

PR 4411 Review: feat workflow replay

--- SUMMARY ---

This PR adds workflow replay/rerun functionality through the inspector, allowing developers to re-execute a workflow from a specific step or from the beginning without redeploying. The implementation touches the workflow engine, RivetKit inspector, the actor HTTP router, and the standalone Inspector frontend.

SUMMARY OF CHANGES

New replayWorkflowFromStep in workflow-engine/src/index.ts that deletes entries from a target step onward and resets workflow state to sleeping.
New ActorWorkflowControlDriver in rivetkit/src/workflow/driver.ts that provides KV access to the workflow engine for replay operations without a RunContext.
Inspector protocol bumped to v4 with a new WorkflowReplayRequest / WorkflowReplayResponse message pair.
New POST /inspector/workflow/replay HTTP endpoint in actor/router.ts.
Updated auth middleware in router.ts to also accept actor-specific inspector tokens.
Frontend additions: replay button in node detail panel, right-click-from-previous-step replay, polling loop to sync UI after replay.
Entry metadata is now eagerly loaded in loadStorage so status is available after actor wakes.
Tests covering full replay, in-flight rejection, and loop boundary rewinding.
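
For illustration, a minimal sketch of how a client might build a request for the new endpoint. The path POST /inspector/workflow/replay and the optional entryId field come from this PR. The base URL, the bearer-token header format, and the JSON body shape are assumptions, and buildReplayRequest is a hypothetical helper, not part of RivetKit.

```typescript
// Hypothetical helper: builds (but does not send) a replay request.
interface ReplayRequestInit {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildReplayRequest(
  actorBaseUrl: string, // assumption: base URL where the actor router is reachable
  inspectorToken: string, // actor-specific inspector token
  entryId?: string, // assumption: omitted means "replay from the beginning"
): ReplayRequestInit {
  const body = entryId === undefined ? {} : { entryId };
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${actorBaseUrl.replace(/\/+$/, "")}/inspector/workflow/replay`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Assumption: the inspector token is sent as a bearer token.
      Authorization: `Bearer ${inspectorToken}`,
    },
    body: JSON.stringify(body),
  };
}
```

The returned object can be passed to fetch as `fetch(req.url, req)`.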

--- ISSUES AND OBSERVATIONS ---

ISSUE 1. Race condition in restartRunHandler - potential double-run

rivetkit-typescript/packages/rivetkit/src/actor/instance/mod.ts (new method restartRunHandler):

The check actor.isRunHandlerActive() in workflow/mod.ts and the actual call to restartRunHandler are not atomic. Between those two calls, another concurrent replay or an internal wake could set the run-handler-active flag again, causing restartRunHandler to wait for that run to finish before checking again and returning early. The engine-level guard (the metadata.status check in replayWorkflowFromStep) helps, but the in-memory isRunHandlerActive() flag is checked before the KV mutation completes, leaving a narrow window.
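
One possible way to close the window is to run the check and the restart inside a single critical section per actor. The sketch below is a generic keyed mutex built on a promise chain. KeyedMutex and its use here are hypothetical and not taken from the PR.

```typescript
// Hypothetical per-key lock: tasks for the same key run strictly one at a time.
class KeyedMutex {
  private tails = new Map<string, Promise<void>>();

  async runExclusive<T>(key: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    let release: () => void = () => {};
    const current = new Promise<void>((resolve) => {
      release = resolve;
    });
    // The new tail settles only after every earlier task and this one finish.
    const tail = previous.then(() => current);
    this.tails.set(key, tail);
    await previous;
    try {
      return await task();
    } finally {
      release();
      // Drop the entry when no later task has queued behind this one.
      if (this.tails.get(key) === tail) this.tails.delete(key);
    }
  }
}
```

With something like this, the isRunHandlerActive() check, the KV mutation, and the restartRunHandler call could all sit inside one runExclusive(actorId, ...) callback, so a second replay request would only observe the state left behind by the first.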

ISSUE 2. Error from WebSocket WorkflowReplayRequest is not caught (FIX BEFORE MERGE)

rivetkit-typescript/packages/rivetkit/src/inspector/handler.ts (line ~179-193):

Compare to DatabaseSchemaRequest (line ~194), which wraps the await in a try/catch and sends an Error response on failure. WorkflowReplayRequest does not have that try/catch. If replayWorkflowFromStep throws (e.g., step not found, workflow in flight), the exception will propagate to the outer message handler and could silently drop the message or crash the WebSocket handler without informing the client. The in-flight rejection test in the driver test suite only covers the HTTP path, not the WebSocket path.
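
A rough sketch of the missing error handling, mirroring the try/catch pattern described for DatabaseSchemaRequest. The message shapes, the send callback, and the replay callback are simplified stand-ins; the real inspector protocol types and handler signature may differ.

```typescript
// Hypothetical message shapes; the real inspector protocol types may differ.
type InspectorResponse =
  | { tag: "WorkflowReplayResponse"; history: unknown }
  | { tag: "Error"; message: string };

async function handleWorkflowReplayRequest(
  replay: (entryId?: string) => Promise<unknown>, // stand-in for the engine call
  send: (response: InspectorResponse) => void, // stand-in for the WebSocket send
  entryId?: string,
): Promise<void> {
  try {
    const history = await replay(entryId);
    send({ tag: "WorkflowReplayResponse", history });
  } catch (error) {
    // Report the failure to the client instead of letting it escape the handler.
    const message = error instanceof Error ? error.message : String(error);
    send({ tag: "Error", message });
  }
}
```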

ISSUE 3. loadStorage now eagerly loads all entry metadata - potential KV scan cost on hot path

rivetkit-typescript/packages/workflow-engine/src/storage.ts (line ~158-166):

Previously loadMetadata lazily fetched individual metadata entries. Now every loadStorage call (including the normal live workflow execution path) does a full prefix scan over all entry metadata. For workflows with many steps, this can be a significant extra read on the hot path. The replay function needs this data, but the observers (inspector) are the minority case. Consider whether this should remain lazy and be loaded on-demand for replay only, or document the performance trade-off explicitly.
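
To make the lazy alternative concrete, here is a small sketch of an on-demand metadata cache with an explicit opt-in for loading everything. LazyMetadataCache and the fetchOne callback are hypothetical and do not reflect the actual storage.ts structure.

```typescript
// Hypothetical lazy cache: metadata is fetched per entry on first access only.
class LazyMetadataCache<M> {
  private cache = new Map<string, Promise<M | undefined>>();
  private fetchOne: (entryId: string) => Promise<M | undefined>;

  constructor(fetchOne: (entryId: string) => Promise<M | undefined>) {
    this.fetchOne = fetchOne;
  }

  // Returns cached metadata, issuing at most one read per entry.
  get(entryId: string): Promise<M | undefined> {
    let pending = this.cache.get(entryId);
    if (pending === undefined) {
      pending = this.fetchOne(entryId);
      this.cache.set(entryId, pending);
    }
    return pending;
  }

  // Replay and inspector paths can opt into loading everything up front.
  async loadAll(entryIds: Iterable<string>): Promise<void> {
    await Promise.all(Array.from(entryIds).map((id) => this.get(id)));
  }
}
```

Under this shape the normal execution path would only pay for the entries it touches, while replay would call loadAll before inspecting step status. Note that this sketch also caches a rejected read, which real code would likely want to evict.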

ISSUE 4. HTTP replay endpoint does not validate entryId type at runtime

rivetkit-typescript/packages/rivetkit/src/actor/router.ts (line ~304):

There is no runtime validation that body.entryId is actually a string when present. A malformed body with entryId: 12345 would pass through to the engine. Given that replay is a destructive operation (deletes KV entries), adding a runtime check is warranted even if other inspector endpoints follow the same loose pattern.
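
A minimal sketch of such a runtime check. parseReplayBody is a hypothetical helper, and treating a missing entryId as "replay from the beginning" is an assumption based on the test names in this PR.

```typescript
// Hypothetical parser for the replay request body.
type ReplayBody = { entryId?: string };

function parseReplayBody(raw: unknown): ReplayBody {
  if (raw === null || typeof raw !== "object" || Array.isArray(raw)) {
    throw new TypeError("replay body must be a JSON object");
  }
  const entryId = (raw as Record<string, unknown>).entryId;
  if (entryId === undefined) {
    return {}; // assumption: no entryId means replay from the beginning
  }
  if (typeof entryId !== "string" || entryId.length === 0) {
    throw new TypeError("entryId must be a non-empty string when present");
  }
  return { entryId };
}
```

The router could then map the TypeError to a 400 response before anything reaches the engine.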

ISSUE 5. Test for in-flight rejection expects internal_error but implementation throws a plain Error

rivetkit-typescript/packages/rivetkit/src/driver-test-suite/tests/actor-inspector.ts (line ~1465):

replayWorkflowFromStep throws new Error with message Cannot replay a workflow while a step is currently running. Whether this becomes { code: internal_error } depends on the actor router's global error handler. The workflow-specific errors should ideally be surfaced as a distinct error code (e.g., workflow_in_flight) so the frontend can display a user-friendly message.
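
As a sketch of what a distinct error code could look like: WorkflowReplayError and toErrorPayload are hypothetical names, workflow_entry_not_found is an invented second code for illustration, and the payload shape is an assumption about what the router's error handler emits.

```typescript
// Hypothetical codes; only workflow_in_flight is suggested in this review.
type WorkflowReplayErrorCode = "workflow_in_flight" | "workflow_entry_not_found";

class WorkflowReplayError extends Error {
  readonly code: WorkflowReplayErrorCode;

  constructor(code: WorkflowReplayErrorCode, message: string) {
    super(message);
    // Keeps instanceof working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "WorkflowReplayError";
    this.code = code;
  }
}

// Maps known replay errors to their code and hides everything else.
function toErrorPayload(error: unknown): { code: string; message: string } {
  if (error instanceof WorkflowReplayError) {
    return { code: error.code, message: error.message };
  }
  return { code: "internal_error", message: "Internal error" };
}
```

The frontend could then branch on code === "workflow_in_flight" to show a friendly message instead of a generic failure.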

ISSUE 6. syncWorkflowHistoryAfterReplay polling does not cancel on unmount

frontend/src/components/actors/workflow/actor-workflow-tab.tsx (line ~158-198):

The polling loop is fire-and-forget via void syncWorkflowHistoryAfterReplay. If the user navigates away from the workflow tab before the loop finishes, the loop continues fetching and setting query data for an unmounted component. A useEffect cleanup or an AbortController passed into the loop would be cleaner.
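
A sketch of an abortable polling loop along these lines. pollUntil, its parameters, and its return values are hypothetical; the real syncWorkflowHistoryAfterReplay logic is not reproduced here.

```typescript
// Hypothetical polling loop that stops as soon as the signal is aborted.
async function pollUntil(
  fetchOnce: () => Promise<boolean>, // resolves true once the UI is in sync
  signal: AbortSignal,
  intervalMs = 250,
  maxAttempts = 20,
): Promise<"synced" | "aborted" | "timeout"> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (signal.aborted) return "aborted";
    const synced = await fetchOnce();
    // Re-check after the await so stale results are never applied.
    if (signal.aborted) return "aborted";
    if (synced) return "synced";
    // Sleep for the interval, waking early if the signal aborts.
    await new Promise<void>((resolve) => {
      const timer = setTimeout(done, intervalMs);
      function done(): void {
        clearTimeout(timer);
        signal.removeEventListener("abort", done);
        resolve();
      }
      signal.addEventListener("abort", done, { once: true });
    });
  }
  return signal.aborted ? "aborted" : "timeout";
}
```

In the component, an AbortController could be created inside a useEffect and aborted in its cleanup function, so navigating away ends the loop before it writes query data again.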

ISSUE 7. getInspectorProtocolVersion nesting is fragile

frontend/src/components/actors/actor-inspector-context.tsx (version check block):

The v4 check is nested inside the v3 check because MIN_RIVETKIT_VERSION_WORKFLOW_REPLAY is newer than MIN_RIVETKIT_VERSION_DATABASE. The nesting is correct but fragile. A flat if-else-if chain ordered from newest to oldest would make the version hierarchy explicit and prevent a future reviewer from accidentally extracting the v4 check to the top level.
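
A sketch of the flat chain, ordered newest to oldest. The two constant names come from the PR, but their values here are placeholders, the isAtLeast helper is hypothetical, and the fallback protocol version of 2 is an assumption.

```typescript
// Placeholder values: the real minimum versions live in the frontend source.
const MIN_RIVETKIT_VERSION_WORKFLOW_REPLAY = "2.2.0";
const MIN_RIVETKIT_VERSION_DATABASE = "2.1.0";

// Compares dotted numeric versions (no pre-release handling in this sketch).
function isAtLeast(version: string, minimum: string): boolean {
  const a = version.split(".").map(Number);
  const b = minimum.split(".").map(Number);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const diff = (a[i] ?? 0) - (b[i] ?? 0);
    if (diff !== 0) return diff > 0;
  }
  return true; // all components equal
}

function getInspectorProtocolVersion(rivetkitVersion: string): number {
  if (isAtLeast(rivetkitVersion, MIN_RIVETKIT_VERSION_WORKFLOW_REPLAY)) {
    return 4; // workflow replay
  } else if (isAtLeast(rivetkitVersion, MIN_RIVETKIT_VERSION_DATABASE)) {
    return 3; // database
  } else {
    return 2; // assumption: oldest protocol version still supported
  }
}
```

Adding a future v5 would then be a single new branch at the top of the chain.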

ISSUE 8. Duplicate query data updates between handleReplay and WebSocket response handler

For the embedded WebSocket inspector path, queryClient.setQueryData is called once when the mutation resolves in handleReplay and again when the WorkflowReplayResponse arrives on the WebSocket. This is harmless but results in a redundant double-render with the same data.

--- POSITIVE HIGHLIGHTS ---

The per-actor WeakMap approach correctly handles multiple actor instances sharing the same workflow() definition without leaking state between them.
The findReplayBoundaryEntry logic for loop rewinding is well-reasoned: finding the deepest enclosing loop ensures full loop re-execution rather than partial state.
The versioned protocol upgrade follows the existing pattern cleanly. v3 clients that receive a WorkflowReplayResponse get a graceful WORKFLOW_HISTORY_DROPPED_ERROR error message.
Tests cover the key scenarios: full replay, mid-workflow replay, loop boundary replay, and the sleeping-loop edge case.
The dual-token auth extension (global RIVET_INSPECTOR_TOKEN OR actor-specific inspector token) uses timingSafeEqual for both paths, which is correct.
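
For readers unfamiliar with the dual-token pattern, a sketch using Node's crypto.timingSafeEqual. Hashing both inputs first (so the buffers always have equal length) is a choice made for this example and may not match how router.ts does it. tokensMatch and isInspectorAuthorized are hypothetical names.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hash both sides first so the buffers always have equal length,
// which timingSafeEqual requires.
function tokensMatch(provided: string, expected: string): boolean {
  const a = createHash("sha256").update(provided).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}

function isInspectorAuthorized(
  provided: string,
  globalToken: string | undefined, // e.g. the value of RIVET_INSPECTOR_TOKEN
  actorToken: string | undefined, // actor-specific inspector token
): boolean {
  // Unset or empty tokens never authorize anything.
  const globalOk = !!globalToken && tokensMatch(provided, globalToken);
  const actorOk = !!actorToken && tokensMatch(provided, actorToken);
  // Both comparisons run before combining, so neither short-circuits the other.
  return globalOk || actorOk;
}
```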

--- VERDICT ---

Most important before merge: Issue 2 (unhandled WebSocket replay errors). Issues 3 (eager metadata load on hot path) and 5 (error code specificity) are also worth addressing. The rest are polish items.

jog1t

react part looks good!

jog1t · 2026-03-13T15:32:30Z

frontend/src/components/actors/workflow/workflow-visualizer.tsx

+							<div className="mt-4 flex justify-end">
+								<MaybeTooltip
+									content={replayState.tooltip}
+									disabled={!replayState.tooltip}


I think we can also display a message whenever the user needs to update the actor to the latest rivetkit so they can replay steps.

railway-app bot had a problem deploying to rivet-frontend / rivet-pr-4411 March 12, 2026 19:58 Failure

NathanFlurry requested review from jog1t and removed request for jog1t March 12, 2026 20:01

railway-app bot had a problem deploying to rivet-frontend / rivet-pr-4411 March 13, 2026 04:11 Failure

railway-app bot had a problem deploying to rivet-frontend / rivet-pr-4411 March 13, 2026 04:37 Failure

railway-app bot had a problem deploying to rivet-frontend / rivet-pr-4411 March 13, 2026 05:24 Failure

railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4411 March 13, 2026 05:32 Destroyed

jog1t approved these changes Mar 13, 2026

NathanFlurry changed the title ~~Add workflow rerun controls to RivetKit Inspector~~ feat: workflow replay Mar 20, 2026

NathanFlurry added 5 commits March 20, 2026 17:04

feat(rivetkit): rerun workflows from inspector

55bde00

docs(rivetkit): document workflow rerun inspector api

eeb06c4

feat(rivetkit): add workflow replay controls

ca37d5a

fix(rivetkit): reject replay while workflow is in flight

4fa4430

fix(workflow-engine): replay steps inside loops

481dd2d

NathanFlurry force-pushed the workflow-step-resume branch from 3dc79d9 to 481dd2d Compare March 21, 2026 00:21

railway-app bot had a problem deploying to rivet-frontend / rivet-pr-4411 March 21, 2026 00:21 Failure

NathanFlurry wants to merge 5 commits into main from workflow-step-resume
