🚅 Deployed to the rivet-pr-4411 environment in rivet-frontend
Follow-up Inspector UI verification after the replay rename:
Screenshots captured in the workspace:
PR 4411 Review: feat workflow replay

--- SUMMARY ---

This PR adds workflow replay/rerun functionality through the inspector, allowing developers to re-execute a workflow from a specific step or from the beginning without redeploying. The implementation touches the workflow engine, RivetKit inspector, the actor HTTP router, and the standalone Inspector frontend.

SUMMARY OF CHANGES

--- ISSUES AND OBSERVATIONS ---

ISSUE 1. Race condition in restartRunHandler - potential double-run
rivetkit-typescript/packages/rivetkit/src/actor/instance/mod.ts (new method restartRunHandler):
The check actor.isRunHandlerActive() in workflow/mod.ts and the actual call to restartRunHandler are not atomic. Between those two calls, another concurrent replay or an internal wake could set the run-handler-active flag again, causing restartRunHandler to wait for that run to finish before re-checking and returning early. The guard at the engine level (metadata.status check in replayWorkflowFromStep) helps, but the in-memory isRunHandlerActive() flag is checked before the KV mutation completes, leaving a narrow window.

ISSUE 2. Error from WebSocket WorkflowReplayRequest is not caught (FIX BEFORE MERGE)
rivetkit-typescript/packages/rivetkit/src/inspector/handler.ts (line ~179-193):
Compare to DatabaseSchemaRequest (line ~194), which wraps the await in a try/catch and sends an Error response on failure. WorkflowReplayRequest does not have that try/catch. If replayWorkflowFromStep throws (e.g., step not found, workflow in flight), the exception will propagate to the outer message handler and could silently drop the message or crash the WebSocket handler without informing the client. The in-flight rejection test in the driver test suite only covers the HTTP path, not the WebSocket path.

ISSUE 3. loadStorage now eagerly loads all entry metadata - potential KV scan cost on hot path
rivetkit-typescript/packages/workflow-engine/src/storage.ts (line ~158-166):
Previously loadMetadata lazily fetched individual metadata entries. Now every loadStorage call (including the normal live workflow execution path) does a full prefix scan over all entry metadata. For workflows with many steps, this can be a significant extra read on the hot path. The replay function needs this data, but the observers (inspector) are the minority case.
Consider whether this should remain lazy and be loaded on-demand for replay only, or document the performance trade-off explicitly.

ISSUE 4. HTTP replay endpoint does not validate entryId type at runtime
rivetkit-typescript/packages/rivetkit/src/actor/router.ts (line ~304):
There is no runtime validation that body.entryId is actually a string when present. A malformed body with entryId: 12345 would pass through to the engine. Given that replay is a destructive operation (deletes KV entries), adding a runtime check is warranted even if other inspector endpoints follow the same loose pattern.

ISSUE 5. Test for in-flight rejection expects internal_error but implementation throws a plain Error
rivetkit-typescript/packages/rivetkit/src/driver-test-suite/tests/actor-inspector.ts (line ~1465):
replayWorkflowFromStep throws new Error with message Cannot replay a workflow while a step is currently running. Whether this becomes { code: internal_error } depends on the actor router's global error handler. The workflow-specific errors should ideally be surfaced as a distinct error code (e.g., workflow_in_flight) so the frontend can display a user-friendly message.

ISSUE 6. syncWorkflowHistoryAfterReplay polling does not cancel on unmount
frontend/src/components/actors/workflow/actor-workflow-tab.tsx (line ~158-198):
The polling loop is fire-and-forget via void syncWorkflowHistoryAfterReplay. If the user navigates away from the workflow tab before the loop finishes, the loop continues fetching and setting query data for an unmounted component. A useEffect cleanup or an AbortController passed into the loop would be cleaner.

ISSUE 7. getInspectorProtocolVersion nesting is fragile
frontend/src/components/actors/actor-inspector-context.tsx (version check block):
The v4 check is nested inside the v3 check because MIN_RIVETKIT_VERSION_WORKFLOW_REPLAY is newer than MIN_RIVETKIT_VERSION_DATABASE. The nesting is correct but fragile.
A flat if-else-if chain ordered from newest to oldest would make the version hierarchy explicit and prevent a future reviewer from accidentally extracting the v4 check to the top level.

ISSUE 8. Duplicate query data updates between handleReplay and WebSocket response handler
For the embedded WebSocket inspector path, queryClient.setQueryData is called once when the mutation resolves in handleReplay and again when the WorkflowReplayResponse arrives on the WebSocket. This is harmless but results in a redundant double-render with the same data.

--- POSITIVE HIGHLIGHTS ---

--- VERDICT ---

Most important before merge: Issue 2 (unhandled WebSocket replay errors). Issues 3 (eager metadata load on hot path) and 5 (error code specificity) are also worth addressing. The rest are polish items.
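
To make Issues 2 and 5 concrete, here is a minimal sketch of what a guarded replay handler with distinct error codes could look like. It is not taken from the RivetKit codebase: `WorkflowReplayError`, `toErrorResponse`, `handleReplayRequest`, the response shapes, and the `workflow_step_not_found` code are all hypothetical names chosen for illustration. Only `workflow_in_flight` and `internal_error` come from the review above.

```typescript
// Hypothetical sketch. None of these names are confirmed RivetKit APIs.
type ReplayErrorCode = "workflow_in_flight" | "workflow_step_not_found";

type ErrorResponse = { tag: "Error"; code: string; message: string };
type ReplayResponse = { tag: "WorkflowReplayResponse"; history: unknown };

// A typed error lets the transport layer surface a stable code
// instead of collapsing everything into internal_error.
class WorkflowReplayError extends Error {
	readonly code: ReplayErrorCode;

	constructor(code: ReplayErrorCode, message: string) {
		super(message);
		this.name = "WorkflowReplayError";
		this.code = code;
	}
}

// Maps any thrown value to an Error response the client can render.
function toErrorResponse(err: unknown): ErrorResponse {
	if (err instanceof WorkflowReplayError) {
		return { tag: "Error", code: err.code, message: err.message };
	}
	const message = err instanceof Error ? err.message : String(err);
	return { tag: "Error", code: "internal_error", message };
}

// Wraps the replay call in try/catch so a failure always produces a
// response rather than escaping into the outer message handler.
async function handleReplayRequest(
	replay: (entryId?: string) => Promise<unknown>,
	send: (msg: ReplayResponse | ErrorResponse) => void,
	entryId?: string,
): Promise<void> {
	try {
		const history = await replay(entryId);
		send({ tag: "WorkflowReplayResponse", history });
	} catch (err) {
		send(toErrorResponse(err));
	}
}
```

With this shape, the HTTP and WebSocket paths could share `toErrorResponse`, so a test for in-flight rejection would assert on `workflow_in_flight` in both transports instead of depending on the router's global error handler.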
<div className="mt-4 flex justify-end">
	<MaybeTooltip
		content={replayState.tooltip}
		disabled={!replayState.tooltip}
I think we can also show when the user needs to update the actor to the latest rivetkit so they can replay steps.
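
One possible way to implement this suggestion is to derive the tooltip from the actor's RivetKit version before any other check. The sketch below is an assumption about how that could look, not the PR's code: `getReplayState`, `isAtLeast`, `parseVersion`, the `canReplay` field, and the tooltip wording are hypothetical, and the minimum version is passed in as a parameter because the real threshold is not stated here.

```typescript
// Hypothetical sketch. Names and messages are illustrative only.
type ReplayState = { canReplay: boolean; tooltip?: string };

// Simplified parsing: drops any pre-release suffix ("2.1.0-rc.1" -> [2, 1, 0]).
// This is not full semver precedence.
function parseVersion(version: string): number[] {
	const [core = ""] = version.split("-");
	return core.split(".").map((part) => Number.parseInt(part, 10) || 0);
}

// Returns true when `version` is greater than or equal to `minimum`.
function isAtLeast(version: string, minimum: string): boolean {
	const a = parseVersion(version);
	const b = parseVersion(minimum);
	for (let i = 0; i < Math.max(a.length, b.length); i++) {
		const diff = (a[i] ?? 0) - (b[i] ?? 0);
		if (diff !== 0) return diff > 0;
	}
	return true;
}

function getReplayState(
	actorVersion: string | undefined,
	minReplayVersion: string,
	isStepRunning: boolean,
): ReplayState {
	// Version gate first, so outdated actors get an actionable hint.
	if (!actorVersion || !isAtLeast(actorVersion, minReplayVersion)) {
		return {
			canReplay: false,
			tooltip: `Update this actor to RivetKit ${minReplayVersion} or later to replay steps.`,
		};
	}
	if (isStepRunning) {
		return { canReplay: false, tooltip: "Cannot replay while a step is running." };
	}
	return { canReplay: true };
}
```

Because the tooltip stays `undefined` when replay is available, the existing `disabled={!replayState.tooltip}` prop on `MaybeTooltip` would keep working unchanged.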
3dc79d9 to 481dd2d


Description
Add workflow rerun controls to RivetKit workflows through the inspector by introducing a v4 workflow rerun message, HTTP endpoint, and workflow-engine reset helper. Update the standalone Inspector UI with a current-step rerun button, previous-step right-click rerun, and helper text, and make the HTTP inspector route usable with actor inspector tokens so the standalone Inspector can trigger reruns without engine credentials. Also preserve workflow metadata in storage and document the new inspector API.
Type of change
How Has This Been Tested?
pnpm --dir rivetkit-typescript/packages/workflow-engine exec vitest run tests/rerun.test.ts
pnpm --dir rivetkit-typescript/packages/rivetkit test driver-memory -t "POST /inspector/workflow/rerun reruns a workflow from the beginning|inspector endpoints require auth in non-dev mode|failed workflow steps sleep instead of surfacing as run errors"
serve-test-suite server, including the current-step rerun button and right-click rerun from a previous step.

Checklist: