
@sirtimid (Contributor) commented Feb 6, 2026

Closes #779

Summary

Implements the DGC (Distributed Garbage Collection) protocol for remote kernel communication (Issue #779).

  • deliverBringOutYourDead() on RemoteHandle now sends a ['bringOutYourDead'] wire message to the remote kernel instead of being a no-op
  • The remote kernel schedules a local reap, runs GC, and sends back drops/retires
  • A ping-pong prevention flag (#remoteGcRequested) prevents infinite BOYD loops between kernels
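The round trip in these bullets can be sketched as a small two-kernel model. This is a hypothetical simplification, not the RemoteHandle implementation: `ModelKernel`, `ModelRemoteHandle`, the synchronous in-process delivery, and the `['drops', refs]` reply shape are invented for illustration. Only the `['bringOutYourDead']` message, scheduling a reap on receipt, and the suppression flag come from the description.

```typescript
// Hypothetical two-kernel model of the BOYD exchange; not ocap-kernel code.
// The ['drops', refs] reply shape is invented for illustration.
type WireMessage = ['bringOutYourDead'] | ['drops', string[]];

class ModelKernel {
  readonly reapQueue: ModelRemoteHandle[] = [];
  // Refs this kernel's local GC would report as droppable (hypothetical).
  droppable: string[];

  constructor(droppable: string[] = []) {
    this.droppable = droppable;
  }

  scheduleReap(handle: ModelRemoteHandle): void {
    this.reapQueue.push(handle);
  }

  // Process queued reaps: deliver BOYD to each remote endpoint, then send
  // back whatever the local GC pass found.
  runReaps(): void {
    let handle = this.reapQueue.shift();
    while (handle !== undefined) {
      handle.deliverBringOutYourDead();
      if (this.droppable.length > 0) {
        handle.send(['drops', this.droppable]);
        this.droppable = [];
      }
      handle = this.reapQueue.shift();
    }
  }
}

class ModelRemoteHandle {
  // Plays the role of #remoteGcRequested.
  private remoteGcRequested = false;
  readonly sent: WireMessage[] = [];
  peer: ModelRemoteHandle | undefined;
  private readonly kernel: ModelKernel;

  constructor(kernel: ModelKernel) {
    this.kernel = kernel;
  }

  deliverBringOutYourDead(): void {
    if (this.remoteGcRequested) {
      // This reap was requested by the peer, so don't ask it back.
      this.remoteGcRequested = false;
      return;
    }
    this.send(['bringOutYourDead']);
  }

  send(message: WireMessage): void {
    this.sent.push(message);
    this.peer?.receive(message);
  }

  receive(message: WireMessage): void {
    if (message[0] === 'bringOutYourDead') {
      this.remoteGcRequested = true;
      this.kernel.scheduleReap(this);
    }
    // A real kernel would apply incoming drops to its GC state here.
  }
}

const kernelA = new ModelKernel();
const kernelB = new ModelKernel(['ref-1']);
const aToB = new ModelRemoteHandle(kernelA);
const bToA = new ModelRemoteHandle(kernelB);
aToB.peer = bToA;
bToA.peer = aToB;

kernelA.scheduleReap(aToB); // A reaps its endpoint for B
kernelA.runReaps(); // A sends BOYD; B queues a reap of its endpoint for A
kernelB.runReaps(); // B's outgoing BOYD is suppressed; B sends its drops back
kernelA.runReaps(); // nothing queued: no BOYD bounced back to A
```

Without the flag, B's reap would send BOYD back to A, A's reap would send it to B again, and the exchange would never settle.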

Changes

GC store widening (gc.ts, garbage-collection.ts):

  • Widen scheduleReap to accept EndpointId (the union type VatId | RemoteId) instead of just VatId
  • Update addGCActions and processGCActionSet to use insistEndpointId instead of insistVatId
  • Rename internal variables from vatId/actionsByVat to endpointId/actionsByEndpoint for clarity
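The widening can be illustrated with a template-literal union type. This sketch assumes vat IDs look like `v1` and remote IDs like `r1` (the PR mentions `r1` as the first allocated remote), and assumes a space-separated `<endpointId> <type> <kref>` GC action format borrowed from SwingSet. The real store types, action format, and whether `scheduleReap` deduplicates may all differ.

```typescript
// Assumed ID shapes for illustration; the real definitions may differ.
type VatId = `v${number}`;
type RemoteId = `r${number}`;
type EndpointId = VatId | RemoteId;

function isEndpointId(value: unknown): value is EndpointId {
  return typeof value === 'string' && /^[vr]\d+$/u.test(value);
}

// Throws unless value is a vat or remote ID (modeled on insistEndpointId).
function insistEndpointId(value: unknown): asserts value is EndpointId {
  if (!isEndpointId(value)) {
    throw new Error(`not an endpoint ID: ${String(value)}`);
  }
}

// Previously VatId-only; accepting EndpointId lets remotes be queued too.
// Deduplication is an assumption of this sketch.
const reapQueue: EndpointId[] = [];
function scheduleReap(endpointId: EndpointId): void {
  if (!reapQueue.includes(endpointId)) {
    reapQueue.push(endpointId);
  }
}

// Group GC actions by the endpoint they target.
// Assumed action format: "<endpointId> <type> <kref>".
function groupActionsByEndpoint(actions: string[]): Map<EndpointId, string[]> {
  const actionsByEndpoint = new Map<EndpointId, string[]>();
  for (const action of actions) {
    const [endpointId, actionType, kref] = action.split(' ');
    insistEndpointId(endpointId);
    const group = actionsByEndpoint.get(endpointId) ?? [];
    group.push(`${actionType} ${kref}`);
    actionsByEndpoint.set(endpointId, group);
  }
  return actionsByEndpoint;
}

scheduleReap('v1');
scheduleReap('r1');
scheduleReap('r1'); // already queued, ignored
const grouped = groupActionsByEndpoint(['v1 dropExport ko3', 'r1 retireExport ko4']);
```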

BOYD protocol (RemoteHandle.ts):

  • Add BringOutYourDeadDelivery to the DeliveryParams union type
  • Implement deliverBringOutYourDead() to send BOYD over the wire via #sendRemoteCommand
  • Add 'bringOutYourDead' case to #handleRemoteDeliver that calls scheduleReap
  • Add #remoteGcRequested flag: when a BOYD was triggered by an incoming remote request, the subsequent local BOYD is suppressed. The flag is set after the savepoint commit, consistent with the transactional message-processing pattern from #811 (Complete Ken protocol implementation).
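The ordering in the last bullet matters because rolling back the store does not undo in-memory state. The sketch below is hypothetical: `TxStore` stands in for the kernel store's savepoint mechanism (its real API is not shown here), and the `reapQueue` key is invented. It shows the general pattern of persisting first and mutating in-memory state only after the commit succeeds.

```typescript
// Hypothetical transactional store: writes stay pending until commit.
class TxStore {
  private readonly committed = new Map<string, string>();
  private readonly pending = new Map<string, string>();

  set(key: string, value: string): void {
    this.pending.set(key, value);
  }

  get(key: string): string | undefined {
    return this.pending.get(key) ?? this.committed.get(key);
  }

  commit(): void {
    for (const [key, value] of this.pending) {
      this.committed.set(key, value);
    }
    this.pending.clear();
  }

  rollback(): void {
    this.pending.clear();
  }
}

class TxRemoteHandle {
  remoteGcRequested = false;
  readonly store: TxStore;

  constructor(store: TxStore) {
    this.store = store;
  }

  // Handle an incoming BOYD. `failBeforeCommit` simulates an error
  // partway through processing.
  handleIncomingBoyd(failBeforeCommit: boolean): void {
    try {
      this.store.set('reapQueue', 'r1'); // persistent: schedule the reap
      if (failBeforeCommit) {
        throw new Error('simulated failure before commit');
      }
      this.store.commit();
    } catch (error) {
      this.store.rollback();
      throw error;
    }
    // In-memory: only reached once the commit has succeeded.
    this.remoteGcRequested = true;
  }
}

const failing = new TxRemoteHandle(new TxStore());
try {
  failing.handleIncomingBoyd(true);
} catch {
  // Expected: the simulated failure propagates after rollback.
}

const succeeding = new TxRemoteHandle(new TxStore());
succeeding.handleIncomingBoyd(false);
```

If the flag were set before the commit, a rollback would leave it set with no reap persisted, and the next unrelated local BOYD to that peer would be wrongly suppressed.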

Tests:

  • Unit tests for GC with remote endpoints and reap scheduling
  • Unit tests for BOYD send/receive, ping-pong prevention, seq/ack tracking, and persistence
  • E2E tests for distributed GC across two kernels via libp2p

Test plan

  • All ocap-kernel unit tests pass (1800+)
  • All nodejs e2e tests pass (35)
  • All extension e2e tests pass (19)
  • CI passes

🤖 Generated with Claude Code


Note

Medium Risk
Adds a new cross-kernel bringOutYourDead control message and widens GC scheduling from vats to all endpoints, which can affect remote message flow and garbage-collection behavior if mis-triggered, but changes are mostly additive and well-covered by new unit/e2e tests.

Overview
Implements distributed garbage collection across kernels by turning RemoteHandle.deliverBringOutYourDead() into a real wire-level deliver message (['bringOutYourDead']) that causes the receiving kernel to scheduleReap its remote endpoint and run GC. A #remoteGcRequested flag prevents infinite BOYD ping-pong and is reset on peer restart.

Widens GC plumbing from vat-only to endpoint-wide operation by treating GC actions and reap scheduling as EndpointId (vat or remote) rather than VatId, and adds Kernel.reapRemotes() / RemoteManager.reapRemotes() debugging/test hooks to trigger remote reaping.
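A hook that mirrors `reapVats` could simply queue a reap for each known remote. In this sketch the class name, the constructor-injected `scheduleReap` callback, the `remoteId → peerId` map, and the optional filter are all hypothetical; the PR only states that `reapRemotes()` exists on `Kernel` and `RemoteManager` as a debugging/test hook.

```typescript
// Hypothetical sketch; the real RemoteManager API is not shown in this thread.
type RemoteId = `r${number}`;

class SketchRemoteManager {
  private readonly remotes = new Map<RemoteId, string>(); // remoteId -> peerId
  private readonly scheduleReap: (remoteId: RemoteId) => void;

  constructor(scheduleReap: (remoteId: RemoteId) => void) {
    this.scheduleReap = scheduleReap;
  }

  addRemote(remoteId: RemoteId, peerId: string): void {
    this.remotes.set(remoteId, peerId);
  }

  // Queue a reap for every remote endpoint that passes `filter`. Each
  // queued reap later leads the kernel to call deliverBringOutYourDead()
  // on that remote's handle.
  reapRemotes(filter: (remoteId: RemoteId) => boolean = () => true): void {
    for (const remoteId of this.remotes.keys()) {
      if (filter(remoteId)) {
        this.scheduleReap(remoteId);
      }
    }
  }
}

const queued: RemoteId[] = [];
const manager = new SketchRemoteManager((remoteId) => queued.push(remoteId));
manager.addRemote('r1', 'peer-a');
manager.addRemote('r2', 'peer-b');
manager.reapRemotes(); // queues r1 and r2
manager.reapRemotes((remoteId) => remoteId === 'r2'); // queues r2 only
```

Going through the kernel, rather than writing to the store from outside, keeps the scheduled reap in the kernel's own store instance.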

Adds extensive coverage: new RemoteHandle unit tests for BOYD send/receive, seq/ack/persistence, ping-pong prevention, RemoteManager tests for reap scheduling, GC store tests for remote endpoint IDs, and nodejs e2e scenarios validating BOYD exchange and continued bidirectional messaging after DGC.

Written by Cursor Bugbot for commit ae4b9cc.

github-actions bot (Contributor) commented Feb 6, 2026

Coverage Report

| Status | Category | Percentage | Covered / Total |
|---|---|---|---|
| 🔵 | Lines | 78.23% (⬆️ +0.02%) | 6166 / 7881 |
| 🔵 | Statements | 78.21% (⬆️ +0.02%) | 6265 / 8010 |
| 🔵 | Functions | 76.47% (⬇️ -0.05%) | 1570 / 2053 |
| 🔵 | Branches | 78.26% (⬆️ +0.05%) | 2250 / 2875 |
File Coverage

Changed files:

| File | Stmts | Branches | Functions | Lines | Uncovered Lines |
|---|---|---|---|---|---|
| packages/ocap-kernel/src/Kernel.ts | 91.13% (⬇️ -2.37%) | 80.95% (⬇️ -4.05%) | 87.8% (⬇️ -4.50%) | 91.13% (⬇️ -2.37%) | 117, 226-229, 411, 481, 491-492, 535 |
| packages/ocap-kernel/src/garbage-collection/garbage-collection.ts | 98.24% (🟰 ±0%) | 93.33% (🟰 ±0%) | 100% (🟰 ±0%) | 98.24% (🟰 ±0%) | 76 |
| packages/ocap-kernel/src/remotes/kernel/RemoteHandle.ts | 88.5% (⬆️ +0.42%) | 82.56% (⬆️ +1.19%) | 87.5% (🟰 ±0%) | 88.77% (⬆️ +0.41%) | 347, 366-409, 462, 505, 515-517, 558-571, 910, 983, 1029 |
| packages/ocap-kernel/src/remotes/kernel/RemoteManager.ts | 98.63% (⬆️ +0.08%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 98.63% (⬆️ +0.08%) | 133 |
| packages/ocap-kernel/src/store/methods/gc.ts | 89.74% (🟰 ±0%) | 68.18% (🟰 ±0%) | 100% (🟰 ±0%) | 89.74% (🟰 ±0%) | 140, 153, 179-186, 202 |
Generated in workflow #3614 for commit ae4b9cc by the Vitest Coverage Report Action

sirtimid marked this pull request as ready for review February 6, 2026 16:23
sirtimid requested a review from a team as a code owner February 6, 2026 16:23

@grypez (Contributor) left a comment

Upgrades from vat gc -> endpoint gc

LGTM, but probably @FUDCo should look, too. The ping-pong prevention works for two kernels; is there a case of three kernels that can get BOYD-locked?

    const crankResult = await remote.deliverBringOutYourDead();
    expect(mockRemoteComms.sendRemoteMessage).toHaveBeenCalledWith(
      mockRemotePeerId,
      expect.any(String),

Inline comment (Contributor):

Maybe worth us writing a plugin like expect.serialized({ key: value }).
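One possible shape for that helper is an asymmetric matcher that parses the received string and compares structures. This is a hypothetical sketch: `serialized` is not an existing Vitest API, and the message shape below is invented. It assumes the assertion library, like Jest, recognizes asymmetric matchers by their `asymmetricMatch` method; registering a matcher through `expect.extend` would be the more conventional route.

```typescript
// Deep equality for JSON-compatible values (objects, arrays, primitives).
// Object key order is ignored, unlike a plain string comparison.
function jsonEqual(a: unknown, b: unknown): boolean {
  if (a === b) {
    return true;
  }
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) {
    return false;
  }
  const left = a as Record<string, unknown>;
  const right = b as Record<string, unknown>;
  const leftKeys = Object.keys(left);
  if (leftKeys.length !== Object.keys(right).length) {
    return false;
  }
  return leftKeys.every(
    (key) =>
      Object.prototype.hasOwnProperty.call(right, key) &&
      jsonEqual(left[key], right[key]),
  );
}

// Hypothetical helper in the spirit of expect.serialized(...): matches a
// JSON string whose parsed value deep-equals `expected`.
function serialized(expected: unknown): { asymmetricMatch(received: unknown): boolean } {
  return {
    asymmetricMatch(received: unknown): boolean {
      if (typeof received !== 'string') {
        return false;
      }
      try {
        return jsonEqual(JSON.parse(received), expected);
      } catch {
        return false; // not valid JSON
      }
    },
  };
}

// Hypothetical message shape, for illustration only.
const matcher = serialized({ method: 'deliver', params: ['bringOutYourDead'] });
const sameOrder = matcher.asymmetricMatch('{"method":"deliver","params":["bringOutYourDead"]}');
const otherOrder = matcher.asymmetricMatch('{"params":["bringOutYourDead"],"method":"deliver"}');
const mismatch = matcher.asymmetricMatch('{"method":"deliver","params":[]}');
```

In a test it would take the place of `expect.any(String)`, e.g. `toHaveBeenCalledWith(mockRemotePeerId, serialized(expectedMessage))`. This version requires full equality; a subset match in the style of `expect.objectContaining` would be an equally reasonable choice.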

grypez requested a review from FUDCo February 6, 2026 21:58
FUDCo previously approved these changes Feb 6, 2026

@FUDCo (Contributor) left a comment

Remarkable how much of this just consisted of renaming things. Very nice.

sirtimid and others added 2 commits February 9, 2026 15:54
…779)

Implement the DGC protocol so that `deliverBringOutYourDead()` on a
RemoteHandle sends a BOYD wire message to the remote kernel, which
schedules a local reap, runs GC, and sends back drops/retires.

- Widen `scheduleReap` to accept `EndpointId` (vat or remote)
- Update GC action parsing to use `insistEndpointId`
- Add `bringOutYourDead` delivery type to RemoteHandle
- Add ping-pong prevention flag to avoid infinite BOYD loops
- Add unit tests for BOYD protocol and GC endpoint widening
- Add e2e tests for distributed GC in remote-comms

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Move the #remoteGcRequested in-memory state change after the savepoint
commit, consistent with the transactional message processing pattern
from #811. Also consolidate test assertions to use toStrictEqual on
parsed objects per project convention, and fix return value expectation.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@sirtimid (Contributor, Author) commented Feb 9, 2026

> is there a case of three kernels that can get BOYD-locked?

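scheduleReap(this.remoteId) only queues a reap for that specific remote endpoint. When the kernel processes it, it calls deliverBringOutYourDead() on that one RemoteHandle only, not on all remotes. A BOYD from kernel A therefore makes kernel B reap only its endpoint for A (where the reply BOYD is suppressed by the flag), and no BOYD is sent to a third kernel C. The three-kernel BOYD lock cannot happen with the current design.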
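The argument can be checked with a small simulation. This is a hypothetical model rather than kernel code: three fully connected kernels, a per-peer suppression flag standing in for each handle's #remoteGcRequested, and a reap queue that holds individual remote endpoints, as the reply describes.

```typescript
// Hypothetical model: three fully connected kernels, one handle per peer,
// and a reap queue of individual remote endpoints (named by peer here).
type Name = 'A' | 'B' | 'C';

type Boyd = { from: Name; to: Name };

class MeshKernel {
  readonly name: Name;
  readonly reapQueue: Name[] = [];
  // Per-handle #remoteGcRequested flags, keyed by peer name.
  private readonly remoteGcRequested = new Set<Name>();

  constructor(name: Name) {
    this.name = name;
  }

  scheduleReap(peer: Name): void {
    if (!this.reapQueue.includes(peer)) {
      this.reapQueue.push(peer);
    }
  }

  // Reap one endpoint: send BOYD to that peer unless the peer asked for it.
  reap(peer: Name, outbox: Boyd[]): void {
    if (this.remoteGcRequested.delete(peer)) {
      return;
    }
    outbox.push({ from: this.name, to: peer });
  }

  // Incoming BOYD: schedule a reap of the sender's endpoint only.
  receive(from: Name): void {
    this.remoteGcRequested.add(from);
    this.scheduleReap(from);
  }
}

const kernels: Record<Name, MeshKernel> = {
  A: new MeshKernel('A'),
  B: new MeshKernel('B'),
  C: new MeshKernel('C'),
};
const delivered: Boyd[] = [];

kernels.A.scheduleReap('B');
// Alternate reap and delivery phases until nothing is left to do
// (capped at 100 rounds so a livelock would still end the loop).
let rounds = 0;
for (; rounds < 100; rounds += 1) {
  const outbox: Boyd[] = [];
  for (const kernel of Object.values(kernels)) {
    let peer = kernel.reapQueue.shift();
    while (peer !== undefined) {
      kernel.reap(peer, outbox);
      peer = kernel.reapQueue.shift();
    }
  }
  if (outbox.length === 0) {
    break;
  }
  for (const boyd of outbox) {
    delivered.push(boyd);
    kernels[boyd.to].receive(boyd.from);
  }
}
```

A's single BOYD to B makes B reap only its endpoint for A, that reap's BOYD is suppressed, and C is never contacted, so the loop ends after one exchange. The model leaves out drops/retires and BOYDs started independently by several kernels at once.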

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@FUDCo (Contributor) left a comment

Latest update seems to do exactly what was promised in the TODO comment :-)

I think the issue that Cursor Bugbot flagged is real, so if you need to do another turn of changes to address that, I'm happy to do another quick review round after.

sirtimid added this pull request to the merge queue Feb 9, 2026
FUDCo previously approved these changes Feb 9, 2026

…d store cache

The DGC e2e tests had two compounding bugs: they called scheduleReap('r0')
but the first allocated remote is r1, and they used a separate KernelStore
instance whose cache was isolated from the kernel's internal store.
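The cache problem is a general hazard when two caching store instances share one backing database: each caches its own reads, so a write made through one is invisible to the other for any key the other has already cached. A minimal illustration with a hypothetical `CachingStore` (not the real `KernelStore`, whose caching strategy is not shown here; the `nextRemoteId` key and its values are only for illustration):

```typescript
// Hypothetical read-through cache over shared backing storage.
class CachingStore {
  private readonly backing: Map<string, string>;
  private readonly cache = new Map<string, string | undefined>();

  constructor(backing: Map<string, string>) {
    this.backing = backing;
  }

  get(key: string): string | undefined {
    if (!this.cache.has(key)) {
      this.cache.set(key, this.backing.get(key));
    }
    return this.cache.get(key);
  }

  // Writes go to this instance's cache and to the backing storage, but any
  // other instance's cache keeps whatever it read earlier.
  set(key: string, value: string): void {
    this.cache.set(key, value);
    this.backing.set(key, value);
  }
}

const database = new Map<string, string>([['nextRemoteId', '1']]);
const kernelStore = new CachingStore(database); // the kernel's own instance
const testStore = new CachingStore(database); // a separate instance

const before = testStore.get('nextRemoteId'); // testStore caches '1'
kernelStore.set('nextRemoteId', '2'); // the kernel allocates r1
const after = testStore.get('nextRemoteId'); // still '1': stale
```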

Add reapRemotes() to RemoteManager and Kernel (mirroring reapVats) and
use it in the e2e tests instead of bypassing the kernel with an external
store. Remove getGCActions() and nextRemoteId assertions that suffered
from the same stale cache issue.

Also reset #remoteGcRequested flag in handlePeerRestart() so BOYD is
correctly sent to a new incarnation after peer restart.
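The reset matters because the flag records a request from one particular incarnation of the peer. If it survived a restart, the next local reap would skip sending BOYD to the new incarnation, which never asked for GC. A hypothetical, simplified sketch (only the flag's role and the `handlePeerRestart()` name come from the commit message):

```typescript
// Hypothetical, simplified stand-in for RemoteHandle's flag handling.
class RestartAwareHandle {
  private remoteGcRequested = false;
  readonly sent: string[] = [];

  // The peer sent BOYD; the next local reap of this endpoint is its request.
  receiveBringOutYourDead(): void {
    this.remoteGcRequested = true;
  }

  // Local reap of this endpoint: send BOYD unless the peer asked for it.
  deliverBringOutYourDead(): void {
    if (this.remoteGcRequested) {
      this.remoteGcRequested = false;
      return;
    }
    this.sent.push('bringOutYourDead');
  }

  // A new incarnation of the peer never made the old request.
  handlePeerRestart(): void {
    this.remoteGcRequested = false;
  }
}

const withReset = new RestartAwareHandle();
withReset.receiveBringOutYourDead(); // old incarnation asks for GC
withReset.handlePeerRestart(); // peer restarts before the reap runs
withReset.deliverBringOutYourDead(); // BOYD goes to the new incarnation

const withoutRestart = new RestartAwareHandle();
withoutRestart.receiveBringOutYourDead();
withoutRestart.deliverBringOutYourDead(); // suppressed, as intended
```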

Co-Authored-By: Claude Opus 4.6 <[email protected]>
sirtimid force-pushed the sirtimid/remote-comms-dgc branch from 5297c39 to 2e895a7 February 9, 2026 19:16
FUDCo self-requested a review February 9, 2026 19:31

@FUDCo (Contributor) left a comment

Looks good now...

sirtimid enabled auto-merge February 9, 2026 19:36
sirtimid marked this pull request as draft February 9, 2026 20:04
auto-merge was automatically disabled February 9, 2026 20:04

Pull request was converted to draft

sirtimid marked this pull request as ready for review February 9, 2026 20:04
sirtimid enabled auto-merge February 9, 2026 20:05
sirtimid added this pull request to the merge queue Feb 9, 2026
Merged via the queue into main with commit 670daf6 Feb 9, 2026
31 checks passed
sirtimid deleted the sirtimid/remote-comms-dgc branch February 9, 2026 22:28

Development

Successfully merging this pull request may close these issues.

Remote comms: Implement distributed garbage collection (DGC)

3 participants