Skip to content

feat: synth daemon#1561

Open
megha-narayanan wants to merge 11 commits into
aws:feat/cdk-lsp-explorerfrom
megha-narayanan:feat/explorer-synth-daemon
Open

feat: synth daemon#1561
megha-narayanan wants to merge 11 commits into
aws:feat/cdk-lsp-explorerfrom
megha-narayanan:feat/explorer-synth-daemon

Conversation

@megha-narayanan
Copy link
Copy Markdown

Long-running daemon which coordinates synth runs for LSP servers

Checklist

  • This change contains a major version upgrade for a dependency and I confirm all breaking changes are addressed
    • Release notes for the new version:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions github-actions Bot added the p2 label May 27, 2026
@aws-cdk-automation aws-cdk-automation requested a review from a team May 27, 2026 20:15
@megha-narayanan megha-narayanan marked this pull request as ready for review May 27, 2026 20:30
Copy link
Copy Markdown
Contributor

@ShadowCat567 ShadowCat567 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future PRs can you give more of an overview of what I am looking at/reviewing in the PR description since even though I know the architecture of what you are going for, figuring out which file made most sense to start in and how the code in the different files worked together was a bit challenging. It would also be helpful to have some more comments describing what is going on with your functions since many of them took me some time to parse and figure out what they were trying to accomplish (especially because this kind of architecture is not commonly used in CDK).
Otherwise it looks pretty good overall, I left some comments for you to take a look at.

// ---------------------------------------------------------------------------
// Protocol version — manually bumped when the wire protocol changes.
// Used for daemon version handshake: if client and daemon versions differ,
// the daemon shuts down so a new one matching the client can be spawned.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I am entirely understanding what this is describing. Why would our client and daemon version differ? What causes the protocol version to change? What does wire protocol mean?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wire protocol was teh format of the JSON message that we sent over this socket. This was for if you update your cli, so we have a new version of the client (LSP server/web) which may have different signals, but the daemon from the OLD version could still be running. It doesn't necessarily shut down when your CLI updates. So I put this in as a catch for this case, so a new daemon from the new version of the client can be spun up. Otherwise we could silently miss messages. Updated the documentation here to reflect that.

* Uses an exclusive lock file to prevent concurrent spawns.
* If a daemon is already running, connects to it directly.
*/
export async function ensureDaemon(options: EnsureDaemonOptions): Promise<DaemonConnection> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't really like ensure as a part of this name, I would prefer checkDaemon a bit better, but I am happy to hear other alternatives

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to acquireDaemon, wanted to make it clear that we don't only check and see if a daemon is there, but also spin one up if there isn't

}

const lockPath = lockPathForProject(projectDir);
const lockFd = acquireLock(lockPath);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this locking step so we don't attempt to spawn a new daemon when something else has already spawned a new daemon?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, its so that if a daemon dies, two clients (like an LSP and the Web Server) can't both spin up a new daemon at the same time. I put this in docstring of the function, lmk if that can be clearer

const lockPath = lockPathForProject(projectDir);
const lockFd = acquireLock(lockPath);

try {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a try block (attempting to connect to the daemon after acquiring the lock)?
I feel a bit weird about these nested try blocks and I am wondering if there is a reason why you chose this method or if there is a better way

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because after we acquire the lock, if we were, say, the loser of a race, the try connect would succeed. else, if the daemon still doesn't exist, it would error, so that's what we would catch (and when we would make a new daemon). I can take this into a tryConnect helper which will return undefined on failure, to avoid the nested try-catch blocks here. teh outer try-finally is to make sure that no matter what happens, we HAVE TO release the lock. even if we have a timeout or a hang or unexpected failure, like if spawn daemon times out.

try {
const fd = fs.openSync(lockPath, 'wx');
fs.writeSync(fd, String(process.pid));
return fd;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this section of code is repeated in a few places it might be a good candidate for a small function

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, also did unlinkIfExists

if (resolve) {
const r = resolve;
resolve = undefined;
r({ value: undefined as never, done: true });
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the value can be undefined shouldn't it be optional? Why are we casting to never? (Also what is r)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The casting as never was a workaround to make TypeScript happy.....when iteration is done, there's no value to return, but the type still demands one. I've cleaned this up there's now a shared DONE constant with the right type (IteratorReturnResult<undefined>).

r is the stored promise resolver, I named it r to avoid shadowing the outer resolve, but you're right it's cryptic. Renamed to pending (someone is waiting for the next message) and settle (we're handing them their answer). The grab-then-null pattern (const settle = pending; pending = undefined; settle(...)) just prevents accidentally resolving the same promise twice.

server.start().then(
() => {
// eslint-disable-next-line no-console
console.log(`Daemon started: pid=${process.pid} socket=${socketPath}`);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CDK might have some kind of console writing function you can make use of instead of console.log, can you look into that?

Copy link
Copy Markdown
Author

@megha-narayanan megha-narayanan May 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeaaaaa.... there's one in toolkit-lib. It does bring in some stuff we don't need, but I do think we should use this eventually. One side effect is that that function requires an action field, which would have to be synth here even though the daemon also logs non-synth lifecycle stuff (startup, idle timeout, shutdown) ... this doesn't feel totally correct to me but I don't think altering the ToolkitAction type is the correct thing to do either, since everything else in that type is a user-facing CLI command (packages/@aws-cdk/toolkit-lib/lib/api/io/toolkit-action.ts). Happy to hear thoughts and alternatives.

return server.start();
}

test('connects and completes handshake', async () => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this test isn't really doing that, we are not checking to see whether the handshake went through.
I am not entirely understanding how connection.send and connection.close being functions relates to whether the handshake was completed

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

connectToDaemon performs the handshake internally and throws if it fails, so this is basically testing that (interally) when the daemon is running, it doesn't fail. I do think this test is redundant though, the happy-path is already covered by the other tests in this file that successfully connect, subscribe, and exchange messages (like the next test after this one, receives messages via async iterable) which is more robust.

I removed it and added a proper test that proves the handshake is a real gate: 'rejects connection on protocol version mismatch'. It mocks PROTOCOL_VERSION to a wrong value and asserts that connectToDaemon throws. That proves the handshake actually happens and rejects mismatched clients.

Comment thread packages/@aws-cdk/cdk-explorer/test/daemon/connect.test.ts
if (server) {
await server.stop();
}
try {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is there a try/catch block here? What would cause unlinkSync to fail in a test? Is there anything we can do to make sure it doesn't fail?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the try catch block was because the socket file SHOULD have been deleted already because stop() usually removes the file first. this was a safety net to prevent leaking socket files between tests, even if stop fails/hangs, but I realize that actually isn't doing what it needs to as is. fixed to put stop instdie the try block, so we actually cover the failure case with clean up.

- Remove triggerFile from protocol (YAGNI — synth can't use it)
- Add synthGeneration counter to prevent onSynthSettled double-fire
  when both timeout and promise resolve
- Remove leaked error listener on successful socket connect
- Wrap spawn in try/finally to prevent logFd leak on spawn failure
Copy link
Copy Markdown
Contributor

@ShadowCat567 ShadowCat567 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some comments to the functions in this PR! Otherwise this looks pretty good, have a few minor suggestions


await cleanupStaleState(projectDir);
await spawnDaemon(projectDir);
return await connectToDaemon(projectDir);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we still go to this section if we are not the raceWinner?

};
}

const DONE: IteratorReturnResult<undefined> = { value: undefined, done: true };
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if this were at the top of the file to make it easier to find

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants