fix(vpc): fix 3 bug-bash-found defects in create-harness VPC + backfill (follow-up to #1671)#1682
fix(vpc): fix 3 bug-bash-found defects in create-harness VPC + backfill (follow-up to #1671)#1682aidandaly24 wants to merge 1 commit into
Conversation
…kfill paths A full bug-bash (14 isolated scenarios, adversarially verified) found three defects in the review-fix changes — one a blocker regression introduced by the prior fix commit (59bbde0): - BLOCKER: `agentcore create --network-mode VPC` (harness path) always failed with a misleading "--subnets is required" even when supplied. Root cause: 59bbde0 added a validateVpcOptions call inside validateCreateHarnessOptions, but the two call sites in command.tsx (dry-run + main) never forwarded container/subnets/ securityGroups/vpcId into it, so those were always undefined. The entire VPC create-harness path was dead and the intended friendly errors unreachable. Fix: forward the four fields at both call sites. - MEDIUM: `deploy --dry-run` persisted the backfilled vpcId to agentcore.json/ harness.json, dirtying the working tree — violating the "preview without deploying" contract. Fix: backfillContainerVpcIds takes a `persist` flag; in preview mode it still writes (the CDK synth subprocess reads from disk) but returns restore() which reverts the files after synth. Plan + diff modes call it. - LOW (TUI): a stale "VPC ID" line rendered on the CodeZip review screen after a Container→CodeZip back-nav, and the stale vpcId could leak into agentcore.json. Fix: guard the review render and the schema-mapper write on buildType==='Container'. The blocker slipped past unit tests because they called validateCreateHarnessOptions directly with a full options object, bypassing the broken command.tsx caller. Added integration tests that drive the real CLI dispatch via runCLI --dry-run (would have caught it), plus dry-run restore tests for the backfill. Full unit suite green (6120).
Package TarballHow to installgh release download pr-1682-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.21.1.tgz |
|
Claude Security Review: no high-confidence findings. (run) |
agentcore-cli-automation
left a comment
There was a problem hiding this comment.
Thanks for the follow-up. The three fixes are on-target and the new integration tests exercise the command.tsx → validator seam that the original unit tests bypassed — nice catch on that gap.
I do have one correctness concern with the BUG-2 fix that I'd like addressed before merge; details inline.
| // Plan mode: stop after synth and checks, don't deploy. Revert any vpcId the backfill wrote to | ||
| // disk for synth — a preview must not dirty the working tree. | ||
| if (options.plan) { | ||
| await backfill.restore(); |
There was a problem hiding this comment.
backfill.restore() is only called on the two happy-path returns (plan at L485, diff at L504), but between the backfill call at L419 and those returns there are several ways the flow can bail out after agentcore.json/harness.json have already been rewritten to disk:
synthesizeCdkat L427 can throw → caught by the outercatchat L982, no restore runs.- Stack selection early-return at L436–L444 (still inside the same
try), no restore. - Bootstrap-required early-return at L456–L463, no restore.
- Deployability check early-return at L471–L479, no restore.
- In diff mode,
runDiffat L501 can throw → outer catch, no restore.
So a --dry-run or --diff that fails for any reason after the backfill will leave the working tree dirty — which is exactly the "preview must not dirty the working tree" contract this PR's BUG-2 fix is trying to establish. The new dry-run (persist=false) restore test only covers the happy path.
A few ways to fix:
- Register
restore()for thefinallyblock, e.g. capture it into apreviewRestore: (() => Promise<void>) | undefineddeclared before thetryand invoke it in the existingfinally(L986–L991) when set. Cheapest change and covers every exit path. - Move the two current
await backfill.restore()calls out of the happy branches entirely and rely solely on option 1. - Wrap the backfill + synth + preview span in its own
try/finallythat always callsrestore()in preview mode.
Any of these will do; option 1 is the smallest diff. Please also add a regression test that forces a throw between backfillContainerVpcIds and the plan/diff short-circuit (e.g. by making synthesizeCdk reject) and asserts the on-disk files are still reverted.
agentcore-cli-automation
left a comment
There was a problem hiding this comment.
Nice targeted fix. The BUG-1 dispatch wiring is clearly right and the new integration tests exercise the seam the unit tests missed. The BUG-3 guard is a one-liner and the schema-mapper guard is consistent with it.
One concern on the BUG-2 fix — the preview-mode backfill.restore() calls are placed inline (not in a finally), so several failure paths between the backfill and the explicit restore will still leave agentcore.json / harness.json dirty on a --dry-run / --diff. That's the exact contract the fix is meant to uphold, so I think it's worth handling before merging. Details inline.
| // Plan mode: stop after synth and checks, don't deploy. Revert any vpcId the backfill wrote to | ||
| // disk for synth — a preview must not dirty the working tree. | ||
| if (options.plan) { | ||
| await backfill.restore(); |
There was a problem hiding this comment.
BUG-2's contract is "a preview must not dirty the working tree" — but the restore is only called on the happy-path exits of the plan/diff blocks (L485, L504). Between the backfill at L419 and those calls there are several failure paths where the file stays dirty on a --dry-run / --diff:
synthesizeCdkat L427 throws → caught by the outer catch at L982,restore()never runs.selectTargetStackfails at L436 → earlyreturn, no restore.- Bootstrap check fails at L451-L464 → early
return, no restore. - Deployability check fails at L471-L479 → early
return, no restore. - In
--diffmode,runDiffat L501 throws → outer catch, no restore.
Since restore() is a no-op when persist=true (the originals map is empty), it's safe to call unconditionally. A couple of options:
- Move the restore into the
finallyblock at L986 — hoistlet backfill: BackfillVpcIdResult | null = null;above thetry, and addif (backfill) await backfill.restore();infinally. Cleanest and covers every failure path uniformly. - Wrap the preview flow in its own
try/finally— keeps the scope narrower but is more invasive since synth/deployability/etc. are shared with the real deploy path. - At minimum,
try { … } finally { await backfill.restore(); }around synth + the diff run whenisPreview— narrower still but easy to get subtly wrong later.
Option 1 seems most robust and matches the pattern already used for restoreEnv / toolkitWrapper.dispose() in the same finally.
Coverage Report
|
Description
Follow-up to #1671 (build-infra VPC support, squash-merged). A full bug bash (14 isolated scenarios, each adversarially verified) run against the merged branch found three defects, one a blocker regression introduced by the review-fix work in #1671. Fixing them here.
BUG-1 — [BLOCKER]
agentcore create --network-mode VPC(harness path) is entirely brokenThe prior fix added a
validateVpcOptionscall insidevalidateCreateHarnessOptions, but the two call sites incommand.tsx(dry-run at ~L165, main at ~L212) never forwardedcontainer/subnets/securityGroups/vpcIdinto it. So those were alwaysundefinedand the validator short-circuited with a misleading--subnets is required when network mode is VPC— even when--subnetswas supplied. The entire VPC create-harness path was dead, and the intended friendly errors (--vpc-id is required,Invalid VPC ID format, SG cap) were unreachable.Fix: forward the four fields at both call sites.
Why unit tests missed it:
harness-validate.test.tscallsvalidateCreateHarnessOptionsdirectly with a fully-populated object, bypassing the brokencommand.tsxcaller. Added integration tests that drive the real CLI dispatch viarunCLI --dry-run.BUG-2 — [MEDIUM]
deploy --dry-runmutated and persistedagentcore.json/harness.jsonThe C2 backfill wrote the resolved
vpcIdto disk unconditionally, so a "preview without deploying" dirtied the working tree.Fix:
backfillContainerVpcIdstakes apersistflag. In preview modes (--dry-run/--diff) it still writes (the CDK synth subprocess reads config from disk) but returnsrestore(), which reverts the files after synth. Plan and diff short-circuits call it.BUG-3 — [LOW] TUI: stale "VPC ID" on CodeZip review after Container→CodeZip back-nav
A leftover
vpcIdrendered a phantom line on the CodeZip review screen and could leak intoagentcore.json.Fix: guard the review render (
GenerateWizardUI.tsx) and the schema-mapper write onbuildType === 'Container'.Related Issue
Follow-up to #1671 (no separate tracking issue; #1671 referenced its CDK dependency rather than an issue).
Type of Change
Testing
npm run test:unit— full unit suite green (6120 passed)npm run typechecknpm run lintNew/updated tests:
create.test.ts:harness VPC dispatch (--dry-run)— accepts full VPC flags, rejects missing--vpc-idwith the friendly error, rejects malformed--vpc-id. These drive thecommand.tsx → validatorseam BUG-1 lived in.backfill-vpc-id.test.ts:dry-run (persist=false) restore— writes vpcId for synth then reverts;persist=trueleaves it.Checklist