Skip to content

fix(vpc): fix 3 bug-bash-found defects in create-harness VPC + backfill (follow-up to #1671)#1682

Open
aidandaly24 wants to merge 1 commit into
mainfrom
fix/build-infra-vpc-bugbash-followups
Open

fix(vpc): fix 3 bug-bash-found defects in create-harness VPC + backfill (follow-up to #1671)#1682
aidandaly24 wants to merge 1 commit into
mainfrom
fix/build-infra-vpc-bugbash-followups

Conversation

@aidandaly24

Copy link
Copy Markdown
Contributor

Description

Follow-up to #1671 (build-infra VPC support, squash-merged). A full bug bash (14 isolated scenarios, each adversarially verified) run against the merged branch found three defects, one a blocker regression introduced by the review-fix work in #1671. Fixing them here.

BUG-1 — [BLOCKER] agentcore create --network-mode VPC (harness path) is entirely broken

The prior fix added a validateVpcOptions call inside validateCreateHarnessOptions, but the two call sites in command.tsx (dry-run at ~L165, main at ~L212) never forwarded container/subnets/securityGroups/vpcId into it. So those were always undefined and the validator short-circuited with a misleading --subnets is required when network mode is VPCeven when --subnets was supplied. The entire VPC create-harness path was dead, and the intended friendly errors (--vpc-id is required, Invalid VPC ID format, SG cap) were unreachable.
Fix: forward the four fields at both call sites.
Why unit tests missed it: harness-validate.test.ts calls validateCreateHarnessOptions directly with a fully-populated object, bypassing the broken command.tsx caller. Added integration tests that drive the real CLI dispatch via runCLI --dry-run.

BUG-2 — [MEDIUM] deploy --dry-run mutated and persisted agentcore.json / harness.json

The C2 backfill wrote the resolved vpcId to disk unconditionally, so a "preview without deploying" dirtied the working tree.
Fix: backfillContainerVpcIds takes a persist flag. In preview modes (--dry-run / --diff) it still writes (the CDK synth subprocess reads config from disk) but returns restore(), which reverts the files after synth. Plan and diff short-circuits call it.

BUG-3 — [LOW] TUI: stale "VPC ID" on CodeZip review after Container→CodeZip back-nav

A leftover vpcId rendered a phantom line on the CodeZip review screen and could leak into agentcore.json.
Fix: guard the review render (GenerateWizardUI.tsx) and the schema-mapper write on buildType === 'Container'.

Related Issue

Follow-up to #1671 (no separate tracking issue; #1671 referenced its CDK dependency rather than an issue).

Type of Change

  • Bug fix

Testing

  • I ran npm run test:unit — full unit suite green (6120 passed)
  • I ran npm run typecheck
  • I ran npm run lint
  • Added integration tests (drive the real CLI dispatch) that reproduce BUG-1, plus dry-run restore tests for BUG-2

New/updated tests:

  • create.test.ts: harness VPC dispatch (--dry-run) — accepts full VPC flags, rejects missing --vpc-id with the friendly error, rejects malformed --vpc-id. These drive the command.tsx → validator seam BUG-1 lived in.
  • backfill-vpc-id.test.ts: dry-run (persist=false) restore — writes vpcId for synth then reverts; persist=true leaves it.

Checklist

  • I have added tests that prove the fixes are effective
  • My changes generate no new warnings

…kfill paths

A full bug-bash (14 isolated scenarios, adversarially verified) found three
defects in the review-fix changes — one a blocker regression introduced by the
prior fix commit (59bbde0):

- BLOCKER: `agentcore create --network-mode VPC` (harness path) always failed with
  a misleading "--subnets is required" even when supplied. Root cause: 59bbde0
  added a validateVpcOptions call inside validateCreateHarnessOptions, but the two
  call sites in command.tsx (dry-run + main) never forwarded container/subnets/
  securityGroups/vpcId into it, so those were always undefined. The entire VPC
  create-harness path was dead and the intended friendly errors unreachable. Fix:
  forward the four fields at both call sites.
- MEDIUM: `deploy --dry-run` persisted the backfilled vpcId to agentcore.json/
  harness.json, dirtying the working tree — violating the "preview without
  deploying" contract. Fix: backfillContainerVpcIds takes a `persist` flag; in
  preview mode it still writes (the CDK synth subprocess reads from disk) but
  returns restore() which reverts the files after synth. Plan + diff modes call it.
- LOW (TUI): a stale "VPC ID" line rendered on the CodeZip review screen after a
  Container→CodeZip back-nav, and the stale vpcId could leak into agentcore.json.
  Fix: guard the review render and the schema-mapper write on buildType==='Container'.

The blocker slipped past unit tests because they called validateCreateHarnessOptions
directly with a full options object, bypassing the broken command.tsx caller. Added
integration tests that drive the real CLI dispatch via runCLI --dry-run (would have
caught it), plus dry-run restore tests for the backfill.

Full unit suite green (6120).
@aidandaly24 aidandaly24 requested a review from a team July 2, 2026 18:19
@github-actions github-actions Bot added the size/m PR size: M label Jul 2, 2026
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jul 2, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jul 2, 2026
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Package Tarball

aws-agentcore-0.21.1.tgz

How to install

gh release download pr-1682-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.21.1.tgz

@agentcore-devx-automation

Copy link
Copy Markdown
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jul 2, 2026

@agentcore-cli-automation agentcore-cli-automation left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the follow-up. The three fixes are on-target and the new integration tests exercise the command.tsx → validator seam that the original unit tests bypassed — nice catch on that gap.

I do have one correctness concern with the BUG-2 fix that I'd like addressed before merge; details inline.

// Plan mode: stop after synth and checks, don't deploy. Revert any vpcId the backfill wrote to
// disk for synth — a preview must not dirty the working tree.
if (options.plan) {
await backfill.restore();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backfill.restore() is only called on the two happy-path returns (plan at L485, diff at L504), but between the backfill call at L419 and those returns there are several ways the flow can bail out after agentcore.json/harness.json have already been rewritten to disk:

  • synthesizeCdk at L427 can throw → caught by the outer catch at L982, no restore runs.
  • Stack selection early-return at L436–L444 (still inside the same try), no restore.
  • Bootstrap-required early-return at L456–L463, no restore.
  • Deployability check early-return at L471–L479, no restore.
  • In diff mode, runDiff at L501 can throw → outer catch, no restore.

So a --dry-run or --diff that fails for any reason after the backfill will leave the working tree dirty — which is exactly the "preview must not dirty the working tree" contract this PR's BUG-2 fix is trying to establish. The new dry-run (persist=false) restore test only covers the happy path.

A few ways to fix:

  1. Register restore() for the finally block, e.g. capture it into a previewRestore: (() => Promise<void>) | undefined declared before the try and invoke it in the existing finally (L986–L991) when set. Cheapest change and covers every exit path.
  2. Move the two current await backfill.restore() calls out of the happy branches entirely and rely solely on option 1.
  3. Wrap the backfill + synth + preview span in its own try/finally that always calls restore() in preview mode.

Any of these will do; option 1 is the smallest diff. Please also add a regression test that forces a throw between backfillContainerVpcIds and the plan/diff short-circuit (e.g. by making synthesizeCdk reject) and asserts the on-disk files are still reverted.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jul 2, 2026

@agentcore-cli-automation agentcore-cli-automation left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice targeted fix. The BUG-1 dispatch wiring is clearly right and the new integration tests exercise the seam the unit tests missed. The BUG-3 guard is a one-liner and the schema-mapper guard is consistent with it.

One concern on the BUG-2 fix — the preview-mode backfill.restore() calls are placed inline (not in a finally), so several failure paths between the backfill and the explicit restore will still leave agentcore.json / harness.json dirty on a --dry-run / --diff. That's the exact contract the fix is meant to uphold, so I think it's worth handling before merging. Details inline.

// Plan mode: stop after synth and checks, don't deploy. Revert any vpcId the backfill wrote to
// disk for synth — a preview must not dirty the working tree.
if (options.plan) {
await backfill.restore();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BUG-2's contract is "a preview must not dirty the working tree" — but the restore is only called on the happy-path exits of the plan/diff blocks (L485, L504). Between the backfill at L419 and those calls there are several failure paths where the file stays dirty on a --dry-run / --diff:

  • synthesizeCdk at L427 throws → caught by the outer catch at L982, restore() never runs.
  • selectTargetStack fails at L436 → early return, no restore.
  • Bootstrap check fails at L451-L464 → early return, no restore.
  • Deployability check fails at L471-L479 → early return, no restore.
  • In --diff mode, runDiff at L501 throws → outer catch, no restore.

Since restore() is a no-op when persist=true (the originals map is empty), it's safe to call unconditionally. A couple of options:

  1. Move the restore into the finally block at L986 — hoist let backfill: BackfillVpcIdResult | null = null; above the try, and add if (backfill) await backfill.restore(); in finally. Cleanest and covers every failure path uniformly.
  2. Wrap the preview flow in its own try/finally — keeps the scope narrower but is more invasive since synth/deployability/etc. are shared with the real deploy path.
  3. At minimum, try { … } finally { await backfill.restore(); } around synth + the diff run when isPreview — narrower still but easy to get subtly wrong later.

Option 1 seems most robust and matches the pattern already used for restoreEnv / toolkitWrapper.dispose() in the same finally.

@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 38.61% 14364 / 37195
🔵 Statements 37.92% 15311 / 40370
🔵 Functions 33.24% 2474 / 7442
🔵 Branches 32.36% 9568 / 29563
Generated in workflow #3973 for commit 9871ffd by the Vitest Coverage Report Action

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/m PR size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants