
ci: skip agents with broken upstream URLs instead of failing the whole update run #276

Open
stefandevo wants to merge 1 commit into agentclientprotocol:main from stefandevo:fix/skip-broken-url-updates


@stefandevo
Contributor

Summary

The hourly Update Agent Versions workflow has been failing every run for several days (e.g. run 25275167557) because of a single agent — vtcode — whose upstream stopped publishing a Windows binary archive at the new version:

Error: vtcode distribution URL validation failed:
Error:   - Binary archive URL not accessible for windows-x86_64:
           https://github.com/vinhnx/VTCode/releases/download/0.105.5/vtcode-0.105.5-x86_64-pc-windows-msvc.zip
Build failed due to validation errors

vtcode 0.96.14 (currently in main) ships a Windows zip; 0.105.5 doesn't. As a result, update_versions.py --apply rewrites vtcode/agent.json to point at a non-existent file, build_registry.py then fails URL validation on it, the whole step exits 1, and no other agent gets updated until a human intervenes.

This PR makes the apply step resilient to single-agent breakage: when a planned update's new URLs aren't reachable, that one agent is rolled back to its previous content and reported as skipped, while the rest of the updates proceed.
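
The rollback described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name apply_with_rollback, its signature, and the injected validate callable are all hypothetical, chosen so the snippet runs without network access.

```python
from pathlib import Path
from typing import Callable, List, Optional


def apply_with_rollback(
    agent_json: Path,
    new_content: bytes,
    validate: Callable[[Path], List[str]],
) -> Optional[str]:
    """Write new_content, then restore the old bytes if validation fails.

    Returns None when the update landed, or a skip reason when it was reverted.
    All names here are hypothetical; the real apply_update may differ.
    """
    # Keep the exact original bytes so the revert is byte-for-byte.
    original = agent_json.read_bytes()
    agent_json.write_bytes(new_content)

    # validate is assumed to return a list of human-readable error strings.
    problems = validate(agent_json)
    if not problems:
        return None

    # Roll back this one agent; the caller continues with the others.
    agent_json.write_bytes(original)
    return "; ".join(problems)
```

Reading and writing bytes rather than re-serializing JSON is what keeps the restored file identical to the original, including whitespace and key order. I/O exceptions are deliberately not caught here, so a hard failure would still propagate to the caller.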

Changes

  • Move url_exists() and validate_distribution_urls() from build_registry.py into registry_utils.py so both scripts share the same check (no behavior change for build_registry.py).
  • The update_versions.py apply path now revalidates URLs after writing. If any URL fails, the original agent.json is restored byte-for-byte and the agent is recorded as skipped. Hard I/O errors still fail the run; broken upstream URLs no longer do.
  • JSON output gains a skipped array alongside updates / errors / up_to_date, so the workflow's downstream consumers (commit message, verify-agents) only see updates that actually landed.
  • Tests: unit tests for the moved helpers in registry_utils, and tests for the new revert-and-skip path in apply_update. Also smoke-tested live against the real broken vtcode 0.105.5 Windows URL.
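
For readers unfamiliar with the shared helpers, here is one way such a check could look. It is a sketch under assumptions: the actual signatures in registry_utils.py are not shown in this description, the distribution.binary.<platform>.archive layout is assumed, and the check parameter is added here only so the logic can be exercised without network access.

```python
import urllib.request
from typing import Callable, Dict, List


def url_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to url succeeds (sketch, not the real helper)."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, ValueError):
        # URLError/HTTPError are OSError subclasses; ValueError covers malformed URLs.
        return False


def validate_distribution_urls(
    agent: Dict, check: Callable[[str], bool] = url_exists
) -> List[str]:
    """Collect one error string per unreachable binary archive URL."""
    errors = []
    # Assumed layout: {"distribution": {"binary": {platform: {"archive": url}}}}
    binaries = agent.get("distribution", {}).get("binary", {})
    for platform, target in sorted(binaries.items()):
        url = target.get("archive")
        if url and not check(url):
            errors.append(
                f"Binary archive URL not accessible for {platform}: {url}"
            )
    return errors
```

Returning a list of messages instead of raising lets build_registry.py keep failing hard on any error while update_versions.py treats the same result as a reason to skip.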

Behavior after this change

For the failing run: macOS, Linux, and the 19 other agents that have updates would all land in main; vtcode would stay at 0.96.14 (whose URLs all still work) and appear in the summary as:

Skipped (broken upstream URLs, kept at current version):
  - vtcode: 0.96.14 -> 0.105.5: Binary archive URL not accessible for windows-x86_64: ...
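
A rough sketch of how the skipped bucket could feed both the summary and the exit code is shown below. The bucket names (updates, errors, up_to_date, skipped) come from this description; the per-entry field names (id, from, to, reason) and both function names are hypothetical.

```python
from typing import Dict, List


def summarize_skipped(results: Dict[str, List[dict]]) -> str:
    """Render the human-readable 'Skipped' section, or '' if nothing was skipped."""
    skipped = results.get("skipped", [])
    if not skipped:
        return ""
    lines = ["Skipped (broken upstream URLs, kept at current version):"]
    for entry in skipped:
        # Field names are assumptions made for this illustration.
        lines.append(
            f"  - {entry['id']}: {entry['from']} -> {entry['to']}: {entry['reason']}"
        )
    return "\n".join(lines)


def exit_code(results: Dict[str, List[dict]]) -> int:
    """Only real errors fail the run; skipped agents do not."""
    return 1 if results.get("errors") else 0
```

Keeping skipped separate from errors is what lets downstream consumers read updates as the list of changes that actually landed.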

When upstream republishes the Windows binary (or the agent owner removes the windows-x86_64 target), vtcode will resume auto-updating with no further action — no manual quarantine entry needed.

Test plan

  • pytest .github/workflows/tests/ — 98 passed (93 baseline + 5 new)
  • Smoke test: ran apply_update against the real vtcode/agent.json with live network, confirmed the file is reverted to its original 0.96.14 content and the skip reason names the missing Windows zip
  • Maintainer check: re-run the Update Agent Versions workflow once merged

… failing

When an upstream release stops publishing one of an agent's declared
binary archives (e.g. vtcode 0.105.5 dropped its Windows zip), the
hourly auto-update workflow currently dies in the registry-build
URL-validation step. That blocks updates for *every* other agent until a
human edits the offending agent.json.

Make the apply step resilient instead:

- Hoist url_exists() and validate_distribution_urls() into
  registry_utils.py so update_versions and build_registry share the
  same check.
- After update_versions writes a new agent.json, re-validate its
  distribution URLs; if any 404, restore the original file content
  byte-for-byte and report the agent as "skipped" rather than failed.
- Track skipped updates separately from real I/O failures so the
  workflow exits 0 when the only problem is a broken upstream binary.
- Surface skipped agents in both the JSON output and the human summary
  so it's obvious which agents are stuck and why.
- Add unit tests covering the revert path and the moved helpers.

Refs: failing run https://github.com/agentclientprotocol/registry/actions/runs/25275167557