Resolve plain skill names from registry during install#4200
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4200 +/- ##
==========================================
+ Coverage 69.07% 69.14% +0.06%
==========================================
Files 477 477
Lines 48074 48135 +61
==========================================
+ Hits 33207 33282 +75
+ Misses 12284 12268 -16
- Partials 2583 2585 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
amirejaz
left a comment
There was a problem hiding this comment.
One inline note on buildSkillLookupOption.
amirejaz
left a comment
There was a problem hiding this comment.
LGTM, added a couple of nits
0fe2e7a to
66a114b
Compare
54f40ba to
6adb4d6
Compare
6adb4d6 to
e094220
Compare
bed576e to
dd40d9c
Compare
dd40d9c to
8a8864f
Compare
8a8864f to
23d7e75
Compare
When `thv skill install <name>` does not find the skill in the local OCI store, the code previously created a dead-end "pending" DB record. Now it queries the configured skill registry/index for an exact name match and, if found, extracts the OCI reference and installs from it. If no match is found anywhere, a 404 with an actionable error message is returned. This is Phase 2 of #4015. The resolution cascade is now: 1. Local OCI store (from `thv skill build`) 2. Registry/index lookup (new) 3. Actionable 404 error (replaces silent pending record) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The SkillLookup was created once at server startup from the config at that point. If the registry was changed at runtime (via the API or thv config set-registry), the skill lookup still used the old provider, so plain-name installs could not find skills in the new registry. Replace the eager buildSkillLookupOption with a lazySkillLookup that calls GetDefaultProviderWithConfig on each SearchSkills invocation. After ResetDefaultProvider (called on every registry config change), the next lookup creates a fresh provider from the current config. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Update E2E tests for registry-based skill resolution
Plain name installs without a prior build now return 404 instead of
creating a dead-end "pending" record. Update all E2E tests that relied
on the pending path to use the build-then-install flow, and add a test
verifying the 404 behavior for unresolvable names.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add E2E test for registry-based skill name resolution
Exercises the full registry lookup chain end-to-end:
1. Start an in-process OCI registry (go-containerregistry/pkg/registry)
2. Build a skill and push it to the in-process registry
3. Create an upstream-format registry JSON with a skill entry pointing
to the in-process registry
4. Configure the thv server to use that registry file
5. Install by plain name and verify the skill resolves through the
registry, pulls from OCI, and installs with full metadata
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix E2E test failures: plain HTTP for dev, idempotent installs
Two fixes for CI failures:
1. Enable plain HTTP for the OCI registry client when TOOLHIVE_DEV=true
so E2E tests can push/pull to an in-process httptest registry without
TLS.
2. Update duplicate/overwrite tests to expect 201 (idempotent no-op)
instead of 409 Conflict. With real build-then-install, reinstalling
the same digest is correctly idempotent. The old 409 was an artifact
of the removed pending-install path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix push assertion: expect 204 No Content, not 200
The skill push API returns 204 No Content on success.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Temporarily skip registry lookup E2E test
The upstream registry JSON schema validation rejects the test fixture.
The in-process OCI registry, plain HTTP push, and registry update
plumbing all work (verified in prior CI runs), but the upstream format
schema requires fields beyond what the test provides. Mark as pending
until the exact schema requirements are debugged.
All other E2E tests (48/49) pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix registry lookup E2E: add dummy server to pass validation
The upstream registry file validator requires at least one server or
group. Our test JSON had only skills with an empty servers array, which
was rejected with 502. Add a minimal dummy server entry to satisfy the
validation while keeping the focus on skill resolution.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a comment explaining that the first OCI package in registry order is chosen when exactly one skill matches, addressing review feedback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Skills API Describe had the `api-clients` label, causing it to run in the api-clients CI job. The registry lookup test modified the shared config directory's registry to a test-only file but never restored it. Subsequent client tests failed with 404 because "osv" could no longer be resolved from the corrupted registry. Move Skills API tests to the `api-registry` label where registry mutations are expected, and add DeferCleanup to reset the registry config after the test (matching the pattern in api_registry_test.go). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
23d7e75 to
43b375f
Compare
jhrozek
left a comment
There was a problem hiding this comment.
LGTM — solid implementation with good test coverage and security defenses (supply-chain name matching, OCI identifier truncation, ambiguity detection). Just two minor nits below.
Nit (not in diff): The doc comment on Install (line 195) still says "Without LayerData, a pending record is created" but the pending path was removed in this PR. Consider updating to describe the current resolution chain: OCI reference → local store → registry → 404.
🤖 Generated with Claude Code
|
|
||
| results, err := s.skillLookup.SearchSkills(name) | ||
| if err != nil { | ||
| slog.Warn("registry skill lookup failed, falling back to not-found", "name", name, "error", err) |
There was a problem hiding this comment.
Nit: when SearchSkills returns an error (network failure, auth error, server 500), the error is logged at WARN but converted to (nil, nil), making it indistinguishable from "no match found." The user sees "skill not found in local store or registry" with no indication that the registry was actually unreachable.
The s.skillLookup == nil check at line 608 already handles the "no registry configured" case, so a non-nil lookup returning an error is a genuine problem worth surfacing. Consider either returning the error to the caller or including a hint in the not-found message that the registry lookup failed.
Summary
thv skill install <name>previously created a dead-end "pending" DB record when a plain name wasn't found in the local OCI store. This made the command silently fail for users who expected registry-based resolution.This is Phase 2 of #4015. The resolution cascade is now:
thv skill build)Relates to #4015
Type of change
Test plan
task test)task lint-fix)Changes
pkg/skills/skillsvc/skillsvc.goSkillLookupinterface andWithSkillLookupoption; implementresolveFromRegistrymethod; replaceinstallPendingfallback with registry lookup cascade; fix lock management to prevent deadlockpkg/api/server.goWithSkillLookupviabuildSkillLookupOptionhelper; rename local variable to avoid import shadowpkg/skills/skillsvc/skillsvc_test.goTestConcurrentInstallAndUninstallto exercise install critical section; addTestInstallFromRegistrywith 9 subtests covering happy path, ambiguity, no OCI package, error degradation, case-insensitive matching, and supply-chain validationDoes this introduce a user-facing change?
Yes.
thv skill install <name>now resolves plain skill names from the configured registry instead of creating a non-functional pending record. If the skill is not found anywhere, users get an actionable error message suggesting the OCI reference syntax.Special notes for reviewers
installFromOCIacquires its own lock on the artifact's skill name (which could be the same key). Go'ssync.Mutexis not re-entrant, so holding both would deadlock. The TOCTOU window is safe becauseinstallWithExtractionperforms idempotent checks (same digest = no-op, different digest = upgrade).installPendingremoved: The function and its callers are removed. TheInstallStatusPendingconstant is retained for existing DB records.Generated with Claude Code