Skip to content

Resolve plain skill names from registry during install#4200

Merged
JAORMX merged 5 commits intomainfrom
jaosorior/skill-install-registry-lookup
Mar 23, 2026
Merged

Resolve plain skill names from registry during install#4200
JAORMX merged 5 commits intomainfrom
jaosorior/skill-install-registry-lookup

Conversation

@JAORMX
Copy link
Collaborator

@JAORMX JAORMX commented Mar 18, 2026

Summary

  • thv skill install <name> previously created a dead-end "pending" DB record when a plain name wasn't found in the local OCI store. This made the command silently fail for users who expected registry-based resolution.
  • Now the install command queries the configured skill registry/index for an exact name match and, if found, extracts the OCI reference and installs from it. If nothing resolves, a 404 with an actionable error message is returned instead of a silent pending record.

This is Phase 2 of #4015. The resolution cascade is now:

  1. Local OCI store (from thv skill build)
  2. Registry/index lookup (new)
  3. Actionable 404 error (replaces silent pending record)

Relates to #4015

Type of change

  • New feature

Test plan

  • Unit tests (task test)
  • Linting (task lint-fix)

Changes

File Change
pkg/skills/skillsvc/skillsvc.go Add SkillLookup interface and WithSkillLookup option; implement resolveFromRegistry method; replace installPending fallback with registry lookup cascade; fix lock management to prevent deadlock
pkg/api/server.go Wire registry provider as WithSkillLookup via buildSkillLookupOption helper; rename local variable to avoid import shadow
pkg/skills/skillsvc/skillsvc_test.go Update tests that expected pending fallback to expect 404; fix TestConcurrentInstallAndUninstall to exercise install critical section; add TestInstallFromRegistry with 9 subtests covering happy path, ambiguity, no OCI package, error degradation, case-insensitive matching, and supply-chain validation

Does this introduce a user-facing change?

Yes. thv skill install <name> now resolves plain skill names from the configured registry instead of creating a non-functional pending record. If the skill is not found anywhere, users get an actionable error message suggesting the OCI reference syntax.

Special notes for reviewers

  • Lock management: The per-skill lock is released before entering the registry lookup path because installFromOCI acquires its own lock on the artifact's skill name (which could be the same key). Go's sync.Mutex is not re-entrant, so holding both would deadlock. The TOCTOU window is safe because installWithExtraction performs idempotent checks (same digest = no-op, different digest = upgrade).
  • installPending removed: The function and its callers are removed. The InstallStatusPending constant is retained for existing DB records.
  • Security hardening: Registry-returned data in error messages is truncated (identifiers capped at 256 chars, ambiguity candidates capped at 5). An audit log at INFO level records registry-resolved OCI references.
  • Graceful degradation: Registry lookup errors are logged at WARN and fall through to the not-found error, so a registry outage never blocks installation by explicit OCI reference.

Generated with Claude Code

@JAORMX JAORMX requested a review from amirejaz as a code owner March 18, 2026 11:32
@github-actions github-actions bot added the size/L Large PR: 600-999 lines changed label Mar 18, 2026
@codecov
Copy link

codecov bot commented Mar 18, 2026

Codecov Report

❌ Patch coverage is 76.74419% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.14%. Comparing base (949ae42) to head (43b375f).

Files with missing lines Patch % Lines
pkg/api/server.go 0.00% 15 Missing ⚠️
pkg/skills/skillsvc/skillsvc.go 92.95% 3 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4200      +/-   ##
==========================================
+ Coverage   69.07%   69.14%   +0.06%     
==========================================
  Files         477      477              
  Lines       48074    48135      +61     
==========================================
+ Hits        33207    33282      +75     
+ Misses      12284    12268      -16     
- Partials     2583     2585       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@amirejaz amirejaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One inline note on buildSkillLookupOption.

amirejaz
amirejaz previously approved these changes Mar 18, 2026
Copy link
Contributor

@amirejaz amirejaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, added a couple of nits

@JAORMX JAORMX force-pushed the jaosorior/skill-install-registry-lookup branch from 0fe2e7a to 66a114b Compare March 18, 2026 14:55
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Mar 18, 2026
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Mar 19, 2026
@JAORMX JAORMX force-pushed the jaosorior/skill-install-registry-lookup branch from 54f40ba to 6adb4d6 Compare March 19, 2026 14:22
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Mar 19, 2026
@JAORMX JAORMX force-pushed the jaosorior/skill-install-registry-lookup branch from 6adb4d6 to e094220 Compare March 19, 2026 14:23
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Mar 19, 2026
@JAORMX JAORMX force-pushed the jaosorior/skill-install-registry-lookup branch from bed576e to dd40d9c Compare March 20, 2026 08:21
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Mar 20, 2026
@JAORMX JAORMX force-pushed the jaosorior/skill-install-registry-lookup branch from dd40d9c to 8a8864f Compare March 23, 2026 06:19
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Mar 23, 2026
@JAORMX JAORMX force-pushed the jaosorior/skill-install-registry-lookup branch from 8a8864f to 23d7e75 Compare March 23, 2026 06:54
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Mar 23, 2026
JAORMX and others added 5 commits March 23, 2026 09:39
When `thv skill install <name>` does not find the skill in the local OCI
store, the code previously created a dead-end "pending" DB record. Now it
queries the configured skill registry/index for an exact name match and,
if found, extracts the OCI reference and installs from it. If no match is
found anywhere, a 404 with an actionable error message is returned.

This is Phase 2 of #4015. The resolution cascade is now:
  1. Local OCI store (from `thv skill build`)
  2. Registry/index lookup (new)
  3. Actionable 404 error (replaces silent pending record)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The SkillLookup was created once at server startup from the config at
that point. If the registry was changed at runtime (via the API or
thv config set-registry), the skill lookup still used the old provider,
so plain-name installs could not find skills in the new registry.

Replace the eager buildSkillLookupOption with a lazySkillLookup that
calls GetDefaultProviderWithConfig on each SearchSkills invocation.
After ResetDefaultProvider (called on every registry config change),
the next lookup creates a fresh provider from the current config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Update E2E tests for registry-based skill resolution

Plain name installs without a prior build now return 404 instead of
creating a dead-end "pending" record. Update all E2E tests that relied
on the pending path to use the build-then-install flow, and add a test
verifying the 404 behavior for unresolvable names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add E2E test for registry-based skill name resolution

Exercises the full registry lookup chain end-to-end:
  1. Start an in-process OCI registry (go-containerregistry/pkg/registry)
  2. Build a skill and push it to the in-process registry
  3. Create an upstream-format registry JSON with a skill entry pointing
     to the in-process registry
  4. Configure the thv server to use that registry file
  5. Install by plain name and verify the skill resolves through the
     registry, pulls from OCI, and installs with full metadata

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix E2E test failures: plain HTTP for dev, idempotent installs

Two fixes for CI failures:

1. Enable plain HTTP for the OCI registry client when TOOLHIVE_DEV=true
   so E2E tests can push/pull to an in-process httptest registry without
   TLS.

2. Update duplicate/overwrite tests to expect 201 (idempotent no-op)
   instead of 409 Conflict. With real build-then-install, reinstalling
   the same digest is correctly idempotent. The old 409 was an artifact
   of the removed pending-install path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix push assertion: expect 204 No Content, not 200

The skill push API returns 204 No Content on success.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Temporarily skip registry lookup E2E test

The upstream registry JSON schema validation rejects the test fixture.
The in-process OCI registry, plain HTTP push, and registry update
plumbing all work (verified in prior CI runs), but the upstream format
schema requires fields beyond what the test provides. Mark as pending
until the exact schema requirements are debugged.

All other E2E tests (48/49) pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix registry lookup E2E: add dummy server to pass validation

The upstream registry file validator requires at least one server or
group. Our test JSON had only skills with an empty servers array, which
was rejected with 502. Add a minimal dummy server entry to satisfy the
validation while keeping the focus on skill resolution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a comment explaining that the first OCI package in registry order
is chosen when exactly one skill matches, addressing review feedback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Skills API Describe had the `api-clients` label, causing it to run
in the api-clients CI job. The registry lookup test modified the shared
config directory's registry to a test-only file but never restored it.
Subsequent client tests failed with 404 because "osv" could no longer
be resolved from the corrupted registry.

Move Skills API tests to the `api-registry` label where registry
mutations are expected, and add DeferCleanup to reset the registry
config after the test (matching the pattern in api_registry_test.go).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JAORMX JAORMX force-pushed the jaosorior/skill-install-registry-lookup branch from 23d7e75 to 43b375f Compare March 23, 2026 07:39
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Mar 23, 2026
Copy link
Member

@aponcedeleonch aponcedeleonch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@JAORMX JAORMX merged commit 0e61527 into main Mar 23, 2026
42 of 43 checks passed
@JAORMX JAORMX deleted the jaosorior/skill-install-registry-lookup branch March 23, 2026 10:47
Copy link
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — solid implementation with good test coverage and security defenses (supply-chain name matching, OCI identifier truncation, ambiguity detection). Just two minor nits below.

Nit (not in diff): The doc comment on Install (line 195) still says "Without LayerData, a pending record is created" but the pending path was removed in this PR. Consider updating to describe the current resolution chain: OCI reference → local store → registry → 404.

🤖 Generated with Claude Code


results, err := s.skillLookup.SearchSkills(name)
if err != nil {
slog.Warn("registry skill lookup failed, falling back to not-found", "name", name, "error", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: when SearchSkills returns an error (network failure, auth error, server 500), the error is logged at WARN but converted to (nil, nil), making it indistinguishable from "no match found." The user sees "skill not found in local store or registry" with no indication that the registry was actually unreachable.

The s.skillLookup == nil check at line 608 already handles the "no registry configured" case, so a non-nil lookup returning an error is a genuine problem worth surfacing. Consider either returning the error to the caller or including a hint in the not-found message that the registry lookup failed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/L Large PR: 600-999 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants