Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 34 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,10 @@ Every 6 hours, the scheduled workflow in this repo:
1. Enumerates every skill in `coder/registry` (both the in-tree
`.agents/skills/` format and the future external-sources format).
2. Shallow-clones each source repo.
3. Runs [NVIDIA SkillSpector](https://github.com/NVIDIA/SkillSpector) in
`--no-llm` static mode over the upstream content.
3. Runs [NVIDIA SkillSpector](https://github.com/NVIDIA/SkillSpector) over
the upstream content. The scheduled scan runs SkillSpector's LLM
semantic pass when the workflow's LLM credential secret is
configured, and falls back to `--no-llm` static-only mode otherwise.
4. Builds a per-skill verdict (`clean`, `suspicious`, `malicious`,
`unknown`) from `risk_score` plus the thresholds in `config.yaml`.
5. Builds the React SPA in `site/` and ships it together with
Expand Down Expand Up @@ -60,6 +62,28 @@ Vite's dev proxy (see `site/vite.config.ts`) forwards `latest.json`,
app sees real scanner output without CORS shenanigans. SPA routes such
as `/skills/coder/setup` stay client-side.

## One-time setup on the repo

Three things have to be configured once on the GitHub repo before the
scheduled scan publishes a useful result:

1. **Settings > Pages**: set source to "GitHub Actions". The
`publish-pages` job in `scan.yaml` will fail until this is set.
2. **Settings > Actions**: workflow permissions "Read and write" so
`publish-release` can create the rolling `latest` release.
3. **Settings > Secrets and variables > Actions > Secrets**: add the
LLM credential matching the provider in `config.yaml`'s
`scanners.skillspector.llm.provider`. For the default `anthropic`
provider this is `ANTHROPIC_API_KEY` (from
[console.anthropic.com](https://console.anthropic.com); this is a
separate billing line from Coder usage because SkillSpector cannot
be routed through aibridge today, see `docs/CALIBRATION.md`).
Without the secret, the scan still runs but SkillSpector falls
back to `--no-llm` static-only mode and precision drops. See
`docs/CALIBRATION.md` for the measured before/after numbers. The
optional `SLACK_WEBHOOK_URL` secret enables the
`notify-slack-on-failure` job; without it that job is a no-op.

## Repo layout

```text
Expand Down Expand Up @@ -97,7 +121,12 @@ This scanner is data-driven. To run it against a different registry:
"GitHub Actions").
4. Set Actions workflow permissions to "Read and write" so the
publish-release job can create releases.
5. Enable Actions.
5. To enable the LLM semantic pass, add the credential secret per
"One-time setup on the repo" above, AND confirm
`.github/workflows/scan.yaml` exports the secret into the
SkillSpector step. Static-only mode (without the secret) is the
default and works out of the box.
6. Enable Actions.

No source changes required for catalogue changes.

Expand All @@ -115,7 +144,8 @@ SkillSpector's `risk_score` (0-100) is the only input. The thresholds
are aligned to SkillSpector's own `HIGH` and `CRITICAL` bands;
[`docs/CALIBRATION.md`](./docs/CALIBRATION.md) walks through the
evidence (SkillSpector source, the ClawHub paper, our in-tree
catalogue) behind the chosen numbers.
catalogue) behind the chosen numbers and the measured LLM-on-vs-off
impact on the five in-tree skills.

The architecture keeps room for additional scanners (gitleaks, Semgrep,
VirusTotal Premium, etc.); adding one is a new module under `scanner/`,
Expand Down
8 changes: 6 additions & 2 deletions config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -39,8 +39,12 @@ scanners:
# so a bumper bot lives outside the loop until the upstream
# publishes to PyPI and the pin can move into pyproject.toml.
pin: "skillspector @ git+https://github.com/NVIDIA/SkillSpector.git@2eb844780ab163f01468ecf142c40a2ec0fcaec0"
flags:
- "--no-llm"
# Empty so .github/workflows/scan.yaml can append --no-llm
# dynamically based on whether the LLM credential secret is set.
flags: []
llm:
provider: anthropic
model: "claude-sonnet-4-6"

# Per-skill verdict policy. v1 has one input (SkillSpector risk_score).
# When more scanners join the pipeline we add new threshold fields here
Expand Down
118 changes: 108 additions & 10 deletions docs/CALIBRATION.md
Original file line number Diff line number Diff line change
Expand Up @@ -64,18 +64,18 @@ The current `coder/registry` in-tree catalogue contains five skills:
`coder/coder-modules`, `coder/coder-templates`, `coder/modules`,
`coder/templates`, and `coder/setup`. Under the chosen thresholds:

| Skill | SkillSpector score | Verdict |
|------------------------|-------------------:|-------------|
| `coder/coder-modules` | 0 | `clean` |
| `coder/coder-templates`| 0 | `clean` |
| `coder/modules` | 0 | `clean` |
| `coder/templates` | 10 | `clean` |
| `coder/setup` | 100 | `malicious` |
| Skill | static score | LLM-mode score | static verdict | LLM-mode verdict |
|------------------------|-------------:|---------------:|----------------|------------------|
| `coder/coder-modules` | 10 | 0 | `clean` | `clean` |
| `coder/coder-templates`| 10 | 0 | `clean` | `clean` |
| `coder/modules` | 0 | 0 | `clean` | `clean` |
| `coder/templates` | 0 | 0 | `clean` | `clean` |
| `coder/setup` | 100 | 26 | `malicious` | `clean` |

The previous thresholds (40/75) produced the same outcome for these
five inputs. The change does not silence any signal that was firing
today; it raises the bar that future skills must clear before being
called out.
five inputs under static-only mode. The change does not silence any
signal that was firing today; it raises the bar that future skills
must clear before being called out.

## Threshold choices

Expand All @@ -99,6 +99,99 @@ verdict:
This avoids broadcasting the ~half-of-catalogue base rate that
ClawHub measured.

## LLM semantic pass

SkillSpector ships a two-stage analyser: fast static rules (the 64
patterns SkillSpector documents) followed by an optional LLM semantic
pass. The LLM pass reads each finding's surrounding context, classifies
intent, filters context-aware false positives, and writes a
human-readable explanation that ships in the per-finding output.

### Measured impact on the five in-tree skills

Measured against `gpt-4.1-mini` through Coder's AI Gateway during
development, before the provider swap below. Methodology: ran
`skillspector scan` twice on each upstream skill (once with
`--no-llm`, once with LLM mode on) and aggregated the per-skill
results. Total catalogue-wide findings dropped from 25 to 2:

| Skill | findings (static) | findings (LLM) | Δ |
|------------------------|------------------:|---------------:|----------|
| `coder/coder-modules` | 1 | 0 | -1 |
| `coder/coder-templates`| 1 | 0 | -1 |
| `coder/modules` | 0 | 0 | 0 |
| `coder/setup` | 23 | 2 | -21 |
| `coder/templates` | 0 | 0 | 0 |
| **TOTAL** | **25** | **2** | **-23** |

`coder/setup`'s verdict moves from `malicious` (100) to `clean` (26).
The LLM filtered all 23 static-only findings as context-aware false
positives (the EA2 hits on safeguard prose, the MP2 hits on PNG
assets, the SC2 hits on `curl coder.com/install.sh`, the PE3 hits on
the skill's own scratch files, etc.) and surfaced 2 new MEDIUM
findings (`SQP-2`) the static pass missed: the GitHub device-flow
scripts write the OAuth token and session config to disk without a
user-visible notification. Those 2 findings are real and minor; the
cleanest fix is a one-line `echo` before each write in the upstream
skill repo rather than any change here.

**Model swap caveat**: production runs against `claude-sonnet-4-6`
via the Anthropic API (see "Provider choice" below), not against
`gpt-4.1-mini`. The 25 → 2 delta above measures SkillSpector's LLM
semantic pass *as a capability*; absolute counts may shift one or two
either way under Claude because the two models filter false positives
slightly differently. The verdict-band outcomes (`coder/setup` flips
malicious → clean, every other in-tree skill stays clean) are robust
to that drift: every static finding on the four other skills is well
below the `suspicious_risk_score: 51` cutoff to begin with, so even a
100% no-filter LLM still leaves them clean. Recalibration against
Claude is a 30-minute follow-up PR once the secret is wired in and
the first production scan lands; this doc gets the real numbers then.

### Provider choice and the workflow gap

The scheduled scan runs LLM mode when the workflow's chosen credential
secret is configured. The fallback to `--no-llm` is automatic when the
secret is missing, so an unset secret on a fresh fork degrades the
scan rather than breaking it.

Provider is `anthropic` against `api.anthropic.com` directly, model
`claude-sonnet-4-6`. The Anthropic API key is on a separate billing
line from Coder usage because SkillSpector cannot be routed through
Coder's AI Gateway today:

- aibridge does proxy Claude under its `/anthropic` path, but only in
Anthropic's native `/v1/messages` shape.
- SkillSpector pipes every provider through
`langchain_openai.ChatOpenAI`, which speaks OpenAI's
`/v1/chat/completions` shape.
- aibridge does not mount `/v1/chat/completions` on its `/anthropic`
path (verified: `route not supported`).
- SkillSpector's `anthropic` provider also hardcodes
`https://api.anthropic.com/v1/` in `providers/anthropic/provider.py`
and ignores `ANTHROPIC_BASE_URL`, so even if aibridge did expose the
OpenAI-compat route on its Anthropic path, an env-only swap would
not steer SkillSpector at it.

Using `openai` against aibridge with `gpt-4.1-mini` is a viable
alternative (and is what the calibration table above was measured
against). The trade-off is real: aibridge routing keeps inference
spend on Coder's existing billing line and avoids a second vendor,
but commits the scanner to whichever OpenAI-class model aibridge
exposes rather than Claude. If aibridge later adds either a Claude
OpenAI-compat route on `/anthropic` or a native-Anthropic
integration into SkillSpector, the provider line in `config.yaml`
flips back without any workflow change.

### How the LLM pass interacts with the verdict math

The LLM pass does not affect the threshold math. SkillSpector's
`risk_score` is still a 0-100 weighted sum of rule hits, and the
51/81 cutoffs above still map directly to `HIGH` and `CRITICAL` bands.
What changes is which findings reach the verdict: false positives the
LLM filters out no longer contribute to the score. Verdicts move down
(or stay the same) when LLM mode flips on, not up.

## What we did not change (and why)

- We did not raise `suspicious_risk_score` above `51`. SkillSpector
Expand Down Expand Up @@ -127,6 +220,11 @@ Re-run this analysis when any of:
that shifts where its bands sit. The pinned commit in `config.yaml`
protects us from drifting silently; a deliberate bump should walk
through this doc.
- The LLM model or provider changes (e.g., moving from
`claude-sonnet-4-6` to Opus, Fable, or to a non-Anthropic
provider). Different models filter differently; spot-check the
five in-tree skills before merging the provider swap and refresh
the table above.
- We observe a real-world skill that lands in an obviously wrong
bucket (false positive or false negative). Open a tracking issue,
link it from this doc, and adjust with evidence in the next PR.
Loading