skills: add evals/evals.json smoke suite (group A) by rgsl888prabhu · Pull Request #1308 · NVIDIA/cuopt

rgsl888prabhu · 2026-05-27T20:12:17Z

Summary

Split from #1302 (skills: add evals/evals.json smoke suite); adds evals/evals.json for 4 skills (group A).
Each skill gets one happy-path Q&A entry parallel to its existing benchmark/ directory.
API-skill questions ask for an ordered method-name list rather than runnable code, so scoring is pure text-pattern matching (no cuopt/cudf install or execution required).
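
The text-pattern scoring described above can be sketched as an ordered substring check. This is only an illustration of the idea, not the scorer used by the CI job; the helper `names_in_order` and the sample answer are hypothetical.

```python
def names_in_order(answer: str, names: list[str]) -> bool:
    """Return True if every name occurs in `answer`, each after the previous match."""
    pos = 0
    for name in names:
        idx = answer.find(name, pos)
        if idx == -1:
            return False
        pos = idx + len(name)  # continue searching after this match
    return True


# Hypothetical model answer: an ordered list of API names, no runnable code.
answer = (
    "1. Problem  2. addVariable  3. addConstraint  "
    "4. setObjective  5. SolverSettings  6. solve"
)
expected = ["Problem", "addVariable", "addConstraint", "setObjective", "solve"]

in_order = names_in_order(answer, expected)
out_of_order = names_in_order(answer, ["solve", "Problem"])
```

Because the check only reads text, it needs no cuopt or cudf installation, which matches the motivation given in the summary.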

Adds one happy-path Q&A entry per skill, parallel to the existing benchmark/ directories. Split from #1302 to keep CI runtime per PR under the job timeout.

This PR covers:
- cuopt-user-rules
- cuopt-routing-api-python
- cuopt-numerical-optimization-api-python
- cuopt-numerical-optimization-api-cli

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>

rgsl888prabhu · 2026-05-27T20:15:08Z

/nvskills-ci

coderabbitai · 2026-05-27T20:18:50Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 45e00ea6-faf9-4164-8bbd-4b338e42755a

📥 Commits

Reviewing files that changed from the base of the PR and between b3f7c8f and cedb505.

📒 Files selected for processing (8)

skills/cuopt-numerical-optimization-api-cli/skill-card.md
skills/cuopt-numerical-optimization-api-cli/skill.oms.sig
skills/cuopt-numerical-optimization-api-python/skill-card.md
skills/cuopt-numerical-optimization-api-python/skill.oms.sig
skills/cuopt-routing-api-python/skill-card.md
skills/cuopt-routing-api-python/skill.oms.sig
skills/cuopt-user-rules/skill-card.md
skills/cuopt-user-rules/skill.oms.sig

✅ Files skipped from review due to trivial changes (1)

skills/cuopt-numerical-optimization-api-cli/skill-card.md

📝 Walkthrough

Adds four evaluation cases (CLI MPS, Python LP API, Python routing VRPTW, user-rules clarification), updates several skill-card frontmatter texts, and regenerates Sigstore DSSE signature bundles for the affected skills.
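
As a rough sketch of what one evaluation case looks like, the snippet below checks an entry for the keys that appear in the diff excerpts quoted later in this review (`question`, `expected_skill`, `expected_script`, `ground_truth`). The real schema may contain more fields; `missing_keys` and the abbreviated sample entry are hypothetical.

```python
import json

# Keys visible in the quoted diff excerpts; an assumption, not the full schema.
REQUIRED_KEYS = {"question", "expected_skill", "expected_script", "ground_truth"}


def missing_keys(entry: dict) -> set[str]:
    """Return the required keys that are absent from one eval entry."""
    return REQUIRED_KEYS - set(entry)


# Hypothetical, abbreviated entry modelled on the user-rules eval.
sample = json.loads("""
{
  "question": "Help me optimize my routing.",
  "expected_skill": "cuopt-user-rules",
  "expected_script": null,
  "ground_truth": "The agent must ask before assuming."
}
""")

complete = missing_keys(sample)
incomplete = missing_keys({"question": "Help me optimize my routing."})
```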

Changes

cuOpt Skill Evaluation Specifications and Metadata

| Layer / File(s) | Summary |
|---|---|
| Evaluation specs `skills/*/evals/evals.json` | Adds four evaluation JSON entries: CLI MPS/cli-command expectations, LP API call-sequence expectations, routing VRPTW API call-sequence expectations, and a user-rules “ask clarifying questions” evaluation. |
| Skill-card edits (CLI) `skills/cuopt-numerical-optimization-api-cli/skill-card.md` | Reworded Use Case to reference LP/MILP/QP via `cuopt_cli` with MPS-format and added “Code” to Output Type(s). |
| Skill-card edits (Numerical Python) `skills/cuopt-numerical-optimization-api-python/skill-card.md` | Adjusted top-level description to explicitly reference cuOpt Python API, expanded Use Case examples, and revised Reference(s) list. |
| Skill-card edits (Routing Python) `skills/cuopt-routing-api-python/skill-card.md` | Refined Use Case wording and removed “Configuration instructions” from Output Type(s), leaving Code and API Calls. |
| Skill-card edits (User rules) `skills/cuopt-user-rules/skill-card.md` | Rephrased Use Case for safe interaction patterns and removed “Configuration instructions” from Output Type(s). |
| Sigstore DSSE bundles `skills/*/skill.oms.sig` | Replaced `dsseEnvelope.payload` and updated `signatures[0].sig` in four skill sig bundles to match new payloads; verificationMaterial and tlogEntries unchanged. |
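
To illustrate the signature-bundle row, the sketch below compares two JSON-like bundles and reports which paths changed. The bundle contents are made-up placeholders and the nesting is an assumption based on the field names in the table; `changed_paths` is a hypothetical helper, not part of any Sigstore tooling.

```python
def changed_paths(old, new, prefix=""):
    """Yield paths whose values differ between two JSON-like objects."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(old.keys() | new.keys()):
            child = f"{prefix}.{key}" if prefix else key
            yield from changed_paths(old.get(key), new.get(key), child)
    elif isinstance(old, list) and isinstance(new, list) and len(old) == len(new):
        for i, (a, b) in enumerate(zip(old, new)):
            yield from changed_paths(a, b, f"{prefix}[{i}]")
    elif old != new:
        yield prefix


# Placeholder bundles; real payloads and signatures are long base64 strings.
old_bundle = {
    "dsseEnvelope": {
        "payload": "b2xk",
        "payloadType": "application/vnd.in-toto+json",
        "signatures": [{"sig": "AAAA"}],
    },
    "verificationMaterial": {"tlogEntries": [{"logIndex": "1"}]},
}
new_bundle = {
    "dsseEnvelope": {
        "payload": "bmV3",
        "payloadType": "application/vnd.in-toto+json",
        "signatures": [{"sig": "BBBB"}],
    },
    "verificationMaterial": {"tlogEntries": [{"logIndex": "1"}]},
}

diff = list(changed_paths(old_bundle, new_bundle))
```

With these placeholders only the payload and the first signature differ, which mirrors the change pattern the table describes.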

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

NVIDIA/cuopt#1287: Related sigstore/eval updates that also touch skill signature bundles and evaluation artifacts.

Suggested reviewers

Iroy30
tmckayus
bdice

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main change: adding evaluation specifications (evals.json) for four cuopt skills, indicating this is part A of a larger effort. |
| Description check | ✅ Passed | The description is directly related to the changeset, explaining the purpose of adding evals.json entries, the grouping as 'group A', and the rationale for text-pattern matching scoring. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@skills/cuopt-numerical-optimization-api-python/evals/evals.json`:
- Line 13: The evals.json entry listing PascalCase problem.Status.name values
incorrectly includes "FeasibleFound"; remove "FeasibleFound" from the expected
status list so the LP/continuous check only expects "Optimal" and
"PrimalFeasible" (i.e., align with LPTerminationStatus rather than
MILPTerminationStatus) and ensure any ground_truth checks that compare
problem.Status.name use ['Optimal','PrimalFeasible'] only.

In `@skills/cuopt-user-rules/evals/evals.json`:
- Line 7: The eval's ground_truth string in
skills/cuopt-user-rules/evals/evals.json currently lists "Python / C / REST" but
SKILL.md's Interface includes "CLI"; update the ground_truth entry to include
"CLI" among the interface choices (e.g., "Python / C / REST / CLI") so the eval
questions match SKILL.md, and ensure any downstream expected_behavior text that
enumerates interfaces also reflects CLI (or, if omitting CLI intentionally, add
a brief justification string in the ground_truth referencing SKILL.md's
interface list).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5374cae1-b836-4352-98a8-d33b6c9db304

📥 Commits

Reviewing files that changed from the base of the PR and between 16276d2 and b3f7c8f.

📒 Files selected for processing (4)

skills/cuopt-numerical-optimization-api-cli/evals/evals.json
skills/cuopt-numerical-optimization-api-python/evals/evals.json
skills/cuopt-routing-api-python/evals/evals.json
skills/cuopt-user-rules/evals/evals.json

coderabbitai · 2026-05-27T20:18:53Z

+      "Lists the API calls in order without producing a full runnable script",
+      "Names Problem, addVariable (with vtype=CONTINUOUS), addConstraint, setObjective (sense=MAXIMIZE)",
+      "Names SolverSettings, set_parameter('time_limit', ...), and problem.solve(settings)",
+      "Names problem.Status.name and the PascalCase status values (Optimal / PrimalFeasible / FeasibleFound)",


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Search for Status enum/class definition and all valid status values.
# Search for Status class or enum definition
ast-grep --pattern 'class Status'
# Search for status value references in examples and documentation
rg -n 'Status\.name|Status\.' skills/cuopt-numerical-optimization-api-python/ -g '*.{py,md}' -C2
# Search specifically for FeasibleFound
rg -n 'FeasibleFound|Feasible' skills/cuopt-numerical-optimization-api-python/ -g '*.{py,md}'

Repository: NVIDIA/cuopt

Length of output: 17947

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Show evals.json around the referenced line
echo "---- evals.json (lines 1-40) ----"
nl -ba skills/cuopt-numerical-optimization-api-python/evals/evals.json | sed -n '1,60p'
# Locate likely model.py referenced by the evals and inspect around line 29
echo "---- candidate model.py files under evals/ ----"
find skills/cuopt-numerical-optimization-api-python -maxdepth 3 -type f -name 'model.py' -path '*evals*' -print
echo "---- show any model.py under evals/ (best-effort around line 20-45) ----"
for f in $(find skills/cuopt-numerical-optimization-api-python -maxdepth 3 -type f -name 'model.py' -path '*evals*' -print); do
  echo "==== $f ===="
  nl -ba "$f" | sed -n '15,60p'
done
# Also search for FeasibleFound usage in evals.json region if present
echo "---- FeasibleFound references in evals/ ----"
rg -n 'FeasibleFound' skills/cuopt-numerical-optimization-api-python/evals -S || true

Repository: NVIDIA/cuopt

Length of output: 151

🏁 Script executed:

#!/bin/bash
set -euo pipefail
EVALS_JSON="skills/cuopt-numerical-optimization-api-python/evals/evals.json"
echo "---- evals.json (first 60 lines) ----"
cat -n "$EVALS_JSON" | sed -n '1,60p'
echo "---- evals.json (around line 13) ----"
cat -n "$EVALS_JSON" | sed -n '8,18p'
echo "---- model.py files under evals/ (if any) ----"
find "skills/cuopt-numerical-optimization-api-python/evals" -type f -name "model.py" -print
echo "---- FeasibleFound references in evals/ ----"
rg -n "FeasibleFound" "skills/cuopt-numerical-optimization-api-python/evals" -S || true
echo "---- FeasibleFound references anywhere in skill examples/docs (context) ----"
rg -n "FeasibleFound" "skills/cuopt-numerical-optimization-api-python" -g "*.{py,md,json}" -S | head -n 50

Repository: NVIDIA/cuopt

Length of output: 4774

🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "---- repo-wide occurrences of FeasibleFound ----"
rg -n "FeasibleFound" -S . | head -n 50
echo "---- repo-wide occurrences of PrimalFeasible ----"
rg -n "PrimalFeasible" -S . | head -n 50
echo "---- search for Status enum/class definitions ----"
rg -n "enum\s+.*Status|class\s+Status|Status\s*=\s*enum|Status.*(Optimal|PrimalFeasible|FeasibleFound)" -S . | head -n 50
echo "---- search for PascalCase status name mapping (common patterns) ----"
rg -n "\"Optimal\"|\"PrimalFeasible\"|\"FeasibleFound\"|\"PrimalInfeasible\"|\"DualInfeasible\"|\"IterationLimit\"|\"NoTermination\"|\"TimeLimit\"" -S . | head -n 50

Repository: NVIDIA/cuopt

Length of output: 25671

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Inspect Python bindings for termination status enums
echo "---- solver_wrapper.pyx: termination status enum mappings ----"
sed -n '1,140p' python/cuopt/cuopt/linear_programming/solver/solver_wrapper.pyx
echo "---- solver.pxd: declarations for termination statuses ----"
sed -n '1,120p' python/cuopt/cuopt/linear_programming/solver/solver.pxd
# Inspect server-side mapping / status lists
echo "---- data_definition.py: status lists (around PrimalFeasible / FeasibleFound) ----"
sed -n '740,880p' python/cuopt_server/cuopt_server/utils/linear_programming/data_definition.py
echo "---- solver.py: where MILPTerminationStatus.FeasibleFound is used ----"
sed -n '240,320p' python/cuopt_server/cuopt_server/utils/linear_programming/solver.py
# Show any LPTerminationStatus / pdlp termination status values
echo "---- Search for LPTerminationStatus definition in python/cuopt_server ----"
rg -n "LPTerminationStatus|pdlp_termination_status_t|PDLP.*TerminationStatus" python/cuopt_server/cuopt_server/utils/linear_programming -S || true
echo "---- Search for LPTerminationStatus definition in python/cuopt ----"
rg -n "LPTerminationStatus|pdlp_termination_status_t|PDLP.*TerminationStatus" python/cuopt/cuopt -S || true

Repository: NVIDIA/cuopt

Length of output: 25054

Clarify FeasibleFound status naming for the LP eval.
FeasibleFound exists as a MILP/IP termination status (MILPTerminationStatus) but is not part of the LP/continuous termination statuses (LPTerminationStatus); LP uses Optimal / PrimalFeasible (plus other PDLP statuses). Remove FeasibleFound from the expected problem.Status.name PascalCase values in skills/cuopt-numerical-optimization-api-python/evals/evals.json (line 13), since it also conflicts with the eval’s LP ground_truth check for ['Optimal', 'PrimalFeasible'].
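
The mismatch can be expressed as a small membership check. The sets below contain only the status names mentioned in this comment, not the full `LPTerminationStatus` or `MILPTerminationStatus` enums, and `milp_only_names` is a hypothetical helper.

```python
# Names taken from the review comment only; the real enums have more members.
LP_STATUSES_MENTIONED = {"Optimal", "PrimalFeasible"}
MILP_ONLY_MENTIONED = {"FeasibleFound"}


def milp_only_names(expected_names):
    """Return expected status names that the comment identifies as MILP-only."""
    return [name for name in expected_names if name in MILP_ONLY_MENTIONED]


# Status names listed in the LP eval's expected behavior at line 13.
expected_in_eval = ["Optimal", "PrimalFeasible", "FeasibleFound"]

flagged = milp_only_names(expected_in_eval)
after_fix = milp_only_names(["Optimal", "PrimalFeasible"])
```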

coderabbitai · 2026-05-27T20:18:53Z

+    "question": "Help me optimize my routing.",
+    "expected_skill": "cuopt-user-rules",
+    "expected_script": null,
+    "ground_truth": "The prompt is incomplete on every dimension. Per the user-rules skill, the agent must ask before assuming. It asks: (a) Language / interface — Python, C, or REST server? (b) Problem type — TSP, VRP, or PDP? (c) Data — does the user have a cost / distance matrix, order locations, fleet definition, or should the agent generate a small sample dataset for demonstration? (d) Constraints — time windows, vehicle capacities, precedence, service times? (e) Problem size — number of locations, vehicles, orders? (f) Performance — time limit, solution-quality target? It does not produce code, does not silently choose Python+VRP and emit a starter script, and does not invent constraint values. If the user later says 'just create a sample dataset', the agent will state clearly what it synthesized (size, depot assumption, time windows used) before producing code.",


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check if CLI is listed as a cuOpt interface in user-rules or related documentation
# Search for interface lists in the user-rules skill documentation
rg -n -C3 -i 'interface.*python.*rest|python.*c.*rest|python.*c.*cli' skills/cuopt-user-rules/
# Check if CLI is mentioned alongside Python/C/REST in any cuopt skills
rg -n -C3 'CLI.*Python.*REST|Python.*REST.*CLI' skills/

Repository: NVIDIA/cuopt

Length of output: 2668

Include CLI in the cuopt-user-rules eval interface questions.
skills/cuopt-user-rules/evals/evals.json (ground_truth/expected_behavior) asks only “Python / C / REST”, but skills/cuopt-user-rules/SKILL.md lists Interface: [Python/REST/C/CLI]; update the eval expectations to include CLI (or explicitly justify why routing omits it).
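
A consistency check of this kind could look like the sketch below. The interface list is the one quoted from SKILL.md above; `missing_interfaces` is a hypothetical helper and the ground-truth strings are abbreviated excerpts, not the full eval text.

```python
import re

# Interface list as quoted from SKILL.md in this comment.
SKILL_INTERFACES = ["Python", "REST", "C", "CLI"]


def missing_interfaces(text, interfaces=SKILL_INTERFACES):
    """Return interfaces that do not appear as whole words in `text`."""
    return [
        name
        for name in interfaces
        if not re.search(rf"\b{re.escape(name)}\b", text)
    ]


# Abbreviated excerpts of the interface question in ground_truth.
current_text = "Language / interface: Python, C, or REST server?"
updated_text = "Language / interface: Python, C, REST server, or CLI?"

missing_now = missing_interfaces(current_text)
missing_after = missing_interfaces(updated_text)
```

The word-boundary match keeps `C` from being satisfied by the first letter of `CLI`, so the two interfaces are checked independently.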

copy-pr-bot · 2026-05-27T20:36:21Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

rgsl888prabhu · 2026-05-27T20:47:51Z

/ok to test cedb505

rgsl888prabhu requested a review from a team as a code owner May 27, 2026 20:12

rgsl888prabhu requested a review from Iroy30 May 27, 2026 20:12

rgsl888prabhu mentioned this pull request May 27, 2026

skills: add evals/evals.json smoke suite (group B) #1309

Open

tmckayus approved these changes May 27, 2026

rgsl888prabhu self-assigned this May 27, 2026

rgsl888prabhu added the improvement Improves an existing functionality label May 27, 2026

rgsl888prabhu mentioned this pull request May 27, 2026

skills: add evals/evals.json smoke suite #1302

Merged

rgsl888prabhu added the non-breaking Introduces a non-breaking change label May 27, 2026

coderabbitai Bot reviewed May 27, 2026

Attach NVSkills validation signatures

cedb505

rgsl888prabhu merged commit d415bb6 into main May 27, 2026
23 checks passed

rgsl888prabhu merged 2 commits into main from skills-add-evals-suite-a

3 participants
