
Refactor job.yml format: inline steps, step_arguments, DeepWork Reviews #256

Draft
nhorton wants to merge 16 commits into claude/remove-runner-arg-oc2v7 from job_yml_refactor


@nhorton (Contributor) commented Mar 9, 2026

Summary

  • Unify job.yml around workflows with inline steps, replacing the old two-level format (root steps[] + workflows referencing step IDs)
  • Replace bespoke ClaudeCLI quality gate with DeepWork Reviews infrastructure (dynamic ReviewRules from job.yml + .deepreview rules)
  • Introduce step_arguments[] as shared data vocabulary (string/file_path) flowing between steps with optional review blocks and json_schema validation
  • Migrate all 3 standard jobs and 1 library job to new format with instructions inlined (delete all steps/*.md files)
  • Remove: version field, root-level steps[], dependencies, hooks, exposed/hidden, instructions_file, concurrent step groups, ClaudeCLI module, --external-runner flag
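The first bullet describes moving from the two-level format to inline workflow steps. The Python sketch below shows what that migration could look like on already-parsed dicts. Several things in it are assumptions and do not come from this PR:

- the field names `id`, `name` and `instructions`;
- workflows listing their steps as a flat list of step IDs;
- passing instruction file contents in as a dict.

It is illustrative only, since the PR does not describe how the migration was performed.

```python
from typing import Any


def convert_job(old: dict[str, Any], instruction_files: dict[str, str]) -> dict[str, Any]:
    """Inline root-level steps into each workflow (hypothetical field names).

    instruction_files maps an old instructions_file path to that file's text.
    """
    steps_by_id = {step["id"]: step for step in old.get("steps", [])}
    new_workflows: dict[str, Any] = {}
    for workflow in old.get("workflows", []):
        inline_steps = []
        for step_id in workflow["steps"]:  # assumes plain IDs, no concurrent groups
            step = steps_by_id[step_id]
            inline_steps.append(
                {
                    "name": step_id,
                    # instructions_file is gone: its text is copied into the step
                    "instructions": instruction_files[step["instructions_file"]],
                }
            )
        new_workflows[workflow["name"]] = {"steps": inline_steps}
    # version, root-level steps[], dependencies and hooks are intentionally not carried over
    return {
        "name": old["name"],
        "step_arguments": [],  # shared data vocabulary; left empty in this sketch
        "workflows": new_workflows,
    }
```

The sketch assumes every workflow lists plain step IDs. Concurrent step groups, which this PR removes, are not handled.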

Test plan

  • 627 tests pass (1 skipped e2e, 1 xfailed unimplemented)
  • ruff clean, mypy clean
  • All 3 standard jobs + library job parse with new parser
  • Run a workflow end-to-end via MCP to verify runtime behavior
  • Verify .deepreview rules fire correctly on step outputs
  • Test sub_workflow invocation across jobs

🤖 Generated with Claude Code

@nhorton nhorton changed the base branch from main to claude/remove-runner-arg-oc2v7 March 10, 2026 03:01
nhorton and others added 3 commits March 27, 2026 16:14
…ws quality gate

Unify the job.yml format around workflows as the primary structure with inline
steps, replacing the old two-level format (root steps[] + workflows referencing
step IDs) and the bespoke ClaudeCLI quality gate.

Key changes:
- step_arguments[] define shared data vocabulary (string/file_path) flowing between steps
- workflows{} are objects with inline steps (no separate step .md files)
- Quality gate uses DeepWork Reviews infrastructure instead of Claude CLI subprocess
- review blocks on step_arguments/outputs create dynamic ReviewRules at runtime
- process_quality_attributes review work_summary against criteria
- json_schema on step_arguments validates file outputs before reviews
- notes → work_summary, current_entry_index → current_step_index
- Removed: version field, root-level steps[], dependencies, hooks, exposed/hidden,
  instructions_file, concurrent step groups, ClaudeCLI module, --external-runner flag

All 3 standard jobs and library job migrated with instructions inlined.
627 tests pass, ruff clean, mypy clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
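The review flow described in this commit has two phases: json_schema validation of file outputs before any review runs, then one dynamic review rule per reviewed output. The sketch below shows that ordering. The following are hypothetical names, not DeepWork's API:

- `ReviewRule`;
- `StepArgument`;
- `run_quality_gate`;
- the `instructions` key inside a review block.

`check_schema` is also a deliberately tiny stand-in for a real JSON Schema validator.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ReviewRule:
    """Hypothetical stand-in for a review rule created at runtime from job.yml."""

    name: str
    files: list[str]
    instructions: str


@dataclass
class StepArgument:
    """Hypothetical model of one step_arguments[] entry."""

    name: str
    type: str  # "string" or "file_path"
    review: Optional[dict[str, Any]] = None
    json_schema: Optional[dict[str, Any]] = None


def check_schema(text: str, schema: dict[str, Any]) -> list[str]:
    """Tiny subset of JSON Schema: optional top-level "object" type plus "required" keys."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if schema.get("type") == "object" and not isinstance(data, dict):
        return ["expected a JSON object"]
    if not isinstance(data, dict):
        return []
    return [f"missing required key: {key}" for key in schema.get("required", []) if key not in data]


def run_quality_gate(
    args: list[StepArgument],
    file_outputs: dict[str, str],
    read_file: Callable[[str], str],
) -> tuple[list[str], list[ReviewRule]]:
    """Validate file outputs against json_schema first; build review rules only if all pass.

    file_outputs maps a file_path step_argument name to the path the step produced.
    """
    by_name = {arg.name: arg for arg in args}
    errors: list[str] = []
    for arg_name, path in file_outputs.items():
        schema = by_name[arg_name].json_schema
        if schema is not None:
            errors.extend(f"{path}: {err}" for err in check_schema(read_file(path), schema))
    if errors:
        return errors, []  # schema failures short-circuit the reviews
    rules: list[ReviewRule] = []
    for arg_name, path in file_outputs.items():
        review = by_name[arg_name].review
        if review is not None:
            rules.append(ReviewRule(f"{arg_name} review", [path], review["instructions"]))
    return [], rules
```

Running validation first means a malformed file is reported cheaply. No reviewer time is spent on output that cannot pass anyway.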
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix status.py to use dict-based workflows API and WorkflowStep.name
- Fix entry_index -> step_index in test_state.py and test_status.py
- Fix ruff import ordering and formatting across rebased files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
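These commits make three renames:

- `notes` → `work_summary`;
- `current_entry_index` → `current_step_index`;
- `entry_index` → `step_index`.

A small helper that rewrites the legacy keys in JSON-like state illustrates them. The PR does not say whether persisted session state is migrated. The helper's name, and the assumption that state is plain nested dicts and lists, are hypothetical.

```python
from typing import Any

# Renames taken from the commit messages; the state layout itself is assumed.
RENAMED_KEYS = {
    "notes": "work_summary",
    "current_entry_index": "current_step_index",
    "entry_index": "step_index",
}


def migrate_state(state: Any) -> Any:
    """Recursively rename legacy keys in a JSON-like session-state structure."""
    if isinstance(state, dict):
        return {RENAMED_KEYS.get(key, key): migrate_state(value) for key, value in state.items()}
    if isinstance(state, list):
        return [migrate_state(item) for item in state]
    return state  # scalars are returned unchanged
```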
nhorton and others added 13 commits March 27, 2026 16:34
The release process pins the version in .mcp.json args (e.g.
"deepwork==0.10.0a1"), which broke the exact-match assertion.
Use startswith instead so the test passes for both unpinned
and version-pinned args.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
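Below is a sketch of the relaxed assertion. It assumes `.mcp.json` uses an `mcpServers` map with a `deepwork` entry whose first `args` element is the package spec. The commit switched to a plain `startswith`. This version narrows the prefix to `deepwork==` so that unrelated package names do not pass. All names here are illustrative, not the project's test code.

```python
import json

# Assumed .mcp.json layout (server key "deepwork"); only "args" matters here.
PINNED_MCP_JSON = """
{"mcpServers": {"deepwork": {"command": "uvx",
  "args": ["deepwork==0.10.0a1", "serve", "--platform", "claude"]}}}
"""


def is_deepwork_package_arg(arg: str) -> bool:
    """Accept the unpinned name or a release-pinned 'deepwork==<version>'."""
    # An exact match on "deepwork" rejected the pinned form; a bare
    # startswith("deepwork") would also accept unrelated names such as "deepwork-x".
    return arg == "deepwork" or arg.startswith("deepwork==")


def test_mcp_args_allow_pinned_version() -> None:
    server = json.loads(PINNED_MCP_JSON)["mcpServers"]["deepwork"]
    assert server["command"] == "uvx"
    assert is_deepwork_package_arg(server["args"][0])
    assert server["args"][1:] == ["serve", "--platform", "claude"]
```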
- Rewrite CONTRIBUTING.md pre-release section to lead with uv tool install
  (what the MCP server actually uses via uvx), not uv pip install
- Add --platform claude to all serve command examples in CONTRIBUTING.md
  and doc/architecture.md to match the actual .mcp.json config
- Remove lowercase claude.md symlink (duplicate of CLAUDE.md, causes
  collisions on case-insensitive filesystems)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nt fixes

- Sync __init__.py version with pyproject.toml (0.7.0 → 0.9.8)
- Add __init__.py version update to prepare-release workflow
- Fix doc/mcp_interface.md: notes→work_summary, add inputs param,
  fix ActiveStepInfo/ExpectedOutput types, add post_workflow_instructions
- Fix doc/architecture.md: needs_review→needs_work, .mcp.json args
- Add engineer job to library/jobs/README.md
- Fix mypy type annotations in test_state.py and test_tools.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix MCP version test to allow release-pinned package names

The release process pins the version in .mcp.json args (e.g.
"deepwork==0.10.0a1"), which broke the exact-match assertion.
Use startswith instead so the test passes for both unpinned
and version-pinned args.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix: update marketplace.json version for pre-releases too

The marketplace version update was gated to stable-only, so
pre-release builds pushed to the pre-release branch had a stale
version in .claude-plugin/marketplace.json. This meant users
installing via /plugin marketplace add ...#pre-release got
mismatched versions.

Changes:
- Remove the is_prerelease == false guard on the marketplace update step
- Use plugin_version (semver) instead of pypi_version for consistency
  with plugin.json (no behavior change for stable where they're equal)
- Add marketplace.json to the pre-release git add

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update test for mcp_command requirement validation

* Clarify requirement in test_mcp_command_is_uvx_deepwork_serve

Updated comment to clarify requirement for test.

* Fix ruff formatting in test_claude_plugin.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
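The marketplace change drops the pre-release guard and writes `plugin_version` rather than `pypi_version`. It could look roughly like the sketch below. Two parts of it are assumptions made for illustration:

- the `plugins[].version` layout of `marketplace.json`;
- the PEP 440 → semver mapping (`0.10.0a1` → `0.10.0-alpha.1`).

The release workflow's actual conversion is not shown in this PR.

```python
import json
import re

# Assumed mapping from PEP 440 pre-release tags to semver identifiers.
_PRERELEASE = {"a": "alpha", "b": "beta", "rc": "rc"}


def pep440_to_semver(version: str) -> str:
    """Map e.g. '0.10.0a1' -> '0.10.0-alpha.1'; stable versions pass through."""
    match = re.fullmatch(r"(\d+\.\d+\.\d+)(?:(a|b|rc)(\d+))?", version)
    if match is None:
        raise ValueError(f"unsupported version: {version}")
    base, tag, number = match.groups()
    return base if tag is None else f"{base}-{_PRERELEASE[tag]}.{number}"


def update_marketplace(text: str, plugin_version: str) -> str:
    """Set every plugin entry's version; applied to stable and pre-release builds alike."""
    data = json.loads(text)
    for plugin in data.get("plugins", []):
        plugin["version"] = plugin_version
    return json.dumps(data, indent=2) + "\n"
```

For a stable version such as `0.9.8`, `pep440_to_semver` returns its input unchanged. That matches the note that plugin_version and pypi_version are equal for stable releases.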
Add 36 traceability comments (THIS TEST VALIDATES A HARD REQUIREMENT)
back to test_tools.py and test_quality_gate.py, mapping tests to their
JOBS-REQ-001.* requirement IDs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update JOBS-REQ-001/002/004 specs to match new job format and
  Reviews-based quality gate
- Deprecate JOBS-REQ-009 (claude_cli.py was deleted)
- Remove legacy instance_id from skill files and post_compact hook
- Update .github/workflows/README.md with __init__.py and
  marketplace.json version bumps, remove duplicate sections

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When MCP tool interfaces change (tools.py, schemas.py, server.py),
automatically review skill files, hooks, and platform content to
catch stale parameter names or outdated behavior descriptions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
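The trigger condition for that review can be pictured as a glob match over the changed files. The patterns and the `rule_fires` helper are hypothetical. The actual `.deepreview` rule syntax and file paths are not shown in this PR.

```python
from fnmatch import fnmatchcase

# Hypothetical trigger patterns. fnmatch's "*" also matches "/", so "*/tools.py"
# matches a tools.py under any directory (but not "mytools.py").
INTERFACE_PATTERNS = ["*/tools.py", "*/schemas.py", "*/server.py"]


def rule_fires(changed_files: list[str]) -> bool:
    """Return True if any changed file matches an MCP interface pattern."""
    return any(
        fnmatchcase(path, pattern) for path in changed_files for pattern in INTERFACE_PATTERNS
    )
```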
- Rewrite stale sections in doc/mcp_interface.md: output types, response
  fields, quality gate behavior, review types, CLI options, go_to_step
- Fix JOBS-REQ-001.1.13 → JOBS-REQ-001.1.8 in test_server.py
- Add 19 traceability comments to test_quality_gate.py for JOBS-REQ-004

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ents

- JOBS-REQ-001.5.8 → JOBS-REQ-001.5.4 (file_path list validation)
- JOBS-REQ-001.6.8 → JOBS-REQ-001.6.5, JOBS-REQ-001.6.6 (abort nested)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ith RFC 2119 semantics

Review instructions now use RFC 2119 keywords (MUST, SHOULD, MAY) to determine
pass/fail severity. MUST/SHALL violations cause failure; SHOULD/RECOMMENDED
violations cause failure only when compliance is easily achievable; other
requirements produce feedback without failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
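The severity rules above map naturally onto a small classifier. In this sketch, the reviewer's judgment of whether compliance is easily achievable is passed in as a flag, and the function and enum names are hypothetical. The sketch also treats `REQUIRED` like `MUST`, and `NOT RECOMMENDED` like `SHOULD NOT`. That follows RFC 2119 itself rather than anything stated in the commit. Keywords are matched only in uppercase, as RFC 8174 clarifies.

```python
import re
from enum import Enum


class Outcome(Enum):
    FAIL = "fail"
    FEEDBACK = "feedback"


# RFC 2119 keywords grouped by the severity rules in the commit message.
_MUST = re.compile(r"\b(MUST|SHALL|REQUIRED)\b")
_SHOULD = re.compile(r"\b(SHOULD|RECOMMENDED)\b")


def classify_violation(requirement: str, easily_achievable: bool) -> Outcome:
    """Decide whether violating `requirement` fails the review or only yields feedback."""
    if _MUST.search(requirement):  # also covers MUST NOT / SHALL NOT
        return Outcome.FAIL
    if _SHOULD.search(requirement):  # also covers SHOULD NOT / NOT RECOMMENDED
        return Outcome.FAIL if easily_achievable else Outcome.FEEDBACK
    return Outcome.FEEDBACK  # MAY/OPTIONAL or no keyword: feedback only
```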
…cleanup

- Update job.yml.template and job.yml.example to new format (step_arguments,
  workflows object, inline instructions, process_requirements with RFC 2119)
- Restore shared_jobs workflow with inlined sync_shared_jobs instructions
- Fix deprecated `deepwork install` reference to mention standard jobs
- Move inline import to top-level in quality_gate.py
- Remove unused `step` param from _collect_output_file_paths
- Expand .deepreview rules to cover doc/job_yml_guidance.md and library jobs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…irements

Convert engineer, platform_engineer, repo, and research jobs from old format
(version, root-level steps[], instructions_file, dependencies, old reviews)
to new format (step_arguments, workflows object, inline instructions,
process_requirements with RFC 2119 keywords, output-level review blocks).

Also fix stale CLAUDE.md section about standard job file structure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>