AI agents are powerful, but they're unreliable. They go off-script, skip steps, hallucinate details, and don't check their own work. The more complex the task, the worse it gets.
DeepWork fixes this with two systems: Workflows that force agents to follow a structured process step by step, and Reviews that automatically verify every change against your rules. Together, they make agents trustworthy enough to run autonomously on real work.
/plugin marketplace add https://github.com/Unsupervisedcom/deepwork
/plugin install deepwork@deepwork-plugins
Then start a new session and define your first job:
/deepwork Make a job for doing competitive research. It will take the URL of the competitor as an input, and should produce a report including a SWOT analysis for that competitor.
- Enter Cowork mode (toggle at top of screen)
- Select `Customize with plugins` at the bottom of the page.
- Select `Personal`, click the `+`, and select `Add marketplace from GitHub`.
- Set the URL to `Unsupervisedcom/deepwork` and press Sync. (Adding a marketplace currently fails on Windows.)
- Select the deepwork plugin and click Install.
- In Cowork mode, select 'Start a deepwork workflow'
Note: DeepWork stores job definitions in `.deepwork/jobs/` and creates work branches in Git. Your project folder should be a Git repository. If it isn't, run `git init` first, or ask Claude to do so.
When you ask an agent to do something simple — write a function, answer a question — it works great. But when you ask it to execute a multi-step process — research a topic across multiple sources, audit a codebase, produce a structured report — things fall apart:
- It skips steps or invents shortcuts
- It forgets context between steps
- It doesn't check its work against your standards
- It produces inconsistent results every time you run it
You end up babysitting the agent, which defeats the purpose of automation.
DeepWork gives you two complementary systems that work together to make agents trustworthy:
Workflows force the agent to follow a strict, step-by-step process. Each step has defined inputs, outputs, and quality checks. The agent can't skip ahead or go off-script — it must complete each step and pass its quality gates before moving on.
You define workflows by describing what you want in plain English. DeepWork asks you questions to refine the process, then generates a hardened multi-step workflow that the agent follows consistently every time.
/deepwork write a tutorial for how to use a new feature we just launched
DeepWork asks you questions (~10 minutes), then writes the steps. After that, you can run it whenever:
/deepwork tutorial_writer
The agent follows the workflow step by step, every time, the same way. You build the skill once and reuse it forever.
Reviews are the second layer of trust. You define review policies in .deepreview config files — what files to watch, what to check for, how to group them — and they run automatically against every change.
/review
One command. Every rule you've defined runs in parallel, each review agent scoped to exactly the files and instructions it needs.
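Conceptually, the fan-out works like this. The sketch below is illustrative only (the rule names, file lists, and `run_review` stand-in are made up, and this is not DeepWork's actual implementation): each rule gets its own reviewer, dispatched in parallel and scoped to just its matched files and instructions.

```python
from concurrent.futures import ThreadPoolExecutor

def run_review(rule_name, files, instructions):
    # Stand-in for dispatching an AI review agent scoped to `files`
    # with `instructions`; a real reviewer would return findings.
    return f"{rule_name}: reviewed {len(files)} file(s)"

# Hypothetical rules: (matched files, reviewer instructions)
rules = {
    "python_style": (["app.py", "lib.py"], "Check naming and error handling."),
    "docs_sync":    (["README.md"], "Flag docs that no longer match the code."),
}

# Every rule runs concurrently; each reviewer sees only its own scope.
with ThreadPoolExecutor() as pool:
    futures = {name: pool.submit(run_review, name, files, instr)
               for name, (files, instr) in rules.items()}
    results = {name: f.result() for name, f in futures.items()}

print(results["python_style"])  # python_style: reviewed 2 file(s)
```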
Reviews catch what workflows can't: style regressions, documentation falling out of sync, security issues in code the workflow didn't directly touch, requirements that lost their test coverage.
Workflows ensure the agent follows the right process. Reviews ensure the output meets your standards. The combination is what makes the difference:
| Without DeepWork | With DeepWork |
|---|---|
| Agent skips steps or invents shortcuts | Workflow enforces every step in order |
| No quality checks on output | Quality gates block progress until standards are met |
| Inconsistent results every run | Same process, same structure, every time |
| Changes break things silently | Review rules catch regressions automatically |
| You babysit the agent | You review the finished work |
In Claude Code:
/plugin marketplace add https://github.com/Unsupervisedcom/deepwork
/plugin install deepwork@deepwork-plugins
Start a new Claude Code session after installing.
Note: If your folder isn't a Git repo yet, run `git init` first.
Start simple — something you do manually in 15-30 minutes:
/deepwork write a tutorial for how to use a new feature we just launched
DeepWork asks you questions (~10 minutes) then writes the steps. You're creating a reusable skill — you only do this once.
/deepwork tutorial_writer
Claude follows the workflow step by step.
/configure_reviews
DeepWork analyzes your project and creates .deepreview files with review rules tailored to your codebase. From then on, /review checks every change.
You can make a DeepWork job that automatically runs deep competitive research:
- Run `/deepwork` in Claude Code
- Explain your process — e.g. "Go look at my company's website and social channels to capture any new developments, look at our existing list of competitors, do broad web searches to identify any new competitors, and then do deep research on each competitor and produce a comprehensive set of md reports including a summary report for me."
DeepWork asks you questions to improve the plan and makes a hardened automation workflow (~10 minutes). Then it documents how Claude should execute each step (~2-5 minutes).
After that, you can run the workflow at any time. Output looks something like:
# On your work branch (deepwork/competitive_research-acme-2026-02-21):
competitive_research/
├── competitors.md # Who they are, what they do
├── competitor_profiles/ # Deep dive on each
├── primary_research.md # What they say about themselves
└── strategic_overview.md # Your positioning recommendations
All skills are composable: you can call skills from inside other jobs. For example, create a make_comparison_document skill and call it at the end of /competitive_research, automating the path from raw research to finished briefings for your sales team.
| Workflow | What It Does | The Trust Problem It Solves |
|---|---|---|
| Email triage | Scan inbox, categorize, archive, draft replies | Agent must follow consistent categorization rules, not just guess |
| Competitive research | Track competitors weekly, generate diff reports | Multi-source research needs structured steps to avoid missing sources |
| Tutorial writer | Turn expertise into docs | Quality gates ensure docs actually cover what they should |
| SaaS user audit | Quarterly audit of service access | Must check every service systematically, not skip ones it finds boring |
One user used DeepWork to automatically research email performance across hundreds of millions of marketing emails. It pulled data from a data warehouse, generated research questions, ran queries to answer them, and produced a comprehensive report. The process ran autonomously for ~90 minutes and produced a report better than internal dashboards that had been refined for months.
If you can describe a process, you can automate it.
| You | What You'd Automate |
|---|---|
| Founders | Competitive research, bug discovery and ticket creation, investor updates pulling from multiple sources |
| Ops | Daily briefs, SEO analysis, git summaries |
| Product Managers | Tutorial writing, QA reports, simulated user testing, updating process docs or sales materials |
| Engineers | Standup summaries, code review, release notes |
| Data/Analytics | Multi-source data pulls, ETL, custom reports and dashboards |
DeepWork is a free, open-source tool — if you're already paying for a Claude Max subscription, each of these automations costs you nothing additional.
Similar to how vibe coding makes it easier for anyone to produce software, this is vibe automation: describe what you want, let it run, and then iterate on what works.
Reviews are .deepreview config files placed anywhere in your project, scoped to the directory they live in (like .gitignore). When you run a review, it diffs your branch, matches changed files against your rules, and dispatches parallel AI review agents.
Define rules once, they enforce forever. A .deepreview file is simple YAML — a few lines to describe what files to match, how to group them, and what the reviewer should look for. No more "we forgot to check for X."
Reviews aren't limited to code. Documentation tone, prompt quality, config consistency, CI/CD security — anything that can be expressed as "look at these files and check for this."
Teams own their own rules. The security team puts a .deepreview in src/auth/, the platform team in infrastructure/, the docs team in docs/. Each file is independent and scoped to its directory.
Smart file grouping. Three review strategies control what each reviewer sees:
- `individual` — one review per file (best for per-file linting and style checks)
- `matches_together` — all matched files reviewed as a group (best for cross-file consistency)
- `all_changed_files` — a tripwire: if any sensitive file changes, the reviewer sees every changed file in the branch (best for security audits)
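The three strategies can be pictured as different ways of batching changed files for reviewers. The sketch below is an illustration of the idea, not DeepWork's actual code; the rule dictionary shape and `group_files` helper are invented for the example.

```python
from fnmatch import fnmatch

def group_files(changed_files, rule):
    """Batch changed files into review groups per the rule's strategy.

    Illustrative sketch only; the rule format here is hypothetical.
    """
    matched = [f for f in changed_files
               if any(fnmatch(f, pat) for pat in rule["include"])
               and not any(fnmatch(f, pat) for pat in rule.get("exclude", []))]
    if not matched:
        return []  # nothing this rule cares about changed
    strategy = rule["strategy"]
    if strategy == "individual":         # one reviewer per matched file
        return [[f] for f in matched]
    if strategy == "matches_together":   # one reviewer sees all matches
        return [matched]
    if strategy == "all_changed_files":  # tripwire: reviewer sees everything
        return [list(changed_files)]
    raise ValueError(f"unknown strategy: {strategy}")

changed = ["src/auth/login.py", "docs/index.md"]
rule = {"include": ["src/auth/**"], "strategy": "all_changed_files"}
print(group_files(changed, rule))  # one batch containing every changed file
```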
Pass caching. When a review passes, it's marked as clean. It won't re-run until one of its reviewed files actually changes — so reviews stay fast even as your rule set grows.
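One plausible way to implement that kind of pass caching is to key each rule's last passing review on a digest of its reviewed files; the sketch below assumes that approach and is not DeepWork's actual mechanism.

```python
import hashlib

def files_digest(contents):
    """Order-independent digest of reviewed files.

    `contents` maps file path -> file bytes (hypothetical interface).
    """
    h = hashlib.sha256()
    for path in sorted(contents):
        h.update(path.encode())
        h.update(contents[path])
    return h.hexdigest()

_pass_cache = {}  # rule name -> digest of files at last passing review

def needs_review(rule_name, contents):
    # Skip the re-run only if the reviewed files are byte-identical
    # to the last time this rule passed.
    return _pass_cache.get(rule_name) != files_digest(contents)

def mark_passed(rule_name, contents):
    _pass_cache[rule_name] = files_digest(contents)

mark_passed("python_style", {"app.py": b"print('ok')\n"})
print(needs_review("python_style", {"app.py": b"print('ok')\n"}))  # False
print(needs_review("python_style", {"app.py": b"print('hi')\n"}))  # True
```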
```yaml
# .deepreview
python_best_practices:
  description: "Review Python files for code quality."
  match:
    include: ["**/*.py"]
    exclude: ["tests/**"]
  review:
    strategy: individual
    instructions: |
      Check for proper error handling, consistent naming,
      and no hardcoded secrets.

prompt_best_practices:
  description: "Review prompt and instruction files for quality."
  match:
    include:
      - "**/CLAUDE.md"
      - "**/AGENTS.md"
      - ".claude/**/*.md"
  review:
    strategy: individual
    instructions: |
      Review this prompt/instruction file for clarity, specificity,
      and adherence to prompt engineering best practices. Flag vague
      instructions, missing examples, and conflicting directives.

requirements_traceability:
  description: "Verify requirements have automated tests."
  match:
    include: ["**/*"]
  review:
    strategy: all_changed_files
    instructions: |
      This project keeps formal requirements in specs/.
      Verify that every requirement has a corresponding
      automated test or review rule that enforces it.
      Flag any requirement missing traceability.

security_audit:
  description: "Full security review when auth code changes."
  match:
    include: ["src/auth/**/*.py"]
  review:
    strategy: all_changed_files
    agent:
      claude: "security-expert"
    instructions: |
      Auth code changed. Review ALL changed files for
      security regressions and leaked credentials.
```

| Review Rule | What It Catches |
|---|---|
| Language-specific code review | Style violations, anti-patterns, bugs — per your project's conventions |
| Documentation freshness | Docs that fall out of sync when the code they describe changes |
| Prompt & instruction review | Prompt files that don't follow best practices |
| Migration safety | Conflicting database migrations, destructive operations |
| Version sync | Version numbers that drift across pyproject.toml, CHANGELOG.md, lock files |
| CI/CD audit | Unpinned GitHub Actions, leaked secrets, missing approval gates |
| Cross-file consistency | API schema changes without matching client updates |
| Requirements traceability | Requirements without corresponding automated tests or enforcement |
See README_REVIEWS.md for the full reference — strategies, additional context flags, glob patterns, agent personas, and more examples.
| Platform | Status | Notes |
|---|---|---|
| Claude Code | Full Support | Recommended. Plugin-based delivery with quality hooks. |
| Gemini CLI | Partial Support | TOML format skill, manual setup |
| OpenCode | Planned | |
| GitHub Copilot CLI | Planned | |
| Others | Planned | We are nailing Claude and Gemini first, then adding others according to demand |
Tip: Use the terminal (Claude Code CLI), not the VS Code extension. The terminal has full feature support.
For workflows that need to interact with websites, you can use any browser automation tool that works in Claude Code. We generally recommend Claude in Chrome.
Warning: Browser automation is still something models can be hit-or-miss on. We recommend using a dedicated Chrome profile for automation.
A few known issues affect some early users. We're working on fixes, but in the meantime, here are the known workarounds.
Occasionally, especially after updating a job or running the deepwork_jobs learn process after completing a task, Claude gets confused about which workflow's checks it should be running. For now, if stop hooks fire when they shouldn't, you can:
- Ask Claude: "do we need to address any of these stop hooks or can we ignore them for now?"
- Ignore the stop hooks and keep going until the workflow steps are complete
- Run the `/clear` command to start a new context window (you'll have to re-run the job after this)
If Claude attempts to bypass the workflow and do the task on its own, tell it explicitly to use the skill. You can also manually run the step command:
/deepwork your_job
Tip: Don't say things like "can you do X" while defining a new job — Claude has a bias towards action and workarounds and may abandon the skill creation workflow and attempt to do your task as a one off. Instead, say something like "create a workflow that..."
Send @tylerwillis a message on X.
Advanced: Directory Structure
your-project/
├── .deepwork/
│ ├── tmp/ # Session state (created lazily)
│ └── jobs/ # Job definitions
│ └── job_name/
│ ├── job.yml # Job metadata
│ └── steps/ # Step instructions
Advanced: Nix Flakes (for contributors)
# Development environment
nix develop
# Run without installing
nix run github:Unsupervisedcom/deepwork -- --help

Business Source License 1.1 (BSL 1.1). Free for non-competing use. Converts to Apache 2.0 on January 14, 2030.
See LICENSE.md for details.
We're iterating fast. Open an issue or reach out on Twitter @tylerwillis.
DeepWork is in active development. Expect rough edges — and rapid improvement.
Inspired by GitHub's spec-kit