
drupal-core-bot

High-fidelity voice profiles of 66 Drupal core contributors, built from their public drupal.org issue queue records. Query any contributor's likely position on a topic, simulate patch reviews, map debates, find reviewers, and explore how the project's priorities have evolved over nearly two decades.

Disclaimer: All profiles are derived from public contributions. Not authored or endorsed by any contributor.


Quick Start

git clone https://github.com/AlexU-A/catch-bot.git
cd catch-bot

# Install skills for any contributor
claude skill add ./skills/catch-query       # "What would catch say about X?"
claude skill add ./skills/catch-review      # "Review this patch like catch would"
claude skill add ./skills/catch-compare     # "How would catch and alexpott differ?"
claude skill add ./skills/core-team-review  # Multi-author review (auto-selects)

# Or use the CLI tools directly
python3 scripts/find-reviewers.py --subsystem caching
python3 scripts/map-debate.py --topic deprecation_approach
python3 scripts/consensus-check.py --all

What's Here

66 Contributors Profiled

From tier-1 core committers (catch, alexpott, webchick, xjm) to subsystem specialists (Fabianx, Wim Leers, berdir) to the AI module leads heading into Drupal 12 (marcus_johansson, kristen pol, scott_euser). Each profile includes:

  • Positions on up to 40 topics with evidence quotes and drupal.org citations
  • Terminology — distinctive vocabulary patterns
  • Chronology — how their views evolved over time
  • Voice contexts — per-register communication style
  • Cross-author analysis — expertise scores, review routing, philosophy alignment

Quality tiers: 50 gold (2K+ comments), 12 silver (500+), 3 bronze (200+), 1 insufficient.

141 Installable Skills

Per-author skills ({author}-query, {author}-review, {author}-compare) plus core-team-review for multi-author simulation.

12 Analysis Tools

Tool | What it does
validate-profile.py | Profile completeness scoring (0-100) for any/all authors
corpus-freshness.py | Staleness detection with re-scrape recommendations
map-debate.py | Debate landscape: who agrees, who opposes, fault lines
consensus-check.py | Consensus/contested status with blocker identification
expertise-gaps.py | Subsystem coverage heatmap (17 subsystems)
find-reviewers.py | Recommend reviewers by subsystem, topic, or diff
check-precedent.py | Search corpus for prior debates with evidence URLs
simulate-review.py | Predict reviewer feedback (optional Claude synthesis)
self-audit.py | Compare your text against a contributor's voice profile
position-timeline.py | Position evolution over time (terminal + HTML)
generate-contributor-card.py | One-page HTML profile card with SVG radar chart
compare-eras.py | Cross-era analysis: D7 through D11 shifts

Plus export-author.py for standalone single-author packages.


Example: Who Should Review My Caching Patch?

$ python3 scripts/find-reviewers.py --subsystem caching

Reviewers for subsystem: Caching (caching)

  1. Nathaniel Catchpole (catch) — [##########] 5/5 tier 1
  2. Wim Leers (Wim Leers) — [######....] 3/5 tier 2
  3. Alex Pott (alexpott) — [######....] 3/5 tier 1
  4. Sascha Grossenbacher (berdir) — [####......] 2/5 tier 2
  ...

Example: Is There Consensus on Deprecation Policy?

$ python3 scripts/consensus-check.py --topic deprecation_approach

Topic: Deprecation approach and tooling (deprecation_approach)
Status: CONTESTED
  17 positive | 5 pragmatic | 6 negative
  Consensus ratio: 61% (17/28)

  Likely blockers: (tier 1-2 committers with strong opposing views)
    larowlan (strength 5/5, tier 1): negative stance
    longwave (strength 5/5, tier 1): negative stance
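The consensus ratio shown above appears to be the share of positive stances among all recorded stances (17 of 17+5+6). A minimal sketch of that computation, assuming this interpretation:

```python
def consensus_ratio(positive: int, pragmatic: int, negative: int) -> float:
    """Share of authors with a positive stance among all recorded stances."""
    total = positive + pragmatic + negative
    return positive / total

# 17 positive, 5 pragmatic, 6 negative -> 17/28, about 61%
ratio = consensus_ratio(17, 5, 6)
```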

Example: What's Been Said About Hook-to-Event Migration?

$ python3 scripts/check-precedent.py --query "hook event migration"

Found 49 author positions across 13 topics, 245 evidence citations

  Hook-to-event migration — 8 authors
    effulgentsia (5/5): Long-standing engagement with the hooks-to-modern-patterns...
    Wim Leers (5/5): Reviews hook-to-event patches for proper integration...
    dawehner (5/5): Favors replacing procedural hooks with Symfony events...

Corpus

Source | Scale
Authors profiled | 66 (59 with full extraction)
Core issue queue comments (catch) | 53,102
Cross-author philosophy pairs | 1,711
Subsystems mapped | 17
Topics with positions | 40
Contrib projects scraped | AI, AI Agents, Canvas/XB, ECA, JSON:API, Migrate, + more
Git commits analyzed | 68K+ (review routing)

Every corpus entry tracks issue_id, comment_id, url, and date. All data comes from drupal.org's public REST API.
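For illustration, a single corpus entry might look like this (values are placeholders, not real corpus data, and the exact field layout is an assumption):

```json
{
  "issue_id": "3580761",
  "comment_id": "1234567",
  "url": "https://www.drupal.org/project/drupal/issues/3580761#comment-1234567",
  "date": "2024-01-15"
}
```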


Methodology

Adapted from the joyus-ai content profile system.

  1. Scrape — Collect all public comments from drupal.org issue queues per author
  2. Extract — Positions (40 topics), terminology, chronological evolution
  3. Synthesize — Claude-powered position summaries grounded in evidence quotes
  4. Cross-reference — Expertise map (git commits), review routing, philosophy graph
  5. Validate — Fidelity gates (attribution F1=0.935, adversarial 5/5 PASS)
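Step 1 could be sketched roughly as below. The api-d7 comment endpoint exists on drupal.org, but the exact response field names here ('list', 'cid', 'node', 'created', 'url') are assumptions and may differ from what the actual pipeline consumes:

```python
import json
from urllib.parse import urlencode

API_BASE = "https://www.drupal.org/api-d7/comment.json"  # public REST endpoint

def comment_page_url(author_uid: int, page: int = 0) -> str:
    """Build the URL for one page of an author's public comments."""
    return f"{API_BASE}?{urlencode({'author': author_uid, 'page': page})}"

def parse_comment_page(payload: str) -> list[dict]:
    """Extract the four fields every corpus entry tracks from one API page.

    Field names are illustrative; check a live response for the real shape.
    """
    data = json.loads(payload)
    entries = []
    for item in data.get("list", []):
        entries.append({
            "issue_id": item.get("node", {}).get("id"),
            "comment_id": item.get("cid"),
            "url": item.get("url"),
            "date": item.get("created"),
        })
    return entries
```

In practice the scraper would loop over pages until the API returns an empty list, checkpointing progress per author (the repo's checkpoint.json suggests something similar).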

Fidelity Results (catch — most complete profile)

Gate | Result
Go/No-Go | PASS (F1=0.958, AUC=0.931)
Attribution | PASS (F1=0.935, 98.9% precision)
Cross-author | PASS
Generation | PASS (z-dist=1.49)
Register | FAIL (67.4%) — FINDING: catch writes uniformly across registers
Adversarial | PASS 5/5 — LLM skepticism reproduced faithfully

The catch LLM Skepticism Guardrail

catch is vocally skeptical of LLM-generated code contributions (issues #3580761, #3574093). His specific critiques:

  • LLM code review gave confident wrong answers about entity caching (his own subsystem)
  • One 5,000-line LLM MR consumed 5 core committers' review time
  • agents.md in core would encourage more LLM usage: "This should be a policy issue for humans"

This skepticism is faithfully reproduced in all profile outputs. A catch-style response that softens his AI position fails the fidelity test. The profile can articulate exactly why catch would object to this project.


Export a Single Author

# Create a standalone package for any author
python3 scripts/export-author.py --author catch
python3 scripts/export-author.py --author catch --tar  # As tarball

# See who's exportable
python3 scripts/export-author.py --list

The export includes profile, extraction, skills, cross-author slices, README, and CLAUDE.md — everything needed to use the profile independently.


Repo Structure

catch-bot/
├── corpus/
│   ├── authors/{name}/          # Per-author: issue-queue/, checkpoint.json, contrib/
│   ├── shared/                  # Shared issue metadata, Canvas/XB, contrib projects
│   ├── issue-queue/             # Legacy catch-only corpus
│   ├── git-log/                 # 14K catch commits
│   └── change-records/          # 137 architectural decisions
├── profiles/{author}/           # voice-contexts.json, stylometrics-train.json
├── extraction/{author}/         # positions.json, terminology.json, chronology.json
├── analysis/cross-author/       # expertise-map, review-routing, philosophy-graph
├── config/                      # authors.json, topic-taxonomy.json, contrib-projects.json
├── scripts/                     # Pipeline + 12 analysis tools + shared libraries
├── skills/                      # 141 installable Claude Code skills
└── docs/                        # Methodology notes, critic reviews

Limitations & Future Directions

What's Missing: Drupal Slack

The most significant gap in the corpus is Drupal Slack, where a substantial portion of core development discussion happens informally. Architecture debates, review coordination, mentoring conversations, and real-time decision-making all occur in Slack channels that are not publicly accessible or archivable through standard APIs.

This means the profiles are built exclusively from public written records: drupal.org issue queue comments, git commits, change records, and blog posts. Contributor perspectives expressed only in Slack — including informal positions on contested topics, mentoring guidance, and real-time review discussions — are not captured. For contributors whose primary engagement is through Slack rather than issue queues, the profiles may significantly underrepresent their actual influence and positions.

If Drupal Slack data were accessible (through community-sanctioned exports or a public archive), it would meaningfully improve:

  • Position accuracy for contributors who are more active in Slack than issue queues
  • Debate coverage for topics discussed primarily in real-time channels
  • Relationship mapping between contributors who coordinate via Slack
  • Mentoring patterns that are largely invisible in the issue queue record

Search Infrastructure

The current pipeline can route queries to relevant experts ("who should review my caching patch?") but cannot perform full-text search across the 853MB corpus to answer questions like "what does contributor X specifically think about topic Y?" A search-indexed version of the corpus (e.g., via Solr or Elasticsearch) would enable direct evidence retrieval and more precise contributor-specific queries.
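A minimal sketch of what such an index could look like, using SQLite's built-in FTS5 rather than Solr or Elasticsearch; the entry fields (author, topic, body, url) are assumptions about the corpus shape, not the repo's actual schema:

```python
import sqlite3

def build_index(entries):
    """Index corpus entries as (author, topic, body, url) for full-text search."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE VIRTUAL TABLE corpus USING fts5(author, topic, body, url UNINDEXED)"
    )
    db.executemany("INSERT INTO corpus VALUES (?, ?, ?, ?)", entries)
    return db

def search(db, author, query):
    """Answer 'what does contributor X think about Y?' with evidence URLs."""
    return db.execute(
        "SELECT topic, url FROM corpus WHERE corpus MATCH ? AND author = ?",
        (query, author),
    ).fetchall()
```

A production version would index the real 853MB corpus on disk and rank hits, but the same two queries (MATCH plus an author filter) cover the contributor-specific retrieval described above.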


Ethics

  1. Purpose: Research and analysis to understand Drupal's design philosophy. Not impersonation.
  2. Disclosure: Every generated output includes: "Based on {author}'s public issue queue record. Not authored by {display_name}."
  3. Data: Only publicly available drupal.org data. No private communications.
  4. Framing: Skills use "{author}-informed" framing, not first-person impersonation.
  5. Opt-out: If any contributor requests removal, their data is deleted immediately via scripts/remove-author.py.

License

Code, scripts, and profile structures are original work. The corpus (issue queue comments, commit messages) is not redistributed — only derived analysis artifacts are included. All drupal.org content remains subject to drupal.org's terms.
