
drupal-core-bot

High-fidelity voice profiles of 66 Drupal core contributors, built from their public drupal.org issue queue records. Query any contributor's likely position on a topic, simulate patch reviews, map debates, find reviewers, and explore how the project's priorities have evolved over nearly two decades.

Disclaimer: All profiles are derived from public contributions. Not authored or endorsed by any contributor.


Quick Start

git clone https://github.com/AlexU-A/catch-bot.git
cd catch-bot

# Install skills for any contributor
claude skill add ./skills/catch-query       # "What would catch say about X?"
claude skill add ./skills/catch-review      # "Review this patch like catch would"
claude skill add ./skills/catch-compare     # "How would catch and alexpott differ?"
claude skill add ./skills/core-team-review  # Multi-author review (auto-selects)

# Or use the CLI tools directly
python3 scripts/find-reviewers.py --subsystem caching
python3 scripts/map-debate.py --topic deprecation_approach
python3 scripts/consensus-check.py --all

What's Here

66 Contributors Profiled

From tier-1 core committers (catch, alexpott, webchick, xjm) to subsystem specialists (Fabianx, Wim Leers, berdir) to the AI module leads heading into Drupal 12 (marcus_johansson, kristen pol, scott_euser). Each profile includes:

  • Positions on up to 40 topics with evidence quotes and drupal.org citations
  • Terminology — distinctive vocabulary patterns
  • Chronology — how their views evolved over time
  • Voice contexts — per-register communication style
  • Cross-author analysis — expertise scores, review routing, philosophy alignment

Quality tiers: 50 gold (2K+ comments), 12 silver (500+), 3 bronze (200+), 1 insufficient.

141 Installable Skills

Per-author skills ({author}-query, {author}-review, {author}-compare) plus core-team-review for multi-author simulation.

12 Analysis Tools

Tool | What it does
validate-profile.py | Profile completeness scoring (0-100) for any/all authors
corpus-freshness.py | Staleness detection with re-scrape recommendations
map-debate.py | Debate landscape: who agrees, who opposes, fault lines
consensus-check.py | Consensus/contested status with blocker identification
expertise-gaps.py | Subsystem coverage heatmap (17 subsystems)
find-reviewers.py | Recommend reviewers by subsystem, topic, or diff
check-precedent.py | Search corpus for prior debates with evidence URLs
simulate-review.py | Predict reviewer feedback (optional Claude synthesis)
self-audit.py | Compare your text against a contributor's voice profile
position-timeline.py | Position evolution over time (terminal + HTML)
generate-contributor-card.py | One-page HTML profile card with SVG radar chart
compare-eras.py | Cross-era analysis: D7 through D11 shifts

Plus export-author.py for standalone single-author packages.


Example: Who Should Review My Caching Patch?

$ python3 scripts/find-reviewers.py --subsystem caching

Reviewers for subsystem: Caching (caching)

  1. Nathaniel Catchpole (catch) — [##########] 5/5 tier 1
  2. Wim Leers (Wim Leers) — [######....] 3/5 tier 2
  3. Alex Pott (alexpott) — [######....] 3/5 tier 1
  4. Sascha Grossenbacher (berdir) — [####......] 2/5 tier 2
  ...

Example: Is There Consensus on Deprecation Policy?

$ python3 scripts/consensus-check.py --topic deprecation_approach

Topic: Deprecation approach and tooling (deprecation_approach)
Status: CONTESTED
  17 positive | 5 pragmatic | 6 negative
  Consensus ratio: 61% (17/28)

  Likely blockers: (tier 1-2 committers with strong opposing views)
    larowlan (strength 5/5, tier 1): negative stance
    longwave (strength 5/5, tier 1): negative stance
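The consensus ratio shown above appears to be the share of positive stances among all recorded stances (17 of 17+5+6). A minimal sketch of that computation, assuming this interpretation:

```python
def consensus_ratio(positive: int, pragmatic: int, negative: int) -> float:
    """Share of authors with a positive stance among all recorded stances."""
    total = positive + pragmatic + negative
    return positive / total

# 17 positive, 5 pragmatic, 6 negative -> 17/28, about 61%
ratio = consensus_ratio(17, 5, 6)
```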

Example: What's Been Said About Hook-to-Event Migration?

$ python3 scripts/check-precedent.py --query "hook event migration"

Found 49 author positions across 13 topics, 245 evidence citations

  Hook-to-event migration — 8 authors
    effulgentsia (5/5): Long-standing engagement with the hooks-to-modern-patterns...
    Wim Leers (5/5): Reviews hook-to-event patches for proper integration...
    dawehner (5/5): Favors replacing procedural hooks with Symfony events...

Corpus

Source | Scale
Authors profiled | 66 (59 with full extraction)
Core issue queue comments (catch) | 53,102
Cross-author philosophy pairs | 1,711
Subsystems mapped | 17
Topics with positions | 40
Contrib projects scraped | AI, AI Agents, Canvas/XB, ECA, JSON:API, Migrate, + more
Git commits analyzed | 68K+ (review routing)

Every corpus entry tracks issue_id, comment_id, url, and date. All data comes from drupal.org's public REST API.
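For illustration, a single corpus entry might look like this (values are placeholders, not real corpus data, and the exact field layout is an assumption):

```json
{
  "issue_id": "3580761",
  "comment_id": "1234567",
  "url": "https://www.drupal.org/project/drupal/issues/3580761#comment-1234567",
  "date": "2024-01-15"
}
```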


Methodology

Adapted from the joyus-ai content profile system.

  1. Scrape — Collect all public comments from drupal.org issue queues per author
  2. Extract — Positions (40 topics), terminology, chronological evolution
  3. Synthesize — Claude-powered position summaries grounded in evidence quotes
  4. Cross-reference — Expertise map (git commits), review routing, philosophy graph
  5. Validate — Fidelity gates (attribution F1=0.935, adversarial 5/5 PASS)
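Step 1 could be sketched roughly as below. The api-d7 comment endpoint exists on drupal.org, but the exact response field names here ('list', 'cid', 'node', 'created', 'url') are assumptions and may differ from what the actual pipeline consumes:

```python
import json
from urllib.parse import urlencode

API_BASE = "https://www.drupal.org/api-d7/comment.json"  # public REST endpoint

def comment_page_url(author_uid: int, page: int = 0) -> str:
    """Build the URL for one page of an author's public comments."""
    return f"{API_BASE}?{urlencode({'author': author_uid, 'page': page})}"

def parse_comment_page(payload: str) -> list[dict]:
    """Extract the four fields every corpus entry tracks from one API page.

    Field names are illustrative; check a live response for the real shape.
    """
    data = json.loads(payload)
    entries = []
    for item in data.get("list", []):
        entries.append({
            "issue_id": item.get("node", {}).get("id"),
            "comment_id": item.get("cid"),
            "url": item.get("url"),
            "date": item.get("created"),
        })
    return entries
```

In practice the scraper would loop over pages until the API returns an empty list, checkpointing progress per author (the repo's checkpoint.json suggests something similar).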

Fidelity Results (catch — most complete profile)

Gate | Result
Go/No-Go | PASS (F1=0.958, AUC=0.931)
Attribution | PASS (F1=0.935, 98.9% precision)
Cross-author | PASS
Generation | PASS (z-dist=1.49)
Register | FAIL (67.4%) — FINDING: catch writes uniformly across registers
Adversarial | PASS 5/5 — LLM skepticism reproduced faithfully

The catch LLM Skepticism Guardrail

catch is vocally skeptical of LLM-generated code contributions (issues #3580761, #3574093). His specific critiques:

  • LLM code review gave confident wrong answers about entity caching (his own subsystem)
  • One 5,000-line LLM MR consumed 5 core committers' review time
  • agents.md in core would encourage more LLM usage: "This should be a policy issue for humans"

This skepticism is faithfully reproduced in all profile outputs. A catch-style response that softens his AI position fails the fidelity test. The profile can articulate exactly why catch would object to this project.


Export a Single Author

# Create a standalone package for any author
python3 scripts/export-author.py --author catch
python3 scripts/export-author.py --author catch --tar  # As tarball

# See who's exportable
python3 scripts/export-author.py --list

The export includes profile, extraction, skills, cross-author slices, README, and CLAUDE.md — everything needed to use the profile independently.


Repo Structure

catch-bot/
├── corpus/
│   ├── authors/{name}/          # Per-author: issue-queue/, checkpoint.json, contrib/
│   ├── shared/                  # Shared issue metadata, Canvas/XB, contrib projects
│   ├── issue-queue/             # Legacy catch-only corpus
│   ├── git-log/                 # 14K catch commits
│   └── change-records/          # 137 architectural decisions
├── profiles/{author}/           # voice-contexts.json, stylometrics-train.json
├── extraction/{author}/         # positions.json, terminology.json, chronology.json
├── analysis/cross-author/       # expertise-map, review-routing, philosophy-graph
├── config/                      # authors.json, topic-taxonomy.json, contrib-projects.json
├── scripts/                     # Pipeline + 12 analysis tools + shared libraries
├── skills/                      # 141 installable Claude Code skills
└── docs/                        # Methodology notes, critic reviews

Limitations & Future Directions

What's Missing: Drupal Slack

The most significant gap in the corpus is Drupal Slack, where a substantial portion of core development discussion happens informally. Architecture debates, review coordination, mentoring conversations, and real-time decision-making all occur in Slack channels that are not publicly accessible or archivable through standard APIs.

This means the profiles are built exclusively from public written records: drupal.org issue queue comments, git commits, change records, and blog posts. Contributor perspectives expressed only in Slack — including informal positions on contested topics, mentoring guidance, and real-time review discussions — are not captured. For contributors whose primary engagement is through Slack rather than issue queues, the profiles may significantly underrepresent their actual influence and positions.

If Drupal Slack data were accessible (through community-sanctioned exports or a public archive), it would meaningfully improve:

  • Position accuracy for contributors who are more active in Slack than issue queues
  • Debate coverage for topics discussed primarily in real-time channels
  • Relationship mapping between contributors who coordinate via Slack
  • Mentoring patterns that are largely invisible in the issue queue record

Search Infrastructure

The current pipeline can route queries to relevant experts ("who should review my caching patch?") but cannot perform full-text search across the 853MB corpus to answer questions like "what does contributor X specifically think about topic Y?" A search-indexed version of the corpus (e.g., via Solr or Elasticsearch) would enable direct evidence retrieval and more precise contributor-specific queries.
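A minimal sketch of what such an index could look like, using SQLite's built-in FTS5 rather than Solr or Elasticsearch; the entry fields (author, topic, body, url) are assumptions about the corpus shape, not the repo's actual schema:

```python
import sqlite3

def build_index(entries):
    """Index corpus entries as (author, topic, body, url) for full-text search."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE VIRTUAL TABLE corpus USING fts5(author, topic, body, url UNINDEXED)"
    )
    db.executemany("INSERT INTO corpus VALUES (?, ?, ?, ?)", entries)
    return db

def search(db, author, query):
    """Answer 'what does contributor X think about Y?' with evidence URLs."""
    return db.execute(
        "SELECT topic, url FROM corpus WHERE corpus MATCH ? AND author = ?",
        (query, author),
    ).fetchall()
```

A production version would index the real 853MB corpus on disk and rank hits, but the same two queries (MATCH plus an author filter) cover the contributor-specific retrieval described above.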


Ethics

  1. Purpose: Research and analysis to understand Drupal's design philosophy. Not impersonation.
  2. Disclosure: Every generated output includes: "Based on {author}'s public issue queue record. Not authored by {display_name}."
  3. Data: Only publicly available drupal.org data. No private communications.
  4. Framing: Skills use "{author}-informed" framing, not first-person impersonation.
  5. Opt-out: If any contributor requests removal, their data is deleted immediately via scripts/remove-author.py.

License

Code, scripts, and profile structures are original work. The corpus (issue queue comments, commit messages) is not redistributed — only derived analysis artifacts are included. All drupal.org content remains subject to drupal.org's terms.
