Generative testing: grammar-derived inputs + by-construction consistency (with test-suite cleanup)

## Motivation

#23 and #24 are valid-input highlighter bugs for which the `scope-gap` metric reported **0** errors. The metric itself is not broken (fed the two inputs directly, grading catches #23 and the differential catches #24); the corpus (yaml-test-suite 351 + issue12 10 = 416) simply contains no shape that triggers them. The metric is **corpus-bound**: "100% / monogramWrong 0" only means "no error on the 416 inputs it has seen". Hand-adding shapes, which is how #23/#24 were found, is sampling; it can't prove that no other shape is missed (Dijkstra: tests can show the presence of bugs, not their absence).

But Monogram has a lever a normal highlighter doesn't: **the source IS a grammar**. The same `yaml.ts` / `typescript.ts` object the parser, highlighter and tree-sitter derive from is also a generator — walk it (`alt`=branch, `seq`=concat, `many`/`opt`=repeat, token=sample) and it emits guaranteed-legal inputs. That replaces "hope the corpus contains the shape" with systematic coverage.

## Part 1 — the method

**(a) Grammar-based input generation.** One generic generator over the shared combinator IR (all 7 languages import the same `src/api.ts` combinators → one driver covers every language). Two modes:

- **Bounded-exhaustive enumeration**: every derivation of depth ≤ N. Complete by construction for depth ≤ N; the small-scope hypothesis (most bugs already show up on small inputs) is what makes a modest N worthwhile. This turns "which shapes get tested" from imagination into `grammar × bound`. (E.g. `- - a\n  - b\n- c`, the #24 shape, is just `SeqItem`'s value being another `BlockSequence` + `many(Newline, SeqItem)`: a depth-3 derivation the enumerator produces without anyone thinking of it.)
- **Grammar fuzzing**: random production choices, to reach deeper or larger structures than the enumeration bound allows.
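
A minimal sketch of the bounded-exhaustive mode, to make "every derivation of depth ≤ N" concrete. The `Rule` type below is a hypothetical stand-in for the combinator IR (the real shapes in `src/api.ts` may differ), `depth` is assumed to count nested rule references, and `many` is unrolled up to an assumed `maxRepeat`:

```typescript
// Hypothetical mini-IR for illustration only; not the actual src/api.ts types.
type Rule =
  | { kind: "token"; samples: string[] }
  | { kind: "seq"; parts: Rule[] }
  | { kind: "alt"; branches: Rule[] }
  | { kind: "opt"; body: Rule }
  | { kind: "many"; body: Rule }
  | { kind: "ref"; name: string };

type Grammar = Record<string, Rule>;

// Returns every token sequence derivable from `rule` using at most `depth`
// nested rule references, with `many` repeated 0..maxRepeat times.
function enumerate(g: Grammar, rule: Rule, depth: number, maxRepeat = 2): string[][] {
  switch (rule.kind) {
    case "token":
      return rule.samples.map((s) => [s]);
    case "ref": {
      if (depth <= 0) return []; // bound reached: this branch contributes nothing
      const target = g[rule.name];
      if (target === undefined) throw new Error(`unknown rule ${rule.name}`);
      return enumerate(g, target, depth - 1, maxRepeat);
    }
    case "alt":
      return rule.branches.flatMap((b) => enumerate(g, b, depth, maxRepeat));
    case "opt":
      return [[], ...enumerate(g, rule.body, depth, maxRepeat)];
    case "seq":
      // Cartesian product of the parts, concatenated in order.
      return rule.parts.reduce<string[][]>(
        (acc, part) => {
          const next = enumerate(g, part, depth, maxRepeat);
          return acc.flatMap((a) => next.map((n) => [...a, ...n]));
        },
        [[]],
      );
    case "many": {
      const body = enumerate(g, rule.body, depth, maxRepeat);
      let layer: string[][] = [[]];
      const out: string[][] = [[]];
      for (let i = 0; i < maxRepeat; i++) {
        layer = layer.flatMap((a) => body.map((b) => [...a, ...b]));
        out.push(...layer);
      }
      return out;
    }
  }
}

// Toy grammar (not YAML): Value := "a" | List ; List := "[" many(Value) "]"
const toy: Grammar = {
  Value: {
    kind: "alt",
    branches: [{ kind: "token", samples: ["a"] }, { kind: "ref", name: "List" }],
  },
  List: {
    kind: "seq",
    parts: [
      { kind: "token", samples: ["["] },
      { kind: "many", body: { kind: "ref", name: "Value" } },
      { kind: "token", samples: ["]"] },
    ],
  },
};

const inputs = enumerate(toy, { kind: "ref", name: "Value" }, 5).map((t) => t.join(" "));
```

With this toy grammar the nested shape `[ [ a ] a ]` appears in `inputs` purely as a consequence of the bound, which is the same mechanism that would produce the #24 shape from the YAML grammar. A fuzzing mode could reuse the same walk, picking one `alt` branch at random instead of taking all of them.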

The only per-language cost is **structural-token materialization**: token-stream languages (TS/JS/TSX/JSX) join tokens directly; markup (HTML/Vue) needs a tag stack + raw-text body; YAML needs an indent stack mirroring the lexer to turn `Indent`/`Dedent`/`Newline` into real indentation. This hook has the same shape as gen-lexer's per-language config, so the generator core itself stays language-agnostic. (Ambiguity such as JSX-vs-generic, regex-vs-divide or ASI is a *parsing* problem; a generator picks its derivation up front, so the ambiguity never arises. The generator only has to honour the context combinators already in the grammar: `sameLine`, `regexContext`, `not()`, `exclude`.)
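
As an illustration of the YAML case, here is a rough sketch of an indent-stack materializer. The `Tok` shape (including an `Indent` that carries its own width) and the `materialize` name are assumptions made for this example; the real lexer's token stream may look different:

```typescript
// Hypothetical structural-token stream; the real Monogram lexer tokens may differ.
type Tok =
  | { kind: "Indent"; width: number }
  | { kind: "Dedent" }
  | { kind: "Newline" }
  | { kind: "Text"; text: string };

// Turns Indent/Dedent/Newline into concrete whitespace using a stack of columns.
function materialize(tokens: Tok[]): string {
  const columns: number[] = [0]; // indent stack; top is the current column
  let atLineStart = true;
  let out = "";
  for (const t of tokens) {
    switch (t.kind) {
      case "Indent":
        columns.push(columns[columns.length - 1]! + t.width);
        break;
      case "Dedent":
        if (columns.length === 1) throw new Error("Dedent without matching Indent");
        columns.pop();
        break;
      case "Newline":
        out += "\n";
        atLineStart = true;
        break;
      case "Text":
        if (atLineStart) {
          out += " ".repeat(columns[columns.length - 1]!);
          atLineStart = false;
        }
        out += t.text;
        break;
    }
  }
  return out;
}

// One possible token stream for the #24 shape.
const issue24Shape = materialize([
  { kind: "Text", text: "- " },
  { kind: "Text", text: "- " },
  { kind: "Text", text: "a" },
  { kind: "Newline" },
  { kind: "Indent", width: 2 },
  { kind: "Text", text: "- " },
  { kind: "Text", text: "b" },
  { kind: "Newline" },
  { kind: "Dedent" },
  { kind: "Text", text: "- " },
  { kind: "Text", text: "c" },
]);
// issue24Shape === "- - a\n  - b\n- c"
```

Token-stream languages would need no such state (a plain join suffices), and the markup hook would presumably swap the column stack for a tag stack.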

**(b) Judging.** Generated inputs feed:

- **Round-trip** `generate(structure) → parse → recover structure` — needs NO external oracle; validates parser + tree-sitter internal consistency for all 7 languages.
- **Highlighter-scope ≡ parser-per-token-role** — the parser (with its full indent machine) is the ground truth the flat highlighter must match; #23/#24 are exactly this inconsistency, so this check auto-surfaces the whole class without anyone naming the shape.
- **Neutral oracle** (`yaml` package / tsc / parse5) for highlighter correctness where internal consistency isn't enough.
- **Differential vs official grammars** (already in `scope-gap`), now on generated inputs instead of a fixed corpus.
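
The highlighter-vs-parser check could be sketched roughly as follows. Everything here is hypothetical: the `Span` shape, the `scopeToRole` mapping and the role names are placeholders, and the stub spans are invented for illustration rather than taken from actual #23/#24 output:

```typescript
// Hypothetical span shape shared by both sides of the comparison.
interface Span { start: number; end: number; label: string }

interface Mismatch {
  text: string;
  start: number;
  parserRole: string;
  highlighterRole: string | undefined; // undefined = no scope, or an unmapped scope
}

// Treats the parser's per-token role as ground truth and reports every token
// whose highlighter scope maps to a different role (or to none at all).
function findRoleMismatches(
  source: string,
  parserRoles: Span[],
  highlighterScopes: Span[],
  scopeToRole: (scope: string) => string | undefined,
): Mismatch[] {
  const scopeByRange = new Map<string, string>();
  for (const s of highlighterScopes) scopeByRange.set(`${s.start}:${s.end}`, s.label);

  const mismatches: Mismatch[] = [];
  for (const p of parserRoles) {
    const scope = scopeByRange.get(`${p.start}:${p.end}`);
    const role = scope === undefined ? undefined : scopeToRole(scope);
    if (role !== p.label) {
      mismatches.push({
        text: source.slice(p.start, p.end),
        start: p.start,
        parserRole: p.label,
        highlighterRole: role,
      });
    }
  }
  return mismatches;
}

// Placeholder mapping from TextMate-style scopes to parser roles.
const scopeToRole = (scope: string): string | undefined =>
  scope.startsWith("punctuation.definition.block.sequence") ? "seq-marker"
  : scope.startsWith("string.") ? "scalar"
  : undefined;

// Invented stub data: the highlighter mislabels the "-" marker as a string.
const source = "- a";
const mismatches = findRoleMismatches(
  source,
  [{ start: 0, end: 1, label: "seq-marker" }, { start: 2, end: 3, label: "scalar" }],
  [
    { start: 0, end: 1, label: "string.unquoted.plain.out.yaml" },
    { start: 2, end: 3, label: "string.unquoted.plain.out.yaml" },
  ],
  scopeToRole,
);
```

Run over generated inputs, a non-empty `mismatches` array would flag an input without anyone having named its shape in advance. A real check would also need to handle tokens whose ranges do not line up exactly between the two sides, which this sketch ignores.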

## Part 2 — the test-suite cleanup it enables (to land alongside)

The suite is **80 files / ~11k lines**. Most is essential matrix — 7 languages × 4 targets (parser / TextMate / Monarch / tree-sitter) × several metrics — and is *not* compressible without dropping coverage. Adopting the generator lets us retire/reshape three slices:

**A. Delete dev-only scratch / superseded probes** (~500 lines, ~9 files; confirm each isn't a CI gate first):
`parser-gap.ts` (254 — superseded by run-conformance + src-coverage-ts, reads an external `/tmp` path), `yaml-diag.ts` / `yaml-poc.ts` (self-labeled "Throwaway"), `diag.ts` / `prof.ts` / `ts-ast.ts` / `bench-vs-ts.ts` / `bench-vs-ts-agg.ts` / `classify-ts.ts` (dev diagnostics).

**B. Per-language adapters → data-driven** (~400–550 lines, 13 files → ~4): the thin `scope-gap-{ts,js,jsx,tsx,html,yaml,vue}` adapters and the six `src-coverage-*` adapters are mostly boilerplate (corpus path, grammar path, scopeName, oracle pick). Collapse them into one driver + a config table (the oracle files are already separate), invoked as `scope-gap <lang>`. This is the same agnostic/data-driven principle the engines already follow, and it pushes the harness from ~80% to 100% agnostic. Caveat: the `html` / `yaml` adapters are thicker and may not fully fold in; keep per-language entry points via a parameter.
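
A possible shape for that config table, as a sketch only. The field names follow the four boilerplate items listed above; every path and scope name in the table is a placeholder, not the repository's actual layout, and only three languages are filled in:

```typescript
// Hypothetical per-language adapter config; all values below are placeholders.
interface AdapterConfig {
  corpus: string;    // corpus path (or, later, a generator source)
  grammar: string;   // grammar module path
  scopeName: string; // TextMate root scope
  oracle: "tsc" | "parse5" | "yaml-package";
}

const ADAPTERS = new Map<string, AdapterConfig>([
  ["ts",   { corpus: "corpus/ts",   grammar: "typescript.ts", scopeName: "source.ts",       oracle: "tsc" }],
  ["html", { corpus: "corpus/html", grammar: "html.ts",       scopeName: "text.html.basic", oracle: "parse5" }],
  ["yaml", { corpus: "corpus/yaml", grammar: "yaml.ts",       scopeName: "source.yaml",     oracle: "yaml-package" }],
]);

// Single entry point: `scope-gap <lang>` resolves its config here and fails
// loudly on a language the table does not know.
function resolveAdapter(lang: string): AdapterConfig {
  const cfg = ADAPTERS.get(lang);
  if (cfg === undefined) {
    throw new Error(`unknown language "${lang}"; known: ${[...ADAPTERS.keys()].join(", ")}`);
  }
  return cfg;
}

const yamlAdapter = resolveAdapter("yaml");
```

The thicker `html` / `yaml` adapters could keep their extra logic behind an optional hook field on `AdapterConfig` rather than a separate file, though whether that is enough is exactly the caveat noted above.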

**C. Corpus loaders → generator source.** The corpus role in `scope-gap` / `src-coverage` becomes a generator. The hand-written `*-issue-cases` and `yaml-issue12-regressions` **stay as named regressions** — generation replaces their *discovery* function, not their documentation/gate value.

## Honest boundary — what generation does NOT replace

- **Alignment to external references** (`*-conformance` vs tsc, `src-coverage`'s official-branch metric, `scope-gap`'s vs-official differential, `repo-compat`): round-trip only proves the derivation chain is *self*-consistent — never that the parser matches tsc's semantic boundary. These stay, unchanged.
- **Negative / error tests** (`verify-rejects`, the reject half of conformance): generation emits only legal inputs; rejecting illegal ones needs a separate *mutation* axis.
- **Smoke / bench / report** (`*-smoke`, `perf-bench`, `issue-table`, `coverage-table`, `agnostic`): structural / perf / README — untouched.

Net: not fewer lines — coverage upgraded from corpus *sampling* to bounded-exhaustive, the consistency harnesses reused as the spine, scratch deleted, and per-language boilerplate folded into one data-driven driver. What's actually replaced is the corpus's "representativeness bet" (no longer betting yaml-test-suite happens to contain a #23/#24 shape) and the ongoing labour of hand-writing new cases to chase coverage.

## Acceptance (incremental)

- [ ] A generic grammar walker emitting legal inputs for any Monogram grammar (bounded-exhaustive to depth N) + the per-language structural-token materialization hook.
- [ ] Round-trip gate: generated inputs parse for all 7 languages.
- [ ] Highlighter-scope ≡ parser-role consistency check on generated inputs (must independently re-surface #23 and #24).
- [ ] A-layer cleanup landed (scratch / superseded deleted, each confirmed not a gate).
- [ ] B-layer: `scope-gap` / `src-coverage` adapters folded into a data-driven driver, per-language entry preserved as a parameter.

Related: #23, #24 (the bugs that motivated this).

