Generative testing: grammar-derived inputs + by-construction consistency (with test-suite cleanup) #25

@johnsoncodehk

Motivation

#23 and #24 are highlighter bugs on valid input, and the scope-gap metric reported neither of them. The metric itself is not broken: fed the two inputs directly, grading catches #23 and the differential catches #24. It missed them because the corpus (yaml-test-suite 406 + issue12 10 = 416) contains no shape that triggers them. The metric is corpus-bound: "100% / monogramWrong 0" only means "no error on the 416 inputs it has seen". Hand-adding shapes, which is how #23/#24 were found, is sampling; it can't prove no other shape is missed (Dijkstra: testing shows the presence of bugs, not their absence).

But Monogram has a lever a normal highlighter doesn't: the source IS a grammar. The same yaml.ts / typescript.ts object the parser, highlighter and tree-sitter derive from is also a generator — walk it (alt=branch, seq=concat, many/opt=repeat, token=sample) and it emits guaranteed-legal inputs. That replaces "hope the corpus contains the shape" with systematic coverage.
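The walk described above can be sketched as follows. This is a minimal illustration, not Monogram's code: the `Rule` type is a hypothetical miniature IR (the real combinators in src/api.ts will differ), and `makeRandom`, `pick`, `MAX_REPEAT` and the `flowSeq` toy grammar are names invented for the example.

```typescript
// Hypothetical miniature IR; the real combinators in src/api.ts will differ.
type Rule =
  | { kind: "token"; samples: string[] }
  | { kind: "seq"; items: Rule[] }
  | { kind: "alt"; options: Rule[] }
  | { kind: "opt"; item: Rule }
  | { kind: "many"; item: Rule };

// Seeded linear congruential generator, so any generated input can be replayed.
function makeRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 0x100000000; // always in [0, 1)
  };
}

function pick<T>(items: T[], rand: () => number): T {
  return items[Math.floor(rand() * items.length)] as T;
}

const MAX_REPEAT = 2; // arbitrary cap for many(); an assumption of this sketch

// alt = branch, seq = concat, many/opt = repeat, token = sample.
function generate(rule: Rule, rand: () => number): string {
  switch (rule.kind) {
    case "token":
      return pick(rule.samples, rand);
    case "seq":
      return rule.items.map((item) => generate(item, rand)).join("");
    case "alt":
      return generate(pick(rule.options, rand), rand);
    case "opt":
      return rand() < 0.5 ? generate(rule.item, rand) : "";
    case "many": {
      const count = Math.floor(rand() * (MAX_REPEAT + 1));
      let out = "";
      for (let i = 0; i < count; i++) out += generate(rule.item, rand);
      return out;
    }
  }
}

// Toy grammar: a flow sequence such as "[]", "[a]" or "[a, b]".
const scalar: Rule = { kind: "token", samples: ["a", "b"] };
const flowSeq: Rule = {
  kind: "seq",
  items: [
    { kind: "token", samples: ["["] },
    {
      kind: "opt",
      item: {
        kind: "seq",
        items: [
          scalar,
          { kind: "many", item: { kind: "seq", items: [{ kind: "token", samples: [", "] }, scalar] } },
        ],
      },
    },
    { kind: "token", samples: ["]"] },
  ],
};

console.log(generate(flowSeq, makeRandom(1)));
```

Every string this walk emits is legal for the toy grammar by construction, because each step only ever follows a production. Seeding the random source means a failing input can be reproduced from its seed alone.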

Part 1 — the method

(a) Grammar-based input generation. One generic generator over the shared combinator IR (all 7 languages import the same src/api.ts combinators → one driver covers every language). Two modes:

  • Bounded-exhaustive enumeration: every derivation of depth ≤ N. Complete by construction within the bound; the small-scope hypothesis (most bugs have a small witness) is what makes a modest N worthwhile. It turns "which shapes get tested" from imagination into grammar × bound. (E.g. - - a\n - b\n- c, the #24 shape (nested compact sequence item swallowed by plain-scalar fold), is just SeqItem's value being another BlockSequence + many(Newline, SeqItem): a depth-3 derivation the enumerator produces without anyone thinking of it.)
  • Grammar fuzzing — random production choices for deeper/larger structures.
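To make the bounded-exhaustive mode concrete, here is a small sketch under stated assumptions. The `Rule` type is again a hypothetical stand-in for the shared combinator IR, depth is counted as the number of rule references expanded (the real bound may be defined differently), and many/opt are left out because they can be desugared into alt/seq with a repetition cap.

```typescript
// Hypothetical miniature IR; "ref" is lazy so that rules can refer to themselves.
type Rule =
  | { kind: "token"; text: string }
  | { kind: "seq"; items: Rule[] }
  | { kind: "alt"; options: Rule[] }
  | { kind: "ref"; target: () => Rule };

// Returns every string derivable from `rule` using at most `depth` nested rule references.
function enumerate(rule: Rule, depth: number): string[] {
  switch (rule.kind) {
    case "token":
      return [rule.text];
    case "seq":
      // Cartesian product of the alternatives of each item, left to right.
      return rule.items.reduce<string[]>((prefixes, item) => {
        const parts = enumerate(item, depth);
        return prefixes.flatMap((prefix) => parts.map((part) => prefix + part));
      }, [""]);
    case "alt":
      return rule.options.flatMap((option) => enumerate(option, depth));
    case "ref":
      // Out of budget: this branch contributes no derivations.
      return depth === 0 ? [] : enumerate(rule.target(), depth - 1);
  }
}

const tok = (text: string): Rule => ({ kind: "token", text });

// Toy grammar: Value = "a" | "b" | "[" Value "]" | "[" Value "," Value "]"
const valueRef: Rule = { kind: "ref", target: () => value };
const value: Rule = {
  kind: "alt",
  options: [
    tok("a"),
    tok("b"),
    { kind: "seq", items: [tok("["), valueRef, tok("]")] },
    { kind: "seq", items: [tok("["), valueRef, tok(","), valueRef, tok("]")] },
  ],
};

console.log(enumerate(valueRef, 2)); // the 8 strings reachable within two nested references
```

For this toy grammar the count grows from 2 at depth 1 to 8 at depth 2 and 74 at depth 3, which is why the bound stays small and the fuzzing mode covers deeper structures.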

The only per-language cost is structural-token materialization: token-stream languages (TS/JS/TSX/JSX) join tokens directly; markup (HTML/Vue) needs a tag stack + raw-text body; YAML needs an indent stack mirroring the lexer to turn Indent/Dedent/Newline into real indentation. This hook has the same shape as gen-lexer's per-language config, so the generator itself stays agnostic. (Ambiguity such as JSX-vs-generic, regex-vs-divide, or ASI is a parsing problem; when generating you pick a derivation, so it never arises. The generator only needs to honour the context combinators already in the grammar: sameLine, regexContext, not(), exclude.)
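As an illustration of the YAML case, the hook could look roughly like this. The token shapes are assumptions made for the sketch (in particular, an `indent` token carrying its own width); Monogram's lexer may represent Indent/Dedent/Newline differently.

```typescript
// Hypothetical structural tokens; the real lexer's token shapes may differ.
type StructuralToken =
  | { kind: "text"; value: string }
  | { kind: "newline" }
  | { kind: "indent"; width: number }
  | { kind: "dedent" };

// Turns Indent/Dedent/Newline into real indentation using a stack of saved columns.
function materialize(tokens: StructuralToken[]): string {
  const saved: number[] = []; // columns to restore on dedent
  let column = 0;             // indentation of the current block
  let atLineStart = true;
  let out = "";
  for (const tok of tokens) {
    switch (tok.kind) {
      case "indent":
        saved.push(column);
        column += tok.width;
        break;
      case "dedent": {
        const previous = saved.pop();
        if (previous === undefined) throw new Error("dedent without matching indent");
        column = previous;
        break;
      }
      case "newline":
        out += "\n";
        atLineStart = true;
        break;
      case "text":
        // Indentation is emitted lazily, only when a line actually has content.
        if (atLineStart) {
          out += " ".repeat(column);
          atLineStart = false;
        }
        out += tok.value;
        break;
    }
  }
  return out;
}

const yamlText = materialize([
  { kind: "text", value: "items:" },
  { kind: "newline" },
  { kind: "indent", width: 2 },
  { kind: "text", value: "- a" },
  { kind: "newline" },
  { kind: "text", value: "- b" },
  { kind: "newline" },
  { kind: "dedent" },
  { kind: "text", value: "done: true" },
  { kind: "newline" },
]);
console.log(yamlText);
```

The token-stream languages would need only the `text` case, and a markup hook would swap the column stack for a tag stack, which is why the per-language part can stay this small.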

(b) Judging. Generated inputs feed:

  • Round-trip generate(structure) → parse → recover structure — needs NO external oracle; validates parser + tree-sitter internal consistency for all 7 languages.
  • Highlighter-scope ≡ parser-per-token-role: the parser (with its full indent machine) is the ground truth the flat highlighter must match. #23 (YAML highlighter: value-leading ---/... scoped as document markers) and #24 (YAML highlighter: nested compact sequence item swallowed by plain-scalar fold) are exactly this inconsistency, so this check auto-surfaces the whole class without anyone naming the shape.
  • Neutral oracle (yaml package / tsc / parse5) for highlighter correctness where internal consistency isn't enough.
  • Differential vs official grammars (already in scope-gap), now on generated inputs instead of a fixed corpus.
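The highlighter-versus-parser check could be sketched as a per-character role comparison. Everything here is illustrative: `RoleSpan`, the role names, and the sample input are assumptions, and the input is only loosely modeled on the #23 description rather than being the actual failing case.

```typescript
// Hypothetical span shape: half-open [start, end) offsets plus a role name.
interface RoleSpan { start: number; end: number; role: string }
interface Mismatch { start: number; end: number; expected: string | null; actual: string | null }

// Expands spans into one role per character; uncovered characters stay null.
function paint(spans: RoleSpan[], length: number): (string | null)[] {
  const roles = new Array<string | null>(length).fill(null);
  for (const span of spans) {
    for (let i = Math.max(0, span.start); i < span.end && i < length; i++) roles[i] = span.role;
  }
  return roles;
}

// Reports every contiguous range where the candidate disagrees with the ground truth.
function diffRoles(source: string, truth: RoleSpan[], candidate: RoleSpan[]): Mismatch[] {
  const want = paint(truth, source.length);
  const got = paint(candidate, source.length);
  const mismatches: Mismatch[] = [];
  for (let i = 0; i < source.length; i++) {
    const expected = want[i] ?? null;
    const actual = got[i] ?? null;
    if (expected === actual) continue;
    const last = mismatches[mismatches.length - 1];
    if (last && last.end === i && last.expected === expected && last.actual === actual) {
      last.end = i + 1; // extend the current mismatch range
    } else {
      mismatches.push({ start: i, end: i + 1, expected, actual });
    }
  }
  return mismatches;
}

// Illustrative input: a plain scalar that happens to start with "---".
const source = "k: ---v";
const parserRoles: RoleSpan[] = [
  { start: 0, end: 1, role: "key" },
  { start: 1, end: 2, role: "separator" },
  { start: 3, end: 7, role: "scalar" },
];
const highlighterRoles: RoleSpan[] = [
  { start: 0, end: 1, role: "key" },
  { start: 1, end: 2, role: "separator" },
  { start: 3, end: 6, role: "document-marker" }, // the simulated highlighter bug
  { start: 6, end: 7, role: "scalar" },
];

console.log(diffRoles(source, parserRoles, highlighterRoles));
```

Comparing roles per character rather than per token keeps the check independent of how each side happens to split its tokens, so a highlighter that tokenizes differently but assigns the same roles is not flagged.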

Part 2 — the test-suite cleanup it enables (to land alongside)

The suite is 80 files / ~11k lines. Most of it is the essential matrix: 7 languages × 4 targets (parser / TextMate / Monarch / tree-sitter) × several metrics. That part is not compressible without dropping coverage. Adopting the generator lets us retire/reshape three slices:

A. Delete dev-only scratch / superseded probes (~500 lines, ~9 files; confirm each isn't a CI gate first):
parser-gap.ts (254 lines; superseded by run-conformance + src-coverage-ts, and reads an external /tmp path), yaml-diag.ts / yaml-poc.ts (self-labeled "Throwaway"), diag.ts / prof.ts / ts-ast.ts / bench-vs-ts.ts / bench-vs-ts-agg.ts / classify-ts.ts (dev diagnostics).

B. Per-language adapters → data-driven (~400–550 lines, 13 files → ~4): the seven thin scope-gap-{ts,js,jsx,tsx,html,yaml,vue} adapters and the six src-coverage-* adapters are mostly boilerplate (corpus path, grammar path, scopeName, oracle pick). Collapse them to one driver + a config table (the oracle files are already separate), invoked as scope-gap <lang>. This is the same agnostic/data-driven principle the engines already follow, and it pushes the harness from ~80% to 100% agnostic. Caveat: the html / yaml adapters are thicker and may not fully fold in; keep per-language entry points via a parameter.
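A config table of the kind proposed in B might look like the following sketch. All paths and scope names below are placeholders invented for illustration, the oracle names are taken from the neutral-oracle list in Part 1 rather than from the existing adapters, and `LangConfig` / `resolveLang` are hypothetical names.

```typescript
// One row per language replaces one thin adapter file.
interface LangConfig {
  corpus: string;     // where inputs come from (placeholder paths)
  grammar: string;    // grammar module the targets derive from
  scopeName: string;  // TextMate root scope (assumed values)
  oracle: "tsc" | "parse5" | "yaml";
}

// Hypothetical table; only three of the seven languages are shown.
const LANGS = new Map<string, LangConfig>([
  ["ts", { corpus: "corpus/ts", grammar: "typescript.ts", scopeName: "source.ts", oracle: "tsc" }],
  ["html", { corpus: "corpus/html", grammar: "html.ts", scopeName: "text.html.basic", oracle: "parse5" }],
  ["yaml", { corpus: "corpus/yaml", grammar: "yaml.ts", scopeName: "source.yaml", oracle: "yaml" }],
]);

// Looks up the row for `scope-gap <lang>` and fails loudly on unknown languages.
function resolveLang(lang: string): LangConfig {
  const config = LANGS.get(lang);
  if (config === undefined) {
    const known = Array.from(LANGS.keys()).join(", ");
    throw new Error(`unknown language "${lang}"; expected one of: ${known}`);
  }
  return config;
}

console.log(resolveLang("yaml").scopeName);
```

A thicker adapter such as html or yaml could keep its extra logic behind an optional hook field on its row, so the single entry point still takes the language as a parameter.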

C. Corpus loaders → generator source. The corpus role in scope-gap / src-coverage becomes a generator. The hand-written *-issue-cases and yaml-issue12-regressions stay as named regressions — generation replaces their discovery function, not their documentation/gate value.

Honest boundary — what generation does NOT replace

  • Alignment to external references (*-conformance vs tsc, src-coverage's official-branch metric, scope-gap's vs-official differential, repo-compat): round-trip only proves the derivation chain is self-consistent — never that the parser matches tsc's semantic boundary. These stay, unchanged.
  • Negative / error tests (verify-rejects, the reject half of conformance): generation emits only legal inputs; rejecting illegal ones needs a separate mutation axis.
  • Smoke / bench / report (*-smoke, perf-bench, issue-table, coverage-table, agnostic): structural / perf / README — untouched.

Net: not fewer lines — coverage upgraded from corpus sampling to bounded-exhaustive, the consistency harnesses reused as the spine, scratch deleted, and per-language boilerplate folded into one data-driven driver. What's actually replaced is the corpus's "representativeness bet" (no longer betting yaml-test-suite happens to contain a #23/#24 shape) and the ongoing labour of hand-writing new cases to chase coverage.

Acceptance (incremental)

Related: #23, #24 (the bugs that motivated this).

Labels: enhancement