YAML highlighter: value-leading ---/... scoped as document markers #23

@johnsoncodehk

Description

The YAML TextMate highlighter is flat (derived from the token stream), so it retries the same top-level patterns at every token boundary. #docstart / #docend sit in that top-level loop, and their regexes have no line-start anchor:

"docstart": { "match": "---(?=[\\t ]|\\r|\\n|$)", "name": "entity.other.document.begin.yaml" },
"docend":   { "match": "\\.\\.\\.(?=[\\t ]|\\r|\\n|$)", "name": "entity.other.document.end.yaml" }

So a --- / ... that opens a value (or a sequence-item value) is scoped as a document marker, even though a YAML directives-end / document-end marker is only meaningful at column 0.
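
The mis-scoping can be reproduced outside the highlighter. The sketch below is a minimal illustration, not the gen-tm implementation: it resolves the JSON escapes of the two patterns above into JavaScript regexes and tries them at the offset where a value begins, using the sticky flag to mimic a flat grammar retrying its patterns at a token boundary. The helper name matchesAt and the hand-picked offsets are assumptions made for illustration.

```typescript
// Regex sources copied from the #docstart / #docend entries (JSON escapes resolved).
// The sticky flag (y) forces the match to start exactly at lastIndex.
const docStart = /---(?=[\t ]|\r|\n|$)/y;
const docEnd = /\.\.\.(?=[\t ]|\r|\n|$)/y;

// Hypothetical helper: try one pattern at one offset, the way a flat
// grammar retries its top-level patterns at a token boundary.
function matchesAt(re: RegExp, line: string, offset: number): boolean {
  re.lastIndex = offset;
  return re.test(line);
}

// Value-leading markers: the token boundary sits right after "note: ", "x: " or "- ".
const leadingDash = matchesAt(docStart, "note: --- not a marker", 6); // true
const leadingDots = matchesAt(docEnd, "x: ... bar", 3);               // true
const seqItemDash = matchesAt(docStart, "- --- x", 2);                // true

// Mid-value marker: the boundary after "a: " sits on "b", so the pattern
// is never tried at the "---" that follows.
const midValueDash = matchesAt(docStart, "a: b --- c", 3);            // false

console.log({ leadingDash, leadingDots, seqItemDash, midValueDash });
```

Nothing in either pattern looks at what precedes the match, so any offset that happens to sit on --- or ... followed by whitespace or end of line is accepted.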

Repro — all three are valid YAML (yaml package reports 0 errors)

| input | yaml oracle | Monogram | official | RedCMD |
| --- | --- | --- | --- | --- |
| note: --- not a marker | {note: "--- not a marker"} | --- → entity.other.document.begin.yaml | string.unquoted.plain.out.yaml | |
| x: ... bar | {x: "... bar"} | ... → entity.other.document.end.yaml | string | ✓ |
| - --- x | ["--- x"] | --- → entity.other.document.begin.yaml | string | ✓ |

A --- in the middle of a value (a: b --- c, parsed as {a: "b --- c"}) is already correct: the leading plain scalar consumes it. Only a value-leading --- / ... is mis-scoped.

Root cause

In yaml.ts the markers are tokens with no positional condition:

const DocStart = token(seq('---', docMarkerEnd), { scope: 'entity.other.document.begin' });
const DocEnd   = token(seq('...', docMarkerEnd), { scope: 'entity.other.document.end' });

The parser constrains them structurally: DocStart / DocEnd are only accepted in Stream / document positions, so the CST is correct. But gen-tm emits the token pattern into every context's pattern list, dropping that structural constraint. The official grammar avoids the problem by placing the (equally unanchored) marker pattern only inside #document, which is included once at stream top-level; a value goes through #block-node's plain rules, which don't include the marker.

The flat highlighter therefore needs an explicit equivalent of the position constraint that the parser gets for free from grammar structure.

Fix direction

Constrain the document markers to line start (column 0). A YAML --- / ... marker is only valid at the start of a line, so anchoring #docstart / #docend (or gating the DocStart / DocEnd tokens on line-start in gen-tm) should be safe.
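
As a rough sketch of what line-start anchoring would change, the snippet below prefixes both patterns with ^ and checks them against a small multi-document input. It is an illustration under assumptions, not the patch itself: the ^ prefix is only one possible way to express the constraint, the JavaScript m flag stands in for line-by-line tokenization, and scopeAt and sample are hypothetical names.

```typescript
// Assumed fix shape: the same patterns as #docstart / #docend, prefixed with ^.
// m makes ^ and $ line-relative; y forces the match to start at lastIndex.
const anchoredDocStart = /^---(?=[\t ]|\r|\n|$)/my;
const anchoredDocEnd = /^\.\.\.(?=[\t ]|\r|\n|$)/my;

// Hypothetical helper: which marker scope, if any, applies at this offset.
function scopeAt(text: string, offset: number): string | null {
  anchoredDocStart.lastIndex = offset;
  if (anchoredDocStart.test(text)) return "entity.other.document.begin";
  anchoredDocEnd.lastIndex = offset;
  if (anchoredDocEnd.test(text)) return "entity.other.document.end";
  return null;
}

// A real document start, the three repro lines, and a real document end.
const sample = "---\nnote: --- not a marker\n- --- x\nx: ... bar\n...\n";

const realStart = scopeAt(sample, 0);                           // column 0
const valueDash = scopeAt(sample, sample.indexOf("--- not"));   // after "note: "
const itemDash = scopeAt(sample, sample.indexOf("--- x"));      // after "- "
const valueDots = scopeAt(sample, sample.indexOf("... bar"));   // after "x: "
const realEnd = scopeAt(sample, sample.lastIndexOf("..."));     // column 0

console.log({ realStart, valueDash, itemDash, valueDots, realEnd });
```

In this sketch only the two column-0 offsets yield a marker scope; the three value-leading offsets return null, so they would fall through to whatever plain-scalar rule follows.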

Acceptance

  • The three repros scope --- / ... as string content, not document markers.
  • Legitimate line-start --- / ... document separators still scope as entity.other.document.*.
  • node src/cli.ts yaml.ts keeps the parser CST and the other six grammars byte-identical; scope-gap:yaml does not regress.

Related: #24
