[Refactor] New rule parser to support AST by colinthebomb1 · Pull Request #104 · ISG-ICS/QueryBooster

colinthebomb1 · 2026-04-07T18:23:39Z

Summary

This PR introduces and stabilizes RuleParser v2 along with dedicated v2 test coverage.

Highlights

New RuleParserV2 that works with our AST Nodes.
Similar structure to original rule parser, but added step for substituing variables with corresponding nodes.
Aligns v2 variable semantics and node naming for clarity:
- element variables (<x>)
- set variables (<<y>>)
New comprehensive v2 tests in test_rule_parser_v2.py.

Testing

Ran tests/test_rule_parser_v2.py
Ran full test suite to confirm compatibility with existing behavior
All tests passing

Made-with: Cursor

Copilot

Pull request overview

This PR introduces RuleParserV2, a new rule parsing implementation that generates AST node structures instead of JSON, along with a comprehensive refactoring of variable node classes.

Changes:

New RuleParserV2 module that parses rules into AST structures with ElementVariableNode and SetVariableNode for rule variables
Renamed VarNode → ElementVariableNode and VarSetNode → SetVariableNode throughout the codebase for semantic clarity
Updated internal token naming from V/VL to EV/SV (ElementVariable/SetVariable) to align with new v2 semantics
Comprehensive test coverage (745 lines) for RuleParserV2 including parametrized tests over entire rule catalog
Code cleanup in test_rule_parser.py (import consolidation and formatting)

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
core/rule_parser_v2.py	New RuleParserV2 implementation with 6-step pipeline for rule parsing and AST generation
tests/test_rule_parser_v2.py	Comprehensive test suite covering extendToFullSQL, replaceVars, parsing, variable substitution, and rule catalog validation
core/ast/node.py	Renamed VarNode/VarSetNode to ElementVariableNode/SetVariableNode with updated docstrings
core/ast/init.py	Updated exports to use new class names
tests/test_rule_parser.py	Import consolidation and code cleanup (formatting and indentation fixes)
tests/test_ast.py	Updated imports to use new ElementVariableNode/SetVariableNode names
tests/ast_util.py	Updated imports and comments to reference new variable node class names
core/query_parser.py	Updated imports and TODO comment to reference new class names

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

HazelYuAhiru

Overall LGTM! Just with a small comment.

HazelYuAhiru · 2026-04-07T19:38:41Z

core/ast/node.py

-class VarNode(Node):
-    """VarSQL variable node"""
+class ElementVariableNode(Node):
+    """Rule element variable ``<name>`` (see ``VarType.ElementVariable`` in rule_parser_v2)."""


ElementVariableNode don't have their own __eq__ or __hash__, so they both inherit Node's implementation, which only compares type and children. The .name field is invisible to equality checks so something like ElementVariableNode("x") == ElementVariableNode("y") will return True

HazelYuAhiru · 2026-04-07T19:38:57Z

core/ast/node.py

-class VarSetNode(Node):
-    """VarSQL variable set node"""
+class SetVariableNode(Node):
+    """Rule set variable ``<<name>>`` (see ``VarType.SetVariable`` in rule_parser_v2)."""


Same as above.

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-07T20:11:07Z

core/rule_parser_v2.py

+        if node.type == NodeType.FUNCTION:
+            f = node
+            if not isinstance(f, FunctionNode):
+                return node
+            new_args = [RuleParserV2._substitute_placeholders(a, rev) for a in f.children]
+            return FunctionNode(f.name, _args=new_args, _alias=f.alias)
+
+        if node.type == NodeType.LIST:
+            ln = node
+            if not isinstance(ln, ListNode):
+                return node
+            return ListNode([RuleParserV2._substitute_placeholders(c, rev) for c in ln.children])
+


RuleParserV2._substitute_placeholders currently substitutes tokens in ColumnNode/TableNode/aliases but does not handle other places where internal EV/SV tokens can legally appear (e.g., LIMIT/OFFSET values like LIMIT <x3> and literals like '%<y>%'). This can leave internal tokens in the resulting v2 AST, making the output inconsistent with the goal of returning an AST with external variable names/nodes. Consider extending substitution to cover LimitNode/OffsetNode (and any other node fields that can carry identifiers) and adding test assertions for these cases so token leakage is caught.

Copilot · 2026-04-07T20:11:07Z

core/query_parser.py

 from core.ast.node import (
    Node, QueryNode, SelectNode, FromNode, WhereNode, TableNode, ColumnNode, 
    LiteralNode, DataTypeNode, TimeUnitNode, IntervalNode,
    CaseNode, WhenThenNode,
    OperatorNode, UnaryOperatorNode, FunctionNode, GroupByNode, HavingNode,
    OrderByNode, OrderByItemNode, LimitNode, OffsetNode, SubqueryNode,
-    VarNode, VarSetNode, JoinNode, ListNode
+    ElementVariableNode, SetVariableNode, JoinNode, ListNode
 )
-# TODO: implement VarNode, VarSetNode
+# TODO: implement ElementVariableNode, SetVariableNode


core/query_parser.py imports ElementVariableNode and SetVariableNode but never references them (only mentioned in a TODO). If QueryParser isn’t responsible for emitting these node types yet, consider removing the unused imports (and adjusting the TODO) to avoid dead code / potential lint failures.

Copilot · 2026-04-07T20:11:07Z

tests/test_rule_parser_v2.py

+    Known limitation: ``_substitute_placeholders`` does not replace a ColumnNode's
+    name when the column is qualified (``parent_alias`` is set) but the parent alias
+    itself is a literal not present in the internal-to-external mapping.  We only flag
+    unqualified columns and qualified columns whose parent alias is also an internal
+    token.
+    """
+    internal_tokens = set(result.mapping.values())
+
+    for tree_label, tree in [("pattern", result.pattern_ast), ("rewrite", result.rewrite_ast)]:
+        for n in _walk(tree):
+            if isinstance(n, ColumnNode):
+                if n.parent_alias is None:
+                    assert not _TOKEN_RE.match(n.name), (
+                        f"{tree_label} AST has raw internal token {n.name!r} "
+                        f"as unqualified ColumnNode"
+                    )
+                elif n.parent_alias in internal_tokens:
+                    assert not _TOKEN_RE.match(n.parent_alias), (
+                        f"{tree_label} AST has raw internal token {n.parent_alias!r} "
+                        f"as ColumnNode.parent_alias"
+                    )
+                    assert not _TOKEN_RE.match(n.name), (
+                        f"{tree_label} AST has raw internal token {n.name!r} "
+                        f"as ColumnNode.name (parent_alias was also an internal token)"
+                    )
+                # else: parent_alias is a literal (e.g. "t") and name is an internal
+                # token — known gap in _substitute_placeholders; skip.


The docstring for _assert_no_internal_tokens describes a “known limitation” where _substitute_placeholders doesn’t replace ColumnNode.name for qualified columns when parent_alias is a literal. In the current RuleParserV2 implementation, the pa is not None and nm in rev branch does replace the name in that case, so this comment appears outdated/misleading. Please update the comment (or tighten the assertion) to reflect the actual substitution behavior so the test doesn’t mask real token leaks.

Suggested change

Known limitation: ``_substitute_placeholders`` does not replace a ColumnNode's

name when the column is qualified (``parent_alias`` is set) but the parent alias

itself is a literal not present in the internal-to-external mapping. We only flag

unqualified columns and qualified columns whose parent alias is also an internal

token.

"""

internal_tokens = set(result.mapping.values())

for tree_label, tree in [("pattern", result.pattern_ast), ("rewrite", result.rewrite_ast)]:

for n in _walk(tree):

if isinstance(n, ColumnNode):

if n.parent_alias is None:

assert not _TOKEN_RE.match(n.name), (

f"{tree_label} AST has raw internal token {n.name!r} "

f"as unqualified ColumnNode"

)

elif n.parent_alias in internal_tokens:

assert not _TOKEN_RE.match(n.parent_alias), (

f"{tree_label} AST has raw internal token {n.parent_alias!r} "

f"as ColumnNode.parent_alias"

)

assert not _TOKEN_RE.match(n.name), (

f"{tree_label} AST has raw internal token {n.name!r} "

f"as ColumnNode.name (parent_alias was also an internal token)"

)

# else: parent_alias is a literal (e.g. "t") and name is an internal

# token — known gap in _substitute_placeholders; skip.

This assertion checks identifier-bearing AST fields that should contain external

names after placeholder substitution, including both qualified and unqualified

``ColumnNode.name`` values.

"""

internal_tokens = set(result.mapping.values())

for tree_label, tree in [("pattern", result.pattern_ast), ("rewrite", result.rewrite_ast)]:

for n in _walk(tree):

if isinstance(n, ColumnNode):

assert not _TOKEN_RE.match(n.name), (

f"{tree_label} AST has raw internal token {n.name!r} as ColumnNode.name"

)

if n.parent_alias in internal_tokens:

assert not _TOKEN_RE.match(n.parent_alias), (

f"{tree_label} AST has raw internal token {n.parent_alias!r} "

f"as ColumnNode.parent_alias"

)

Copilot · 2026-04-07T20:11:08Z

core/ast/__init__.py

 from .node import (
    Node,
    TableNode,
    SubqueryNode,
    ColumnNode,
    LiteralNode,
-    VarNode,
-    VarSetNode,
+    ElementVariableNode,
+    SetVariableNode,
    OperatorNode,


core/ast/init.py lists 'OrderByItemNode' in all, but it is not imported from .node in this module. This makes from core.ast import OrderByItemNode fail at runtime despite being advertised as public API. Either import OrderByItemNode here or remove it from all to keep exports consistent.

colinthebomb1 added 6 commits April 7, 2026 11:26

Add RuleParser.parse_v2 returning project AST (RuleParseResult)

4920c99

Made-with: Cursor

seperate parsrr files

b628827

fix newlines

a4ea3e8

seperate VarNode/Varset substitution from reducing query scope

1705c2b

fix node names and add test cases

1a644fd

improve test cases

0fd8c98

colinthebomb1 force-pushed the colin/rule-parser-parse-v2 branch from bfc2e15 to 0fd8c98 Compare April 7, 2026 18:26

colinthebomb1 requested a review from Copilot April 7, 2026 18:31

Copilot started reviewing on behalf of colinthebomb1 April 7, 2026 18:32 View session

Copilot AI reviewed Apr 7, 2026

View reviewed changes

colinthebomb1 requested a review from HazelYuAhiru April 7, 2026 19:01

HazelYuAhiru reviewed Apr 7, 2026

View reviewed changes

baiqiushi requested a review from Copilot April 7, 2026 20:03

Copilot started reviewing on behalf of baiqiushi April 7, 2026 20:03 View session

Copilot AI reviewed Apr 7, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Refactor] New rule parser to support AST#104

[Refactor] New rule parser to support AST#104
colinthebomb1 wants to merge 6 commits intomainfrom
colin/rule-parser-parse-v2

colinthebomb1 commented Apr 7, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

HazelYuAhiru left a comment

Uh oh!

HazelYuAhiru Apr 7, 2026

Uh oh!

HazelYuAhiru Apr 7, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 7, 2026

Uh oh!

Copilot AI Apr 7, 2026

Uh oh!

Copilot AI Apr 7, 2026

Uh oh!

Copilot AI Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

colinthebomb1 commented Apr 7, 2026

Summary

Highlights

Testing

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

HazelYuAhiru left a comment

Choose a reason for hiding this comment

Uh oh!

HazelYuAhiru Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

HazelYuAhiru Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants