Skip to content

feat: regex key support for ctl:ruleRemoveTargetById and ctl:ruleRemoveTargetByTag#3526

Open
etiennemunnich wants to merge 6 commits intoowasp-modsecurity:v3/masterfrom
etiennemunnich:feature/ctl-regex-ruleRemoveTarget-byid-bytag
Open

feat: regex key support for ctl:ruleRemoveTargetById and ctl:ruleRemoveTargetByTag#3526
etiennemunnich wants to merge 6 commits intoowasp-modsecurity:v3/masterfrom
etiennemunnich:feature/ctl-regex-ruleRemoveTarget-byid-bytag

Conversation

@etiennemunnich
Copy link
Copy Markdown

Summary

Add regex pattern matching in the variable-key position of ctl:ruleRemoveTargetById and ctl:ruleRemoveTargetByTag.

This enables exclusion patterns like:

ctl:ruleRemoveTargetById=932125;ARGS:/^json\.\d+\.JobDescription$/
ctl:ruleRemoveTargetByTag=XSS;ARGS:/^json\.\d+\.JobDescription$/

Fixes #3505

Problem

JSON body processing generates argument names with unpredictable numeric indices (json.0.JobDescription, json.1.JobDescription, ...). Without regex key support, operators must either:

  1. Write one exclusion per possible array index (impractical)
  2. Exclude the entire ARGS collection from the rule (too broad, loses protection)
  3. Use path-based exclusion (loses all CRS protection for the endpoint)

This is a common pain point for anyone running CRS with JSON/GraphQL APIs.

Approach

Following your guidance in the issue discussion, the regex is compiled once at config load — never recompiled per request. This directly addresses the concern about the v2 PR #3121 where regex compilation ran on every exclusion check.

How it works

  1. Detection: In init(), the /pattern/ delimiter is detected in the target string (e.g. ARGS:/^json\.\d+\.Field$/)
  2. Compilation: The pattern is compiled via Utils::Regex (PCRE2 by default, PCRE1 with --with-pcre) at config load time
  3. Storage: Compiled regex is stored as shared_ptr<Utils::Regex> — shared across all requests, zero per-request allocation
  4. Matching: During rule evaluation, searchAll() runs against the short variable-key string (typically 10-40 chars)
  5. Backward compat: Literal targets (ARGS:password) continue to work unchanged via the existing == / case-insensitive comparison

Shared design for ById and ByTag

Both actions use a common RuleRemoveTargetSpec struct with matchesFullName() and matchesKeyWithCollection() methods, avoiding code duplication.

Lexer change

The scanner character class REMOVE_RULE_TARGET_VALUE (previously REMOVE_RULE_TARGET_BY_ID_VALUE, used only by ById) is now shared by both ById and ByTag. It includes regex metacharacters (^ $ + ( ) | ? \) but not comma — so chained ctl: actions still split correctly on ,.

Context: ModSecurity v2 and Coraza

Engine ById ByTag ByMsg Regex keys
ModSec v2 (v2.9.12)
ModSec v3 (upstream)
Coraza (post PR #1561)
This PR ✅ (ById + ByTag)

This PR adds regex key support to the two actions that v3 already has (ById, ByTag). The missing ctl:ruleRemoveTargetByMsg is a separate, larger discussion and is intentionally excluded.

Files Changed (11 files)

File Change
headers/modsecurity/rule_remove_target_entry.h New. RuleRemoveTargetSpec, ByIdEntry, ByTagEntry structs
headers/modsecurity/transaction.h ById + ByTag list types updated to use new entry structs
src/actions/ctl/rule_remove_target_by_id.cc Parse /pattern/, compile regex in init()
src/actions/ctl/rule_remove_target_by_id.h Add shared_ptr<Regex> member
src/actions/ctl/rule_remove_target_by_tag.cc Parse /pattern/, compile regex in init()
src/actions/ctl/rule_remove_target_by_tag.h Add shared_ptr<Regex> member
src/parser/seclang-scanner.ll REMOVE_RULE_TARGET_VALUE shared by ById + ByTag
src/rule_with_operator.cc Both match paths use target.matchesFullName() / matchesKeyWithCollection()
test/test-cases/regression/issue-3505.json 5 tests: ById regex, ByTag regex, ByTag literal compat
test/test-cases/regression/issue-3505-crs-ctl-byid-tag-msg.json 2 CRS-style tests with @detectSQLi + JSON body
test/test-suite.in Register new test files

Test Results

7 new tests, all passing:

Test Description
ById regex — JSON array keys ARGS:/^json\.\d+\.JobDescription$/ excludes dynamic JSON args
ById regex — suffix match ARGS:/mixpanel$/ excludes args by suffix pattern
ByTag regex — JSON array keys Same as above, matching rules by tag
ByTag regex — suffix match Same as above, matching rules by tag
ByTag literal — backward compat ARGS:password — proves literal targets still work unchanged
CRS-style ById + @detectSQLi Realistic JSON POST with SQL injection, excluded by regex key
CRS-style ByTag + @detectSQLi Same scenario, excluded by tag + regex key

Full regression suite: 5005 total, 4987 pass, 18 skip, 0 fail, 0 error.

Performance

Benchmark: 25,000 iterations × 5 trials, JSON POST with 20 ARGS keys, 3 detection rules (@detectSQLi, @detectXSS, @rx), 2 regex exclusions (one ById, one ByTag).

Scenario Median Per-request
Baseline (no exclusions) 1,209 ms 0.048 ms
With regex exclusions 1,326 ms 0.053 ms
Overhead +117 ms +0.005 ms/req (+9.7%)

Scaling with more ARGS keys:

Keys/request Baseline With regex Overhead
20 1,209 ms 1,326 ms +9.7%
50 2,620 ms 2,873 ms +9.7%
100 5,199 ms 5,543 ms +6.6%

The overhead scales linearly with ARGS count — no exponential blowup. At 100 keys (an extreme JSON body), the per-request cost is +0.014 ms. The cost is the searchAll() call on short variable-name strings against precompiled PCRE2 patterns.

Key design decisions keeping performance in check:

  • Regex compiled once at config load, stored as shared_ptr
  • searchAll() runs on short strings (variable names, typically 10-40 chars)
  • Literal targets use existing == comparison — no regression for non-regex users

Made with Cursor

- 2 test cases: JSON array keys, mixpanel suffix (SecRuleUpdateTargetById parity)
- Expected to fail until regex support is implemented
- OODA baseline: Test 1 parse error, Test 2 HTTP 403 (exclusion not applied)

Made-with: Cursor
- Compile regex at config load, not per-request
- RuleRemoveTargetByIdEntry struct: literal or shared_ptr<Regex>
- Test 2 (ARGS:/mixpanel$/) passes; Test 1 blocked by parser owasp-modsecurity#2927

Made-with: Cursor
…veTargetByTag

Add regex pattern matching in the variable-key position of
ctl:ruleRemoveTargetById and ctl:ruleRemoveTargetByTag, enabling
exclusions like:

  ctl:ruleRemoveTargetById=932125;ARGS:/^json\.\d+\.JobDescription$/
  ctl:ruleRemoveTargetByTag=XSS;ARGS:/^json\.\d+\.JobDescription$/

JSON body processing generates argument names with dynamic array
indices (json.0.Field, json.1.Field, ...). Without regex keys,
operators cannot scope exclusions to specific keys without listing
every possible index or disabling rules entirely.

Design:
- Regex detected by /pattern/ delimiter in COLLECTION:/pattern/
- Compiled once at config load via Utils::Regex (PCRE2/PCRE1)
- Stored as shared_ptr - zero per-request compilation
- Literal targets continue to work unchanged (no breaking change)
- Shared RuleRemoveTargetSpec struct used by both ById and ByTag
- Lexer REMOVE_RULE_TARGET_VALUE class shared by both actions

Aligns ModSecurity v3 with Coraza (corazawaf/coraza#1561).

Fixes owasp-modsecurity#3505
@etiennemunnich etiennemunnich force-pushed the feature/ctl-regex-ruleRemoveTarget-byid-bytag branch from 370f93b to 637ad9c Compare March 28, 2026 22:44
@etiennemunnich
Copy link
Copy Markdown
Author

Known limitation: {m,n} quantifiers in regex keys

After reviewing the discussion in PR #3121 between @airween, @marcstern, and @dune73, I want to flag one inherent limitation.

The comma inside {m,n} quantifiers breaks parsing. For example:

# This will NOT work — comma inside {2,5} terminates the token:
ctl:ruleRemoveTargetById=1;ARGS:/^field\d{2,5}$/

# Parser sees: ARGS:/^field\d{2   ← token ends at comma

This is the same constraint discussed at length in PR #3121 — the comma is the SecLang action separator, and it cannot be included in the lexer character class without breaking ctl:a=...,ctl:b=... chaining.

Workarounds are straightforward:

Instead of Use
\d{2,5} \d\d+ or \d+
[a-z]{3} [a-z][a-z][a-z] or [a-z]+
.{1,10} .+

The character class does include { and } (updated in latest push), so fixed-count quantifiers like \d{3} work fine — only the comma-containing {m,n} form is affected. This matches the same trade-off the v2 PR makes.

I believe this is acceptable for the target use case (JSON key patterns like ^json\.\d+\.FieldName$ and cookie name patterns like ^sess_[a-f0-9]+$), which don't need {m,n}.

@airween
Copy link
Copy Markdown
Member

airween commented Mar 29, 2026

Hi @etiennemunnich,

thanks for this great PR! Really excellent work, congrats!

There are some reports from SonarCloud, please take a look at them, but I think that would be enough to check only headers/modsecurity/rule_remove_target_entry.h (these 3 issues).

Regarding to your knowing {m, n} limitation: I see that pain, probably we should check the parser, maybe we can figure out something to resolve this.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds regex support for the variable-key portion of ctl:ruleRemoveTargetById and ctl:ruleRemoveTargetByTag, enabling scoped exclusions for dynamic JSON keys (e.g., json.0.Field, json.1.Field) without disabling entire collections.

Changes:

  • Introduces RuleRemoveTargetSpec / entry structs and updates transaction storage for rule-remove-target state.
  • Parses /pattern/ targets at config load time and stores a compiled regex for ById/ByTag.
  • Updates rule evaluation paths and adds regression + CRS-style tests for the new behavior.

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
tools/rules-check/Makefile.am Adds include path to accommodate new header dependencies.
test/test-suite.in Registers new regression test files.
test/test-cases/regression/issue-3505.json Adds regression tests covering ById/ByTag regex and literal compatibility.
test/test-cases/regression/issue-3505-crs-ctl-byid-tag-msg.json Adds CRS-style JSON tests validating regex exclusions with SQLi detection.
test/benchmark/Makefile.am Adds include path to accommodate new header dependencies.
src/rule_with_operator.cc Switches exclusion checks to new matches*() helpers for ById/ByTag.
src/parser/seclang-scanner.ll Broadens lexer token class for remove-target values to allow regex metacharacters.
src/actions/ctl/rule_remove_target_by_tag.h Stores a precompiled regex pointer for ByTag action.
src/actions/ctl/rule_remove_target_by_tag.cc Detects /pattern/, compiles regex in init(), stores structured entry in transaction.
src/actions/ctl/rule_remove_target_by_id.h Stores a precompiled regex pointer for ById action.
src/actions/ctl/rule_remove_target_by_id.cc Detects /pattern/, compiles regex in init(), stores structured entry in transaction.
headers/modsecurity/transaction.h Updates transaction fields to new entry types and includes new header.
headers/modsecurity/rule_remove_target_entry.h New public header defining target spec + ById/ByTag entry structs and match helpers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@airween
Copy link
Copy Markdown
Member

airween commented Mar 29, 2026

@etiennemunnich,

also, unfortunately all Windows tests were failed, for eg this one.

Honestly, I can't see the root cause after a quick look, namely why the compiler does not find the pcre2.h...

Meantime Copilot also made a review, please take a look at them too. Maybe this explains the Windows build error.

- Move RuleRemoveTargetSpec matching into rule_remove_target_entry.cc so
  installed headers do not pull in src/utils/regex.h (unblocks Windows CI for
  targets that only include modsecurity headers).
- Forward-declare Utils::Regex; use regex->search() for boolean checks.
- Require collection prefix to match for regex targets; add regression test
  so ARGS:/.../ does not exclude REQUEST_HEADERS with the same key name.
- Use std::make_shared for compiled ctl regex (storage is shared_ptr).
- Install rule_remove_target_entry.h via pkginclude_HEADERS; remove
  -I$(top_srcdir) from benchmark and rules-check Makefiles.

Made-with: Cursor
- Document feat in CHANGES under v3.0.15 (TBD).
- Register issue-3505-crs-matrix.json in test-suite.in (scope-out matrix).
- Align Utils::Regex forward declaration with anchored_set_variable.h.

Made-with: Cursor
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 16 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +457 to +460
* to support COLLECTION:/regex/ patterns. Includes regex metacharacters (^ $ + ( ) | ? \)
* but NOT comma, so chained ctl: actions still split on ",". */
REMOVE_RULE_BY [0-9A-Za-z_\/\.\-\*\:\;\]\[\$]+
REMOVE_RULE_TARGET_VALUE [0-9A-Za-z_\/\.\-\*\:\;\]\[\$\^\+()|?\\{}]+
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REMOVE_RULE_TARGET_VALUE is a very restrictive character class for the COLLECTION:/regex/ payload (it excludes common PCRE tokens like <, >, =, !, , inside {m,n}, etc.). This means some valid regexes that work in other ModSecurity contexts (e.g. SecRuleUpdateTargetById which uses FREE_TEXT_SPACE_COMMA_QUOTE) won’t even lex/parse for ctl:ruleRemoveTargetBy{Id,Tag}. Consider broadening this token to accept any non-comma, non-whitespace, non-quote characters (consistent with other directive/action parsing), while still excluding , so chained ctl: actions continue to split correctly.

Suggested change
* to support COLLECTION:/regex/ patterns. Includes regex metacharacters (^ $ + ( ) | ? \)
* but NOT comma, so chained ctl: actions still split on ",". */
REMOVE_RULE_BY [0-9A-Za-z_\/\.\-\*\:\;\]\[\$]+
REMOVE_RULE_TARGET_VALUE [0-9A-Za-z_\/\.\-\*\:\;\]\[\$\^\+()|?\\{}]+
* to support COLLECTION:/regex/ patterns. Accepts any non-comma, non-whitespace,
* non-double-quote characters (consistent with FREE_TEXT_SPACE_COMMA_QUOTE), but NOT comma,
* so chained ctl: actions still split on ",". */
REMOVE_RULE_BY [0-9A-Za-z_\/\.\-\*\:\;\]\[\$]+
REMOVE_RULE_TARGET_VALUE [^, \t\"\n\r]+

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed allowlist covers the regex metacharacters needed for the target use cases (^, $, +, (), |, ?, , {}).

A negated class like [^, \t\"\n\r]+ would be more permissive and align with FREE_TEXT_SPACE_COMMA_QUOTE, but I'd worry it could accept tokens the parser isn't ready for.

The allowlist approach is conservative and should be safe — it can't accidentally let through characters that break parser state. If broader support is needed later, extending the allowlist is straightforward. The trade-off is to catch unsupported characters at lex time (allowlist) vs. catch them at init() regex compilation time.

Happy to go either way here — would value your input on which approach you'd prefer.

@airween
Copy link
Copy Markdown
Member

airween commented Mar 30, 2026

@etiennemunnich,

there are some new Copilot notes, please take a look at them.

- Reject empty regex at config load (ARGS:// → error, not silent no-op);
  add parser_error regression tests for ById and ByTag.
- Fast-fail when a collection-scoped regex target is matched against a
  scalar variable (no colon in candidate).
- Remove redundant src/utils/regex.h include from ctl action .cc files
  (already provided by their .h; avoids PCRE2_CODE_UNIT_WIDTH redefinition).

Made-with: Cursor
@sonarqubecloud
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: Wildcard/pattern support in ctl:ruleRemoveTargetById (v3)

3 participants