feat: regex key support for ctl:ruleRemoveTargetById and ctl:ruleRemoveTargetByTag#3526
Conversation
- 2 test cases: JSON array keys, mixpanel suffix (SecRuleUpdateTargetById parity) - Expected to fail until regex support is implemented - OODA baseline: Test 1 parse error, Test 2 HTTP 403 (exclusion not applied) Made-with: Cursor
- Compile regex at config load, not per-request - RuleRemoveTargetByIdEntry struct: literal or shared_ptr<Regex> - Test 2 (ARGS:/mixpanel$/) passes; Test 1 blocked by parser owasp-modsecurity#2927 Made-with: Cursor
…veTargetByTag Add regex pattern matching in the variable-key position of ctl:ruleRemoveTargetById and ctl:ruleRemoveTargetByTag, enabling exclusions like: ctl:ruleRemoveTargetById=932125;ARGS:/^json\.\d+\.JobDescription$/ ctl:ruleRemoveTargetByTag=XSS;ARGS:/^json\.\d+\.JobDescription$/ JSON body processing generates argument names with dynamic array indices (json.0.Field, json.1.Field, ...). Without regex keys, operators cannot scope exclusions to specific keys without listing every possible index or disabling rules entirely. Design: - Regex detected by /pattern/ delimiter in COLLECTION:/pattern/ - Compiled once at config load via Utils::Regex (PCRE2/PCRE1) - Stored as shared_ptr - zero per-request compilation - Literal targets continue to work unchanged (no breaking change) - Shared RuleRemoveTargetSpec struct used by both ById and ByTag - Lexer REMOVE_RULE_TARGET_VALUE class shared by both actions Aligns ModSecurity v3 with Coraza (corazawaf/coraza#1561). Fixes owasp-modsecurity#3505
370f93b to
637ad9c
Compare
Known limitation:
|
| Instead of | Use |
|---|---|
\d{2,5} |
\d\d+ or \d+ |
[a-z]{3} |
[a-z][a-z][a-z] or [a-z]+ |
.{1,10} |
.+ |
The character class does include { and } (updated in latest push), so fixed-count quantifiers like \d{3} work fine — only the comma-containing {m,n} form is affected. This matches the same trade-off the v2 PR makes.
I believe this is acceptable for the target use case (JSON key patterns like ^json\.\d+\.FieldName$ and cookie name patterns like ^sess_[a-f0-9]+$), which don't need {m,n}.
|
Hi @etiennemunnich, thanks for this great PR! Really excellent work, congrats! There are some reports from SonarCloud, please take a look at them, but I think that would be enough to check only Regarding to your knowing |
There was a problem hiding this comment.
Pull request overview
Adds regex support for the variable-key portion of ctl:ruleRemoveTargetById and ctl:ruleRemoveTargetByTag, enabling scoped exclusions for dynamic JSON keys (e.g., json.0.Field, json.1.Field) without disabling entire collections.
Changes:
- Introduces
RuleRemoveTargetSpec/ entry structs and updates transaction storage for rule-remove-target state. - Parses
/pattern/targets at config load time and stores a compiled regex for ById/ByTag. - Updates rule evaluation paths and adds regression + CRS-style tests for the new behavior.
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/rules-check/Makefile.am | Adds include path to accommodate new header dependencies. |
| test/test-suite.in | Registers new regression test files. |
| test/test-cases/regression/issue-3505.json | Adds regression tests covering ById/ByTag regex and literal compatibility. |
| test/test-cases/regression/issue-3505-crs-ctl-byid-tag-msg.json | Adds CRS-style JSON tests validating regex exclusions with SQLi detection. |
| test/benchmark/Makefile.am | Adds include path to accommodate new header dependencies. |
| src/rule_with_operator.cc | Switches exclusion checks to new matches*() helpers for ById/ByTag. |
| src/parser/seclang-scanner.ll | Broadens lexer token class for remove-target values to allow regex metacharacters. |
| src/actions/ctl/rule_remove_target_by_tag.h | Stores a precompiled regex pointer for ByTag action. |
| src/actions/ctl/rule_remove_target_by_tag.cc | Detects /pattern/, compiles regex in init(), stores structured entry in transaction. |
| src/actions/ctl/rule_remove_target_by_id.h | Stores a precompiled regex pointer for ById action. |
| src/actions/ctl/rule_remove_target_by_id.cc | Detects /pattern/, compiles regex in init(), stores structured entry in transaction. |
| headers/modsecurity/transaction.h | Updates transaction fields to new entry types and includes new header. |
| headers/modsecurity/rule_remove_target_entry.h | New public header defining target spec + ById/ByTag entry structs and match helpers. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Move RuleRemoveTargetSpec matching into rule_remove_target_entry.cc so installed headers do not pull in src/utils/regex.h (unblocks Windows CI for targets that only include modsecurity headers). - Forward-declare Utils::Regex; use regex->search() for boolean checks. - Require collection prefix to match for regex targets; add regression test so ARGS:/.../ does not exclude REQUEST_HEADERS with the same key name. - Use std::make_shared for compiled ctl regex (storage is shared_ptr). - Install rule_remove_target_entry.h via pkginclude_HEADERS; remove -I$(top_srcdir) from benchmark and rules-check Makefiles. Made-with: Cursor
- Document feat in CHANGES under v3.0.15 (TBD). - Register issue-3505-crs-matrix.json in test-suite.in (scope-out matrix). - Align Utils::Regex forward declaration with anchored_set_variable.h. Made-with: Cursor
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| * to support COLLECTION:/regex/ patterns. Includes regex metacharacters (^ $ + ( ) | ? \) | ||
| * but NOT comma, so chained ctl: actions still split on ",". */ | ||
| REMOVE_RULE_BY [0-9A-Za-z_\/\.\-\*\:\;\]\[\$]+ | ||
| REMOVE_RULE_TARGET_VALUE [0-9A-Za-z_\/\.\-\*\:\;\]\[\$\^\+()|?\\{}]+ |
There was a problem hiding this comment.
REMOVE_RULE_TARGET_VALUE is a very restrictive character class for the COLLECTION:/regex/ payload (it excludes common PCRE tokens like <, >, =, !, , inside {m,n}, etc.). This means some valid regexes that work in other ModSecurity contexts (e.g. SecRuleUpdateTargetById which uses FREE_TEXT_SPACE_COMMA_QUOTE) won’t even lex/parse for ctl:ruleRemoveTargetBy{Id,Tag}. Consider broadening this token to accept any non-comma, non-whitespace, non-quote characters (consistent with other directive/action parsing), while still excluding , so chained ctl: actions continue to split correctly.
| * to support COLLECTION:/regex/ patterns. Includes regex metacharacters (^ $ + ( ) | ? \) | |
| * but NOT comma, so chained ctl: actions still split on ",". */ | |
| REMOVE_RULE_BY [0-9A-Za-z_\/\.\-\*\:\;\]\[\$]+ | |
| REMOVE_RULE_TARGET_VALUE [0-9A-Za-z_\/\.\-\*\:\;\]\[\$\^\+()|?\\{}]+ | |
| * to support COLLECTION:/regex/ patterns. Accepts any non-comma, non-whitespace, | |
| * non-double-quote characters (consistent with FREE_TEXT_SPACE_COMMA_QUOTE), but NOT comma, | |
| * so chained ctl: actions still split on ",". */ | |
| REMOVE_RULE_BY [0-9A-Za-z_\/\.\-\*\:\;\]\[\$]+ | |
| REMOVE_RULE_TARGET_VALUE [^, \t\"\n\r]+ |
There was a problem hiding this comment.
The proposed allowlist covers the regex metacharacters needed for the target use cases (^, $, +, (), |, ?, , {}).
A negated class like [^, \t\"\n\r]+ would be more permissive and align with FREE_TEXT_SPACE_COMMA_QUOTE, but I'd worry it could accept tokens the parser isn't ready for.
The allowlist approach is conservative and should be safe — it can't accidentally let through characters that break parser state. If broader support is needed later, extending the allowlist is straightforward. The trade-off is to catch unsupported characters at lex time (allowlist) vs. catch them at init() regex compilation time.
Happy to go either way here — would value your input on which approach you'd prefer.
|
there are some new Copilot notes, please take a look at them. |
- Reject empty regex at config load (ARGS:// → error, not silent no-op); add parser_error regression tests for ById and ByTag. - Fast-fail when a collection-scoped regex target is matched against a scalar variable (no colon in candidate). - Remove redundant src/utils/regex.h include from ctl action .cc files (already provided by their .h; avoids PCRE2_CODE_UNIT_WIDTH redefinition). Made-with: Cursor
|



Summary
Add regex pattern matching in the variable-key position of
ctl:ruleRemoveTargetByIdandctl:ruleRemoveTargetByTag.This enables exclusion patterns like:
ctl:ruleRemoveTargetById=932125;ARGS:/^json\.\d+\.JobDescription$/ ctl:ruleRemoveTargetByTag=XSS;ARGS:/^json\.\d+\.JobDescription$/Fixes #3505
Problem
JSON body processing generates argument names with unpredictable numeric indices (
json.0.JobDescription,json.1.JobDescription, ...). Without regex key support, operators must either:This is a common pain point for anyone running CRS with JSON/GraphQL APIs.
Approach
Following your guidance in the issue discussion, the regex is compiled once at config load — never recompiled per request. This directly addresses the concern about the v2 PR #3121 where regex compilation ran on every exclusion check.
How it works
init(), the/pattern/delimiter is detected in the target string (e.g.ARGS:/^json\.\d+\.Field$/)Utils::Regex(PCRE2 by default, PCRE1 with--with-pcre) at config load timeshared_ptr<Utils::Regex>— shared across all requests, zero per-request allocationsearchAll()runs against the short variable-key string (typically 10-40 chars)ARGS:password) continue to work unchanged via the existing==/ case-insensitive comparisonShared design for ById and ByTag
Both actions use a common
RuleRemoveTargetSpecstruct withmatchesFullName()andmatchesKeyWithCollection()methods, avoiding code duplication.Lexer change
The scanner character class
REMOVE_RULE_TARGET_VALUE(previouslyREMOVE_RULE_TARGET_BY_ID_VALUE, used only by ById) is now shared by both ById and ByTag. It includes regex metacharacters (^ $ + ( ) | ? \) but not comma — so chainedctl:actions still split correctly on,.Context: ModSecurity v2 and Coraza
This PR adds regex key support to the two actions that v3 already has (ById, ByTag). The missing
ctl:ruleRemoveTargetByMsgis a separate, larger discussion and is intentionally excluded.Files Changed (11 files)
headers/modsecurity/rule_remove_target_entry.hRuleRemoveTargetSpec,ByIdEntry,ByTagEntrystructsheaders/modsecurity/transaction.hsrc/actions/ctl/rule_remove_target_by_id.cc/pattern/, compile regex ininit()src/actions/ctl/rule_remove_target_by_id.hshared_ptr<Regex>membersrc/actions/ctl/rule_remove_target_by_tag.cc/pattern/, compile regex ininit()src/actions/ctl/rule_remove_target_by_tag.hshared_ptr<Regex>membersrc/parser/seclang-scanner.llREMOVE_RULE_TARGET_VALUEshared by ById + ByTagsrc/rule_with_operator.cctarget.matchesFullName()/matchesKeyWithCollection()test/test-cases/regression/issue-3505.jsontest/test-cases/regression/issue-3505-crs-ctl-byid-tag-msg.json@detectSQLi+ JSON bodytest/test-suite.inTest Results
7 new tests, all passing:
ARGS:/^json\.\d+\.JobDescription$/excludes dynamic JSON argsARGS:/mixpanel$/excludes args by suffix patternARGS:password— proves literal targets still work unchanged@detectSQLi@detectSQLiFull regression suite: 5005 total, 4987 pass, 18 skip, 0 fail, 0 error.
Performance
Benchmark: 25,000 iterations × 5 trials, JSON POST with 20 ARGS keys, 3 detection rules (
@detectSQLi,@detectXSS,@rx), 2 regex exclusions (one ById, one ByTag).Scaling with more ARGS keys:
The overhead scales linearly with ARGS count — no exponential blowup. At 100 keys (an extreme JSON body), the per-request cost is +0.014 ms. The cost is the
searchAll()call on short variable-name strings against precompiled PCRE2 patterns.Key design decisions keeping performance in check:
shared_ptrsearchAll()runs on short strings (variable names, typically 10-40 chars)==comparison — no regression for non-regex usersMade with Cursor