Skip to content

Commit 561e99c

Browse files
eendebakptclaude
andcommitted
Speed up matching of case-insensitive character sets
Handle IN_IGNORE, IN_UNI_IGNORE and IN_LOC_IGNORE in SRE(count) so that a repeated case-insensitive set (e.g. [a-z]+ with re.I) scans inline instead of falling back to the per-character match loop. About 2x faster. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent ee78d43 commit 561e99c

1 file changed

Lines changed: 23 additions & 0 deletions

File tree

Modules/_sre/sre_lib.h

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -213,6 +213,29 @@ SRE(count)(SRE_STATE* state, const SRE_CODE* pattern, Py_ssize_t maxcount)
213213
ptr++;
214214
break;
215215

216+
case SRE_OP_IN_IGNORE:
217+
/* repeated set, case-insensitive (ascii) */
218+
TRACE(("|%p|%p|COUNT IN_IGNORE\n", pattern, ptr));
219+
while (ptr < end && SRE(charset)(state, pattern + 2,
220+
(SRE_CODE) sre_lower_ascii(*ptr)))
221+
ptr++;
222+
break;
223+
224+
case SRE_OP_IN_UNI_IGNORE:
225+
/* repeated set, case-insensitive (unicode) */
226+
TRACE(("|%p|%p|COUNT IN_UNI_IGNORE\n", pattern, ptr));
227+
while (ptr < end && SRE(charset)(state, pattern + 2,
228+
(SRE_CODE) sre_lower_unicode(*ptr)))
229+
ptr++;
230+
break;
231+
232+
case SRE_OP_IN_LOC_IGNORE:
233+
/* repeated set, case-insensitive (locale) */
234+
TRACE(("|%p|%p|COUNT IN_LOC_IGNORE\n", pattern, ptr));
235+
while (ptr < end && SRE(charset_loc_ignore)(state, pattern + 2, *ptr))
236+
ptr++;
237+
break;
238+
216239
case SRE_OP_ANY:
217240
/* repeated dot wildcard. */
218241
TRACE(("|%p|%p|COUNT ANY\n", pattern, ptr));

0 commit comments

Comments
 (0)