Optimize more regex cases with IndexOfAny #125586

Draft
stephentoub wants to merge 1 commit into dotnet:main from stephentoub:moreindexof

@stephentoub
Member

Extend both the source generator (RegexGenerator.Emitter.cs) and RegexCompiler to vectorize regex loops for complex character classes that previously fell through to scalar character-by-character matching. This primarily benefits the extremely common \w+, \d+, \s+ patterns (and their negations), as well as Unicode category sets and sets with subtraction.

What changed

TryEmitIndexOf (source generator) — now always succeeds for set families by falling back to EmitIndexOfAnyCustomHelper, which generates a helper using vectorized SearchValues<char> for ASCII with a scalar MatchCharacterClass fallback for non-ASCII.
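
The shape of such a helper can be sketched in isolation. The following is only an illustration, not the generator's actual output: the set `[\p{L}]`, the class name `LetterScan`, and both method names are hypothetical, and `char.IsLetter` stands in for the MatchCharacterClass fallback. It assumes .NET 8 or later for `SearchValues<char>`.

```csharp
using System;
using System.Buffers;

internal static class LetterScan
{
    // ASCII members of the illustrative set [\p{L}] (assumption: chosen for this sketch).
    private static readonly SearchValues<char> s_asciiLetters =
        SearchValues.Create("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");

    // Hypothetical helper: index of the first char that is NOT a letter, or -1.
    internal static int IndexOfAnyExceptLetter(ReadOnlySpan<char> span)
    {
        // Vectorized scan: stops at the first char that is not an ASCII letter.
        // That char is either an ASCII non-letter (a real hit) or any non-ASCII char.
        int i = span.IndexOfAnyExcept(s_asciiLetters);
        if ((uint)i < (uint)span.Length)
        {
            if (char.IsAscii(span[i]))
            {
                return i;
            }

            // Non-ASCII encountered: finish with a scalar check per character.
            do
            {
                if (!char.IsLetter(span[i]))
                {
                    return i;
                }
                i++;
            }
            while ((uint)i < (uint)span.Length);
        }

        return -1;
    }

    // Scalar baseline, useful for checking that both approaches agree.
    internal static int ScalarIndexOfAnyExceptLetter(ReadOnlySpan<char> span)
    {
        for (int i = 0; i < span.Length; i++)
        {
            if (!char.IsLetter(span[i]))
            {
                return i;
            }
        }
        return -1;
    }
}
```

In this sketch the vectorized call handles the all-ASCII prefix, and the scalar loop takes over only once a non-ASCII character appears, so both methods should return the same index for any input.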

EmitIndexOf / EmitIndexOfWithCharClassFallback (RegexCompiler) — parallel change that emits inline IL for the same pattern: IndexOfAnyExcept(SearchValues) for fast ASCII scanning, then EmitMatchCharacterClass for non-ASCII fallback. The existing inline code from EmitFixedSet_LeftToRight was extracted into the reusable EmitIndexOfWithCharClassFallback method.

CanEmitIndexOf — simplified to return true for all IsSetFamily nodes since EmitIndexOf can now handle them all.

EmitIndexOfAnyCustomHelper — extended with negate and last parameters. Negation is applied at the boolean level (via CharInClass/MatchCharacterClass) rather than by toggling the set string's internal negation flag, which doesn't correctly negate subtraction sets due to how CharInClassIterative processes subtraction chains.
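
A minimal sketch of what "negation at the boolean level" means, under stated assumptions: `NegationSketch`, `InSet`, and `IndexOf` are hypothetical names, and `InSet` only approximates `[\w-[abc]]` (the real `\w` covers more Unicode categories than letters, digits, and underscore). The set definition is never altered; only the result of the membership test is compared against `negate`.

```csharp
using System;

internal static class NegationSketch
{
    // Rough stand-in for the character-class check on [\w-[abc]]:
    // a word-like character that is not one of the subtracted chars.
    private static bool InSet(char ch) =>
        (char.IsLetterOrDigit(ch) || ch == '_') && ch is not ('a' or 'b' or 'c');

    // negate == false: index of the first char in the set.
    // negate == true:  index of the first char not in the set.
    internal static int IndexOf(ReadOnlySpan<char> span, bool negate)
    {
        for (int i = 0; i < span.Length; i++)
        {
            // The set is evaluated as written; negation flips only the boolean outcome.
            if (InSet(span[i]) != negate)
            {
                return i;
            }
        }
        return -1;
    }
}
```

Because the subtraction is evaluated inside the unmodified membership test, the negated search treats the subtracted characters as non-members, which is the behavior the description says the flag-toggling approach failed to produce.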

Repeater path — changed from the fragile indexOfExpr.Replace("IndexOf", "Contains") pattern to the equivalent {indexOfExpr} >= 0.
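
As an illustration of the index-comparison form, here is a hand-written check for a fixed-count repeater such as `[0-9A-Fa-f]{4}`. The class and method names are hypothetical and this is not the emitted code; it only shows that `IndexOfAnyExcept(...) >= 0` answers the same question as a "contains" check without relying on a matching `Contains*` method name existing. It assumes .NET 8 or later.

```csharp
using System;
using System.Buffers;

internal static class RepeaterSketch
{
    private static readonly SearchValues<char> s_hexDigits =
        SearchValues.Create("0123456789ABCDEFabcdef");

    // True if the slice starts with four characters from [0-9A-Fa-f].
    internal static bool StartsWithFourHexDigits(ReadOnlySpan<char> slice)
    {
        if (slice.Length < 4)
        {
            return false;
        }

        // "index >= 0" means some char in the window is outside the set.
        bool containsNonHex = slice.Slice(0, 4).IndexOfAnyExcept(s_hexDigits) >= 0;
        return !containsNonHex;
    }
}
```

The comparison works for any expression that returns an index or -1, including custom helpers, which is presumably why it is less fragile than rewriting the method name as a string.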

Impact

Patterns like \w+, \d+, \s+, [\p{L}]+, and [\w-[abc]]+ in atomic loops (which most such loops become via auto-atomicity) and in fixed-count repeaters now use vectorized search instead of scalar loops, for both RegexOptions.Compiled and source-generated regexes.
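
For orientation, the kind of caller code that exercises these patterns looks like the sketch below. `ImpactSketch` and its methods are hypothetical; the point is only that the public `Regex` API and its results are unchanged, while the matching loop behind `\w+` and `\s+` is what the PR targets.

```csharp
using System;
using System.Text.RegularExpressions;

internal static class ImpactSketch
{
    // \w+ : a single set loop, one of the pattern shapes listed above.
    internal static string FirstWord(string input) =>
        Regex.Match(input, @"\w+", RegexOptions.Compiled).Value;

    // \s+ : collapse each run of whitespace into a single space.
    internal static string CollapseWhitespace(string input) =>
        Regex.Replace(input, @"\s+", " ", RegexOptions.Compiled);
}
```

Inputs that mix ASCII and non-ASCII characters, such as "héllo_1", are the ones that would reach the non-ASCII fallback described above.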

Extend TryEmitIndexOf (source generator) and EmitIndexOf/EmitIndexOfWithCharClassFallback
(RegexCompiler) to handle complex character classes (Unicode categories like \w, \d, \s,
and sets with subtraction) that previously fell through to scalar character-by-character
loops. The optimization uses SearchValues<char> for vectorized ASCII scanning with a
per-character MatchCharacterClass/CharInClass fallback for non-ASCII.

Key changes:
- EmitIndexOfAnyCustomHelper gains negate/last parameters, applying negation at the
  boolean level rather than toggling the set string flag (which breaks subtraction sets).
- TryEmitIndexOf always succeeds for set families, falling back to the custom helper.
- CanEmitIndexOf simplified to return true for all set families.
- EmitFixedSet_LeftToRight's inline IL extracted into reusable EmitIndexOfWithCharClassFallback.
- Repeater path changed from fragile Replace(IndexOf,Contains) to >= 0.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 15, 2026 20:30
@stephentoub
Member Author

@MihuBot regexdiff

@stephentoub
Member Author

@MihuBot benchmark Regex

@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Contributor

Copilot AI left a comment

Pull request overview

This PR updates the regex compiler and source generator to better optimize searches involving complex character classes (e.g., \w, \d, \s, and non-enumerable / subtractive sets) by using an ASCII-vectorized SearchValues-based scan with a non-ASCII fallback, and refreshes the generator output baselines accordingly.

Changes:

  • Refactors RegexCompiler to use a shared EmitIndexOfWithCharClassFallback implementation for complex set searches, and broadens CanEmitIndexOf to include all set nodes.
  • Extends the generator’s custom IndexOf helper emission to support negated searches and last-index variants, and adjusts generator logic to use index comparisons rather than Contains* string rewrites.
  • Updates RegexGeneratorOutputTests expected outputs to reflect the new helper methods and emitted code patterns.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File | Description
--- | ---
src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorOutputTests.cs | Updates expected generated source baselines to match new IndexOf helper patterns.
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCompiler.cs | Adds a reusable vectorized ASCII + non-ASCII fallback path for complex character classes and uses it in more places.
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs | Enhances emitted helper generation for complex char-class searches (negation + last-index support) and updates IndexOf-based checks.

// i += direction;
Ldloc(iLocal);
Ldc(1);
if (useLast) { Sub(); } else { Add(); }
@MihuBot

MihuBot commented Mar 15, 2026

7681 out of 18857 patterns have generated source code changes.

Examples of GeneratedRegex source diffs
"\\s+" (24455 uses)
[GeneratedRegex("\\s+")]
                  // Match a whitespace character atomically at least once.
                  {
-                       int iteration = 0;
-                       while ((uint)iteration < (uint)slice.Length && char.IsWhiteSpace(slice[iteration]))
+                       int iteration = slice.IndexOfAnyExceptWhiteSpace();
+                       if (iteration < 0)
                      {
-                           iteration++;
+                           iteration = slice.Length;
                      }
                      
                      if (iteration == 0)
      /// <summary>Whether <see cref="s_defaultTimeout"/> is non-infinite.</summary>
      internal static readonly bool s_hasTimeout = s_defaultTimeout != Regex.InfiniteMatchTimeout;
      
+       /// <summary>Finds the next index of any character that does not match a whitespace character.</summary>
+       [MethodImpl(MethodImplOptions.AggressiveInlining)]
+       internal static int IndexOfAnyExceptWhiteSpace(this ReadOnlySpan<char> span)
+       {
+           int i = span.IndexOfAnyExcept(Utilities.s_asciiWhiteSpace);
+           if ((uint)i < (uint)span.Length)
+           {
+               if (char.IsAscii(span[i]))
+               {
+                   return i;
+               }
+       
+               do
+               {
+                   if (!char.IsWhiteSpace(span[i]))
+                   {
+                       return i;
+                   }
+                   i++;
+               }
+               while ((uint)i < (uint)span.Length);
+           }
+       
+           return -1;
+       }
+       
+       /// <summary>Supports searching for characters in or not in "\t\n\v\f\r ".</summary>
+       internal static readonly SearchValues<char> s_asciiWhiteSpace = SearchValues.Create("\t\n\v\f\r ");
+       
      /// <summary>Supports searching for characters in or not in "\t\n\v\f\r \u0085             \u2028\u2029   ".</summary>
      internal static readonly SearchValues<char> s_whitespace = SearchValues.Create("\t\n\v\f\r \u0085             \u2028\u2029   ");
  }
"^-?([^-+/*\\(\\)\\^\\s]+)" (17707 uses)
[GeneratedRegex("^-?([^-+/*\\(\\)\\^\\s]+)")]
                 {
                     int pos = base.runtextpos;
                     int matchStart = pos;
-                    char ch;
                     int capture_starting_pos = 0;
                     ReadOnlySpan<char> slice = inputSpan.Slice(pos);
                     
                         
                         // Match a character in the set [^(-+\-/^\s] atomically at least once.
                         {
-                            int iteration = 0;
-                            while ((uint)iteration < (uint)slice.Length && ((ch = slice[iteration]) < 128 ? ("쇿\uffff僾\uffff\uffff뿿\uffff\uffff"[ch >> 4] & (1 << (ch & 0xF))) != 0 : RegexRunner.CharInClass((char)ch, "\u0001\b\u0001(,-./0^_d")))
+                            int iteration = slice.IndexOfAnyExcept_1C7FD78BD41F1B11FB5695FDEB937D92852BB6BB96E8336434F81435C7D874DE();
+                            if (iteration < 0)
                             {
-                                iteration++;
+                                iteration = slice.Length;
                             }
                             
                             if (iteration == 0)
         
         /// <summary>Whether <see cref="s_defaultTimeout"/> is non-infinite.</summary>
         internal static readonly bool s_hasTimeout = s_defaultTimeout != Regex.InfiniteMatchTimeout;
+        
+        /// <summary>Finds the next index of any character that does not match a character in the set [^(-+\-/^\s].</summary>
+        [MethodImpl(MethodImplOptions.AggressiveInlining)]
+        internal static int IndexOfAnyExcept_1C7FD78BD41F1B11FB5695FDEB937D92852BB6BB96E8336434F81435C7D874DE(this ReadOnlySpan<char> span)
+        {
+            int i = span.IndexOfAnyExcept(Utilities.s_ascii_FFC1FFFFFE50FFFFFFFFFFBFFFFFFFFF);
+            if ((uint)i < (uint)span.Length)
+            {
+                if (char.IsAscii(span[i]))
+                {
+                    return i;
+                }
+        
+                char ch;
+                do
+                {
+                    if (((ch = span[i]) < 128 ? ("쇿\uffff僾\uffff\uffff뿿\uffff\uffff"[ch >> 4] & (1 << (ch & 0xF))) == 0 : !RegexRunner.CharInClass((char)ch, "\u0001\b\u0001(,-./0^_d")))
+                    {
+                        return i;
+                    }
+                    i++;
+                }
+                while ((uint)i < (uint)span.Length);
+            }
+        
+            return -1;
+        }
+        
+        /// <summary>Supports searching for characters in or not in "\0\u0001\u0002\u0003\u0004\u0005\u0006\a\b\u000e\u000f\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001a\u001b\u001c\u001d\u001e\u001f!\"#$%&amp;',.0123456789:;&lt;=&gt;?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]_`abcdefghijklmnopqrstuvwxyz{|}~\u007f".</summary>
+        internal static readonly SearchValues<char> s_ascii_FFC1FFFFFE50FFFFFFFFFFBFFFFFFFFF = SearchValues.Create("\0\u0001\u0002\u0003\u0004\u0005\u0006\a\b\u000e\u000f\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001a\u001b\u001c\u001d\u001e\u001f!\"#$%&',.0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]_`abcdefghijklmnopqrstuvwxyz{|}~\u007f");
     }
 }
"\\%(\\d+)!.*?!" (17653 uses)
[GeneratedRegex("\\%(\\d+)!.*?!", RegexOptions.Singleline)]
                         // Match a Unicode digit atomically at least once.
                         {
-                            int iteration = 0;
-                            while ((uint)iteration < (uint)slice.Length && char.IsDigit(slice[iteration]))
+                            int iteration = slice.IndexOfAnyExceptDigit();
+                            if (iteration < 0)
                             {
-                                iteration++;
+                                iteration = slice.Length;
                             }
                             
                             if (iteration == 0)
         
         /// <summary>Whether <see cref="s_defaultTimeout"/> is non-infinite.</summary>
         internal static readonly bool s_hasTimeout = s_defaultTimeout != Regex.InfiniteMatchTimeout;
+        
+        /// <summary>Finds the next index of any character that does not match a Unicode digit.</summary>
+        [MethodImpl(MethodImplOptions.AggressiveInlining)]
+        internal static int IndexOfAnyExceptDigit(this ReadOnlySpan<char> span)
+        {
+            int i = span.IndexOfAnyExcept(Utilities.s_asciiDigits);
+            if ((uint)i < (uint)span.Length)
+            {
+                if (char.IsAscii(span[i]))
+                {
+                    return i;
+                }
+        
+                do
+                {
+                    if (!char.IsDigit(span[i]))
+                    {
+                        return i;
+                    }
+                    i++;
+                }
+                while ((uint)i < (uint)span.Length);
+            }
+        
+            return -1;
+        }
+        
+        /// <summary>Supports searching for characters in or not in "0123456789".</summary>
+        internal static readonly SearchValues<char> s_asciiDigits = SearchValues.Create("0123456789");
     }
 }
"{\\s*(?<P>\\D\\w*)\\s*\\:\\s*var\\(\\s*(?<B> ..." (9881 uses)
[GeneratedRegex("{\\s*(?<P>\\D\\w*)\\s*\\:\\s*var\\(\\s*(?<B>\\D\\w*)\\s*\\)\\s*(;\\s*(?<P>\\D\\w*)\\s*\\:\\s*var\\(\\s*(?<B>\\D\\w*)\\s*\\)\\s*\\s*)*}")]
                         slice = inputSpan.Slice(pos);
                         charloop_starting_pos = pos;
                         
-                        int iteration = 0;
-                        while ((uint)iteration < (uint)slice.Length && char.IsWhiteSpace(slice[iteration]))
+                        int iteration = slice.IndexOfAnyExceptWhiteSpace();
+                        if (iteration < 0)
                         {
-                            iteration++;
+                            iteration = slice.Length;
                         }
                         
                         slice = slice.Slice(iteration);
                             base.CheckTimeout();
                         }
                         
-                        if (charloop_starting_pos >= charloop_ending_pos)
+                        if (charloop_starting_pos >= charloop_ending_pos ||
+                            (charloop_ending_pos = inputSpan.Slice(charloop_starting_pos, charloop_ending_pos - charloop_starting_pos).LastIndexOfAnyExceptDigit()) < 0)
                         {
                             UncaptureUntil(0);
                             return false; // The input didn't match.
                         }
-                        pos = --charloop_ending_pos;
+                        charloop_ending_pos += charloop_starting_pos;
+                        pos = charloop_ending_pos;
                         slice = inputSpan.Slice(pos);
                         
                         CharLoopEnd:
                         
                         // Match a word character atomically any number of times.
                         {
-                            int iteration1 = 1;
-                            while ((uint)iteration1 < (uint)slice.Length && Utilities.IsWordChar(slice[iteration1]))
+                            int iteration1 = slice.Slice(1).IndexOfAnyExceptWordChar();
+                            if (iteration1 < 0)
                             {
-                                iteration1++;
+                                iteration1 = slice.Length - 1;
                             }
                             
                             slice = slice.Slice(iteration1);
                             pos += iteration1;
                         }
                         
+                        pos++;
+                        slice = inputSpan.Slice(pos);
                         base.Capture(2, capture_starting_pos, pos);
                     }
                     
                     // Match a whitespace character atomically any number of times.
                     {
-                        int iteration2 = 0;
-                        while ((uint)iteration2 < (uint)slice.Length && char.IsWhiteSpace(slice[iteration2]))
+                        int iteration2 = slice.IndexOfAnyExceptWhiteSpace();
+                        if (iteration2 < 0)
                         {
-                            iteration2++;
+                            iteration2 = slice.Length;
                         }
                         
                         slice = slice.Slice(iteration2);
                     
                     // Match a whitespace character atomically any number of times.
                     {
-                        int iteration3 = 1;
-                        while ((uint)iteration3 < (uint)slice.Length && char.IsWhiteSpace(slice[iteration3]))
+                        int iteration3 = slice.Slice(1).IndexOfAnyExceptWhiteSpace();
+                        if (iteration3 < 0)
                         {
-                            iteration3++;
+                            iteration3 = slice.Length - 1;
                         }
                         
                         slice = slice.Slice(iteration3);
                     }
                     
                     // Match the string "var(".
-                    if (!slice.StartsWith("var("))
+                    if (!slice.Slice(1).StartsWith("var("))
                     {
                         goto CharLoopBacktrack;
                     }
                     
                     // Match a whitespace character greedily any number of times.
                     //{
-                        pos += 4;
+                        pos += 5;
                         slice = inputSpan.Slice(pos);
                         charloop_starting_pos1 = pos;
                         
-                        int iteration4 = 0;
-                        while ((uint)iteration4 < (uint)slice.Length && char.IsWhiteSpace(slice[iteration4]))
+                        int iteration4 = slice.IndexOfAnyExceptWhiteSpace();
+                        if (iteration4 < 0)
                         {
-                            iteration4++;
+                            iteration4 = slice.Length;
                         }
                         
                         slice = slice.Slice(iteration4);
                             base.CheckTimeout();
                         }
                         
-                        if (charloop_starting_pos1 >= charloop_ending_pos1)
+                        if (charloop_starting_pos1 >= charloop_ending_pos1 ||
+                            (charloop_ending_pos1 = inputSpan.Slice(charloop_starting_pos1, charloop_ending_pos1 - charloop_starting_pos1).LastIndexOfAnyExceptDigit()) < 0)
                         {
                             goto CharLoopBacktrack;
                         }
-                        pos = --charloop_ending_pos1;
+                        charloop_ending_pos1 += charloop_starting_pos1;
+                        pos = charloop_ending_pos1;
                         slice = inputSpan.Slice(pos);
                         
                         CharLoopEnd1:
                         
                         // Match a word character atomically any number of times.
                         {
-                            int iteration5 = 1;
-                            while ((uint)iteration5 < (uint)slice.Length && Utilities.IsWordChar(slice[iteration5]))
+                            int iteration5 = slice.Slice(1).IndexOfAnyExceptWordChar();
+                            if (iteration5 < 0)
                             {
-                                iteration5++;
+                                iteration5 = slice.Length - 1;
                             }
                             
                             slice = slice.Slice(iteration5);
                             pos += iteration5;
                         }
                         
+                        pos++;
+                        slice = inputSpan.Slice(pos);
                         base.Capture(3, capture_starting_pos1, pos);
                     }
                     
                     // Match a whitespace character atomically any number of times.
                     {
-                        int iteration6 = 0;
-                        while ((uint)iteration6 < (uint)slice.Length && char.IsWhiteSpace(slice[iteration6]))
+                        int iteration6 = slice.IndexOfAnyExceptWhiteSpace();
+                        if (iteration6 < 0)
                         {
-                            iteration6++;
+                            iteration6 = slice.Length;
                         }
                         
                         slice = slice.Slice(iteration6);
                         slice = inputSpan.Slice(pos);
                         charloop_starting_pos2 = pos;
                         
-                        int iteration7 = 0;
-                        while ((uint)iteration7 < (uint)slice.Length && char.IsWhiteSpace(slice[iteration7]))
+                        int iteration7 = slice.IndexOfAnyExceptWhiteSpace();
+                        if (iteration7 < 0)
                         {
-                            iteration7++;
+                            iteration7 = slice.Length;
                         }
                         
                         slice = slice.Slice(iteration7);
                                 slice = inputSpan.Slice(pos);
                                 charloop_starting_pos3 = pos;
                                 
-                                int iteration8 = 0;
-                                while ((uint)iteration8 < (uint)slice.Length && char.IsWhiteSpace(slice[iteration8]))
+                                int iteration8 = slice.IndexOfAnyExceptWhiteSpace();
+                                if (iteration8 < 0)
                                 {
-                                    iteration8++;
+                                    iteration8 = slice.Length;
                                 }
                                 
                                 slice = slice.Slice(iteration8);
                                     base.CheckTimeout();
                                 }
                                 
-                                if (charloop_starting_pos3 >= charloop_ending_pos3)
+                                if (charloop_starting_pos3 >= charloop_ending_pos3 ||
+                                    (charloop_ending_pos3 = inputSpan.Slice(charloop_starting_pos3, charloop_ending_pos3 - charloop_starting_pos3).LastIndexOfAnyExceptDigit()) < 0)
                                 {
                                     goto LoopIterationNoMatch;
                                 }
-                                pos = --charloop_ending_pos3;
+                                charloop_ending_pos3 += charloop_starting_pos3;
+                                pos = charloop_ending_pos3;
                                 slice = inputSpan.Slice(pos);
                                 
                                 CharLoopEnd3:
                                 
                                 // Match a word character atomically any number of times.
                                 {
-                                    int iteration9 = 1;
-                                    while ((uint)iteration9 < (uint)slice.Length && Utilities.IsWordChar(slice[iteration9]))
+                                    int iteration9 = slice.Slice(1).IndexOfAnyExceptWordChar();
+                                    if (iteration9 < 0)
                                     {
-                                        iteration9++;
+                                        iteration9 = slice.Length - 1;
                                     }
                                     
                                     slice = slice.Slice(iteration9);
                                     pos += iteration9;
                                 }
                                 
+                                pos++;
+                                slice = inputSpan.Slice(pos);
                                 base.Capture(2, capture_starting_pos3, pos);
                             }
                             
                             // Match a whitespace character atomically any number of times.
                             {
-                                int iteration10 = 0;
-                                while ((uint)iteration10 < (uint)slice.Length && char.IsWhiteSpace(slice[iteration10]))
+                                int iteration10 = slice.IndexOfAnyExceptWhiteSpace();
+                                if (iteration10 < 0)
                                 {
-                                    iteration10++;
+                                    iteration10 = slice.Length;
                                 }
                                 
                                 slice = slice.Slice(iteration10);
                             
                             // Match a whitespace character atomically any number of times.
                             {
-                                int iteration11 = 1;
-                                while ((uint)iteration11 < (uint)slice.Length && char.IsWhiteSpace(slice[iteration11]))
+                                int iteration11 = slice.Slice(1).IndexOfAnyExceptWhiteSpace();
+                                if (iteration11 < 0)
                                 {
-                                    iteration11++;
+                                    iteration11 = slice.Length - 1;
                                 }
                                 
                                 slice = slice.Slice(iteration11);
                             }
                             
                             // Match the string "var(".
-                            if (!slice.StartsWith("var("))
+                            if (!slice.Slice(1).StartsWith("var("))
                             {
                                 goto CharLoopBacktrack3;
                             }
                             
                             // Match a whitespace character greedily any number of times.
                             //{
-                                pos += 4;
+                                pos += 5;
                                 slice = inputSpan.Slice(pos);
                                 charloop_starting_pos4 = pos;
                                 
-                                int iteration12 = 0;
-                                while ((uint)iteration12 < (uint)slice.Length && char.IsWhiteSpace(slice[iteration12]))
+                                int iteration12 = slice.IndexOfAnyExceptWhiteSpace();
+                                if (iteration12 < 0)
                                 {
-                                    iteration12++;
+                                    iteration12 = slice.Length;
                                 }
                                 
                                 slice = slice.Slice(iteration12);
                                     base.CheckTimeout();
                                 }
                                 
-                                if (charloop_starting_pos4 >= charloop_ending_pos4)
+                                if (charloop_starting_pos4 >= charloop_ending_pos4 ||
+                                    (charloop_ending_pos4 = inputSpan.Slice(charloop_starting_pos4, charloop_ending_pos4 - charloop_starting_pos4).LastIndexOfAnyExceptDigit()) < 0)
                                 {
                                     goto CharLoopBacktrack3;
                                 }
-                                pos = --charloop_ending_pos4;
+                                charloop_ending_pos4 += charloop_starting_pos4;
+                                pos = charloop_ending_pos4;
                                 slice = inputSpan.Slice(pos);
                                 
                                 CharLoopEnd4:
                                 
                                 // Match a word character atomically any number of times.
                                 {
-                                    int iteration13 = 1;
-                                    while ((uint)iteration13 < (uint)slice.Length && Utilities.IsWordChar(slice[iteration13]))
+                                    int iteration13 = slice.Slice(1).IndexOfAnyExceptWordChar();
+                                    if (iteration13 < 0)
                                     {
-                                        iteration13++;
+                                        iteration13 = slice.Length - 1;
                                     }
                                     
                                     slice = slice.Slice(iteration13);
                                     pos += iteration13;
                                 }
                                 
+                                pos++;
+                                slice = inputSpan.Slice(pos);
                                 base.Capture(3, capture_starting_pos4, pos);
                             }
                             
                             // Match a whitespace character atomically any number of times.
                             {
-                                int iteration14 = 0;
-                                while ((uint)iteration14 < (uint)slice.Length && char.IsWhiteSpace(slice[iteration14]))
+                                int iteration14 = slice.IndexOfAnyExceptWhiteSpace();
+                                if (iteration14 < 0)
                                 {
-                                    iteration14++;
+                                    iteration14 = slice.Length;
                                 }
                                 
                                 slice = slice.Slice(iteration14);
                             
                             // Match a whitespace character atomically any number of times.
                             {
-                                int iteration15 = 1;
-                                while ((uint)iteration15 < (uint)slice.Length && char.IsWhiteSpace(slice[iteration15]))
+                                int iteration15 = slice.Slice(1).IndexOfAnyExceptWhiteSpace();
+                                if (iteration15 < 0)
                                 {
-                                    iteration15++;
+                                    iteration15 = slice.Length - 1;
                                 }
                                 
                                 slice = slice.Slice(iteration15);
                                 pos += iteration15;
                             }
                             
+                            pos++;
+                            slice = inputSpan.Slice(pos);
                             base.Capture(1, capture_starting_pos2, pos);
                             
                             Utilities.StackPush(ref base.runstack!, ref stackpos, capture_starting_pos2);
         /// <summary>Whether <see cref="s_defaultTimeout"/> is non-infinite.</summary>
         internal static readonly bool s_hasTimeout = s_defaultTimeout != Regex.InfiniteMatchTimeout;
         
+        /// <summary>Finds the next index of any character that does not match a whitespace character.</summary>
+        [MethodImpl(MethodImplOptions.AggressiveInlining)]
+        internal static int IndexOfAnyExceptWhiteSpace(this ReadOnlySpan<char> span)
+        {
+            int i = span.IndexOfAnyExcept(Utilities.s_asciiWhiteSpace);
+            if ((uint)i < (uint)span.Length)
+            {
+                if (char.IsAscii(span[i]))
+                {
+                    return i;
+                }
+        
+                do
+                {
+                    if (!char.IsWhiteSpace(span[i]))
+                    {
+                        return i;
+                    }
+                    i++;
+                }
+                while ((uint)i < (uint)span.Length);
+            }
+        
+            return -1;
+        }
+        
+        /// <summary>Finds the next index of any character that does not match a word character.</summary>
+        [MethodImpl(MethodImplOptions.AggressiveInlining)]
+        internal static int IndexOfAnyExceptWordChar(this ReadOnlySpan<char> span)
+        {
+            int i = span.IndexOfAnyExcept(Utilities.s_asciiWordChars);
+            if ((uint)i < (uint)span.Length)
+            {
+                if (char.IsAscii(span[i]))
+                {
+                    return i;
+                }
+        
+                do
+                {
+                    if (!Utilities.IsWordChar(span[i]))
+                    {
+                        return i;
+                    }
+                    i++;
+                }
+                while ((uint)i < (uint)span.Length);
+            }
+        
+            return -1;
+        }
+        
         /// <summary>Determines whether the character is part of the [\w] set.</summary>
         [MethodImpl(MethodImplOptions.AggressiveInlining)]
         internal static bool IsWordChar(char ch)
                 (WordCategoriesMask & (1 << (int)CharUnicodeInfo.GetUnicodeCategory(ch))) != 0;
         }
         
+        /// <summary>Finds the last index of any character that matches any character other than a Unicode digit.</summary>
+        [MethodImpl(MethodImplOptions.AggressiveInlining)]
+        internal static int LastIndexOfAnyExceptDigit(this ReadOnlySpan<char> span)
+        {
+            int i = span.LastIndexOfAnyExcept(Utilities.s_asciiDigits);
+            if (i >= 0)
+            {
+                if (char.IsAscii(span[i]))
+                {
+                    return i;
+                }
+        
+                do
+                {
+                    if (!char.IsDigit(span[i]))
+                    {
+                        return i;
+                    }
+                    i--;
+                }
+                while (i >= 0);
+            }
+        
+            return -1;
+        }
+        
         /// <summary>Pops 2 values from the backtracking stack.</summary>
         [MethodImpl(MethodImplOptions.AggressiveInlining)]
         internal static void StackPop(int[] stack, ref int pos, out int arg0, out int arg1)
             0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xFF, 0x03,
             0xFE, 0xFF, 0xFF, 0x87, 0xFE, 0xFF, 0xFF, 0x07
         };
+        
+        /// <summary>Supports searching for characters in or not in "0123456789".</summary>
+        internal static readonly SearchValues<char> s_asciiDigits = SearchValues.Create("0123456789");
+        
+        /// <summary>Supports searching for characters in or not in "\t\n\v\f\r ".</summary>
+        internal static readonly SearchValues<char> s_asciiWhiteSpace = SearchValues.Create("\t\n\v\f\r ");
+        
+        /// <summary>Supports searching for characters in or not in "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz".</summary>
+        internal static readonly SearchValues<char> s_asciiWordChars = SearchValues.Create("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz");
     }
 }
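The three helpers in this diff share one shape: a vectorized scan over the ASCII members of the set, then a scalar check once a non-ASCII character is hit. The sketch below applies that shape to a forward search for digits. The class `AsciiFastPathSketch` and its method are hypothetical, written for illustration, and are not taken from the generator's output. It assumes .NET 8 or later for `SearchValues<char>`.

```csharp
using System;
using System.Buffers;

// Hypothetical standalone sketch; names are illustrative, not the generator's actual output.
static class AsciiFastPathSketch
{
    // ASCII members of the \d set. Non-ASCII Unicode digits are left to the scalar fallback.
    private static readonly SearchValues<char> s_asciiDigits = SearchValues.Create("0123456789");

    // Returns the index of the first char that is not a Unicode decimal digit,
    // or -1 if every char in the span is one.
    public static int IndexOfAnyExceptDigit(ReadOnlySpan<char> span)
    {
        // Vectorized scan: stops at the first char outside the ASCII digit set.
        int i = span.IndexOfAnyExcept(s_asciiDigits);
        if ((uint)i < (uint)span.Length)
        {
            // An ASCII char outside the set is definitely not a digit.
            if (char.IsAscii(span[i]))
            {
                return i;
            }

            // A non-ASCII char may still be a digit, so the rest of the span
            // is checked one char at a time (the scan does not return to the vectorized path).
            do
            {
                if (!char.IsDigit(span[i]))
                {
                    return i;
                }
                i++;
            }
            while ((uint)i < (uint)span.Length);
        }

        return -1;
    }
}
```

For all-ASCII input the scalar loop never runs, which is the case the vectorized path is meant to speed up. Input such as `"12\u0663x"` (with an Arabic-Indic digit at index 2) takes the fallback and reports index 3.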
"^[a-f0-9]{32}$" (4920 uses)
[GeneratedRegex("^[a-f0-9]{32}$")]
  // Match a lowercase hexadecimal digit exactly 32 times.
  {
-       if ((uint)slice.Length < 32 || slice.Slice(0, 32).ContainsAnyExcept(Utilities.s_asciiHexDigitsLower))
+       if ((uint)slice.Length < 32 || slice.Slice(0, 32).IndexOfAnyExcept(Utilities.s_asciiHexDigitsLower) >= 0)
      {
          return false; // The input didn't match.
      }
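The old and new conditions in this hunk should be equivalent in result: `ContainsAnyExcept(values)` is true exactly when `IndexOfAnyExcept(values)` finds an index. A minimal sketch comparing the two is below. The class name is hypothetical, and the contents of `s_asciiHexDigitsLower` are an assumption based on the pattern `[a-f0-9]`, since the field's definition is not shown in the diff.

```csharp
using System;
using System.Buffers;

// Hypothetical sketch; not part of the generated code.
static class RepeaterCheckSketch
{
    // Assumed contents: the field's real definition is not shown in the diff.
    private static readonly SearchValues<char> s_asciiHexDigitsLower =
        SearchValues.Create("0123456789abcdef");

    // Shape of the check before the change.
    public static bool FailsOld(ReadOnlySpan<char> slice) =>
        (uint)slice.Length < 32 || slice.Slice(0, 32).ContainsAnyExcept(s_asciiHexDigitsLower);

    // Shape of the check after the change.
    public static bool FailsNew(ReadOnlySpan<char> slice) =>
        (uint)slice.Length < 32 || slice.Slice(0, 32).IndexOfAnyExcept(s_asciiHexDigitsLower) >= 0;
}
```

Both methods return the same value for any input; only the emitted call differs.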

For more diff examples, see https://gist.github.com/MihuBot/d53c23335fba4c7efa18feb532b317b3

JIT assembly changes
Total bytes of base: 55567181
Total bytes of diff: 59629729
Total bytes of delta: 4062548 (7.31 % of base)
Total relative delta: 4432.43
    diff is a regression.
    relative diff is a regression.

For a list of JIT diff regressions, see Regressions.md
For a list of JIT diff improvements, see Improvements.md

Sample source code for further analysis
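using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net.Http;
using System.Text.Json;
using System.Text.RegularExpressions;

const string JsonPath = "RegexResults-1816.json";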
if (!File.Exists(JsonPath))
{
    await using var archiveStream = await new HttpClient().GetStreamAsync("https://mihubot.xyz/r/FJQdSsTA");
    using var archive = new ZipArchive(archiveStream, ZipArchiveMode.Read);
    archive.Entries.First(e => e.Name == "Results.json").ExtractToFile(JsonPath);
}

using FileStream jsonFileStream = File.OpenRead(JsonPath);
RegexEntry[] entries = JsonSerializer.Deserialize<RegexEntry[]>(jsonFileStream, new JsonSerializerOptions { IncludeFields = true })!;
Console.WriteLine($"Working with {entries.Length} patterns");



record KnownPattern(string Pattern, RegexOptions Options, int Count);

sealed class RegexEntry
{
    public required KnownPattern Regex { get; set; }
    public required string MainSource { get; set; }
    public required string PrSource { get; set; }
    public string? FullDiff { get; set; }
    public string? ShortDiff { get; set; }
    public (string Name, string Values)[]? SearchValuesOfChar { get; set; }
    public (string[] Values, StringComparison ComparisonType)[]? SearchValuesOfString { get; set; }
}
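As one possible starting point for that analysis, the sketch below lists the most-used patterns whose generated source differs between main and the PR. `DiffAnalysisSketch` and `MostUsedChanged` are hypothetical names, not part of the bot's sample. The types are trimmed copies of the ones above so that the block compiles on its own, and it assumes `Count` is the usage count shown next to each pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Trimmed copies of the sample's types so this sketch compiles on its own.
record KnownPattern(string Pattern, RegexOptions Options, int Count);

sealed class RegexEntry
{
    public required KnownPattern Regex { get; set; }
    public required string MainSource { get; set; }
    public required string PrSource { get; set; }
}

// Hypothetical helper, not part of the bot's sample.
static class DiffAnalysisSketch
{
    // Returns up to `top` entries whose generated source changed,
    // ordered by usage count (assumed meaning of Count), highest first.
    public static List<RegexEntry> MostUsedChanged(IEnumerable<RegexEntry> entries, int top) =>
        entries
            .Where(e => e.MainSource != e.PrSource)
            .OrderByDescending(e => e.Regex.Count)
            .Take(top)
            .ToList();
}
```

With the `entries` array from the sample, a call such as `DiffAnalysisSketch.MostUsedChanged(entries, 10)` would give the ten most-used affected patterns to inspect first.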

@MihuBot

MihuBot commented Mar 15, 2026
