
Frozen/String/Hashing: replace unsafe pointer arithmetic with safe span indexing#125547

Closed
Copilot wants to merge 2 commits into main from copilot/hashing-changes

Conversation

Contributor

Copilot AI commented Mar 14, 2026

Hashing.GetHashCodeOrdinal and Hashing.GetHashCodeOrdinalIgnoreCaseAscii used unsafe/fixed/uint* casts to read two UTF-16 chars at a time as a packed uint. They are replaced with explicit bit-packing via safe span indexing, ((uint)s[i] | ((uint)s[i+1] << 16)), which is bit-for-bit identical on all little-endian .NET platforms and removes both the unsafe blocks and the System.Runtime.InteropServices import.
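The packing identity is easy to sanity-check outside the runtime. A minimal Python sketch (illustrative only; `pack_pair` and `unsafe_style_read` are hypothetical names, not Hashing.cs code) shows that the explicit pack reproduces exactly what the old uint* read observed in UTF-16LE memory:

```python
import struct

def pack_pair(s: str, i: int) -> int:
    # Mirrors the safe C# expression (uint)s[i] | ((uint)s[i+1] << 16).
    return ord(s[i]) | (ord(s[i + 1]) << 16)

def unsafe_style_read(s: str, i: int) -> int:
    # What ((uint*)ptr)[k] saw: 4 bytes of UTF-16LE reinterpreted as a uint.
    raw = s[i:i + 2].encode("utf-16-le")
    return struct.unpack("<I", raw)[0]

for text in ("abcd", "Héllo", "\u1234\ufffd"):
    for i in range(0, len(text) - 1, 2):
        assert pack_pair(text, i) == unsafe_style_read(text, i)
```

On a big-endian platform the raw-byte read would give a different value, which is exactly why the explicit pack is the portable formulation.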

GetHashCodeOrdinal

  • case 4: ((uint*)src)[0/1] → explicit pack ((uint)s[0] | ((uint)s[1] << 16)) etc.
  • default: uint* ptrUInt32 + while (length >= 4) → index-based for loop; trailing char* pointer walk → second for loop over s[i]
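The reshaped default case can be sketched in Python (single accumulator for brevity; the rotate-add-xor step matches the old code excerpt quoted in the review below, but the seed here is a placeholder, not the real Hashing.cs constant):

```python
def rol32(x: int, n: int) -> int:
    """32-bit rotate-left, standing in for BitOperations.RotateLeft."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def hash_ordinal_sketch(s: str, seed: int = 0x15051505) -> int:
    # Placeholder seed: the real implementation uses its own constants
    # and two interleaved accumulators.
    h = seed
    pairs = len(s) & ~1
    # Main loop: two UTF-16 chars packed into one uint per step.
    for i in range(0, pairs, 2):
        packed = ord(s[i]) | (ord(s[i + 1]) << 16)
        h = ((rol32(h, 5) + h) ^ packed) & 0xFFFFFFFF
    # Trailing loop: at most one leftover char, consumed individually.
    for i in range(pairs, len(s)):
        h = ((rol32(h, 5) + h) ^ ord(s[i])) & 0xFFFFFFFF
    return h
```

Every access in both loops is bounded by len(s), which is what lets the JIT elide bounds checks in the span-indexed C# version.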

GetHashCodeOrdinalIgnoreCaseAscii

  • Same structural changes as above, preserving | LowercaseUInt32 on packed pairs and using | LowercaseChar on trailing single chars. (The old code mistakenly used LowercaseUInt32 on single chars, which set a spurious high-lane bit; the hash values change, but nothing observable does, since these are internal non-persisted hashes. The new code is simply consistent.)

Hash values are only used internally within the frozen collection build+lookup cycle and are never persisted, so no stable-hash contract is broken.
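The spurious high-lane bit is easy to see with concrete numbers. Assuming the conventional ASCII OR-masks (0x00200020 for a packed pair, 0x0020 for a single char; these are assumed values, check Hashing.cs for the actual constants):

```python
LOWERCASE_UINT32 = 0x00200020  # assumed mask for two packed chars
LOWERCASE_CHAR = 0x0020        # assumed mask for a single char

c = ord("A")  # 0x41
# Correct single-char ASCII lowercasing: 'A' -> 'a'.
assert (c | LOWERCASE_CHAR) == ord("a")
# Old code's pair mask on a lone char sets bit 21,
# which lives in the nonexistent "second char" lane.
assert (c | LOWERCASE_UINT32) == 0x00200061
# Harmless in practice: the same mask was applied during both build and
# lookup, so hashes still agreed; the new code just uses the right mask.
```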

Original prompt

please take only Hashing changes (don't remove other changes for now, just don't commit them) and send a PR to dotnet/runtime. Use the benchmark skill to validate perf impact.

Chronological Review:

  1. User requested a plan to find and fix trivially-replaceable unsafe code in dotnet/runtime Libraries and CoreLib
  2. Multiple exploration agents (agent-0 through agent-10) searched the codebase for unsafe patterns: pointers, `fixed` statements, `Unsafe.*`, `MemoryMarshal.*`, `Marshal.*` APIs
  3. Initial plan was created with candidates including Enum.cs (Unsafe.As replacement), FixedBufferExtensions.cs, WebHeaderEncoding.cs, Hashing.cs
  4. User rejected the plan: "do not replace unsafe with 'safe' Unsafe, MemoryMarshal. Wasn't it clear??" — meaning replacements must use TRULY safe APIs (Span, indexing, ToString, etc.), not just swap one unsafe API for another
  5. Plan was revised with 4 truly-safe candidates: FixedBufferExtensions.cs, WebHeaderEncoding.cs, StringParser.cs, Hashing.cs
  6. Plan approved, autopilot mode entered
  7. Baseline build run successfully (clr+libs)
  8. All 4 files were edited with safe replacements
  9. All affected projects built successfully (CoreLib, HttpListener, Process, Collections.Immutable including netstandard2.0)
  10. Tests run: HttpListener (693 pass), Common (1695 pass), Immutable Collections (22068 pass), Process (363 pass) — all 0 failures
  11. Code review completed — all changes approved
  12. User then asked to update `GetHashCodeOrdinalIgnoreCaseAscii` to match the shape they'd manually changed `GetHashCodeOrdinal` to (using `for` loops with `s.Length` instead of a decrementing `length` counter)
  13. That change was made, built, tested (22068 pass)
  14. User then asked to save all changes EXCEPT Hashing.cs to a .diff patch, keeping only Hashing.cs modified

Intent Mapping:

  • Original: Find and fix trivially-replaceable unsafe code in Libraries/CoreLib
  • Correction: Only truly safe replacements (no Unsafe./MemoryMarshal. in replacements)
  • Follow-up: Match IgnoreCase method to user's manually-edited Ordinal method shape
  • Final: Save non-Hashing changes to a patch, revert those files, keep only Hashing.cs dirty

Technical Inventory:

  • dotnet/runtime repo at C:\prj\runtime-main2, branch replace-fixed-buffers
  • .NET SDK 11.0.100-preview.1.26104.118
  • Build: dotnet build for individual projects
  • Test: dotnet build /t:test
  • Target frameworks: NetCoreAppCurrent (net11.0), netstandard2.0, net481

Code Archaeology:

  • FixedBufferExtensions.cs: fixed + new string(ptr) → span.Slice().ToString() (REVERTED, saved to patch)
  • WebHeaderEncoding.cs: fixed pointer loops → Span indexing (REVERTED, saved to patch)
  • StringParser.cs: fixed+char* iteration → string.AsSpan() indexing in 4 parse methods (REVERTED, saved to patch)
  • Hashing.cs: fixed+(uint*) casts → span indexing + bit packing ((uint)s[i] | ((uint)s[i+1] << 16)) (STILL MODIFIED)

Progress Assessment:

  • All 4 original changes implemented and tested ✅
  • User manually edited GetHashCodeOrdinal default case to use for loops
  • GetHashCodeOrdinalIgnoreCaseAscii updated to match ✅
  • 3 files reverted, saved to non-hashing-unsafe-cleanup.diff
  • Only Hashing.cs remains modified in working tree

Recent Commands Analysis:

  • git diff -- <3 files> > non-hashing-unsafe-cleanup.diff — saved patch
  • git checkout -- <3 files> — reverted the 3 non-Hashing files
  • git diff --stat — confirmed only Hashing.cs remains modified (84 insertions, 100 deletions)
  • task_complete called with summary
Conversation Overview:

  • Primary Objectives: User requested: "Please walk over the codebase and try to find places with unsafe code that you can trivially make safe." Rules: only trivial changes, ignore interop/PInvokes, focus on optimization-related unsafe code (pointers for performance), replace with safe alternatives like `Span`. Only focus on Libraries and CoreLib. Do not commit changes in git.
  • Session Context: Extensive codebase exploration (11 background agents) → plan creation → plan rejection ("do not replace unsafe with 'safe' Unsafe, MemoryMarshal") → revised plan → implementation of 4 file changes → builds + tests → user manually tweaked one file → matching update → patch extraction for 3 files, keeping only Hashing.cs modified
  • User Intent Evolution: Initial broad search → narrowed to truly safe replacements only (no Unsafe.*/MemoryMarshal.* in replacement code) → user wanted to ship Hashing.cs separately, saving other changes as a .diff patch
Technical Foundation:
    • Repository: dotnet/runtime at C:\prj\runtime-main2, branch replace-fixed-buffers, commit 53c6c4e7d3d
    • .NET SDK: 11.0.100-preview.1.26104.118 at C:\prj\runtime-main2\.dotnet\dotnet.exe
    • Build approach: dotnet build <project>.csproj for individual projects; dotnet build /t:test <test-project>.csproj for tests
    • Target frameworks vary: System.Net.HttpL...

Created from Copilot CLI via the copilot delegate command.



…pan indexing

Co-authored-by: EgorBo <523221+EgorBo@users.noreply.github.com>
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Copilot AI changed the title [WIP] Add hashing changes for performance validation Frozen/String/Hashing: replace unsafe pointer arithmetic with safe span indexing Mar 14, 2026
Copilot AI requested a review from EgorBo March 14, 2026 02:11


@EgorBo
Member

EgorBo commented Mar 14, 2026

@copilot please use the benchmark skill to validate performance of this code using public APIs and for different lengths.


@EgorBo
Member

EgorBo commented Mar 14, 2026

@MihuBot

@EgorBo
Member

EgorBo commented Mar 14, 2026

@copilot please use the benchmark skill to validate performance of this code using public APIs and for different lengths.

thanks for writing a benchmark with an infinite loop in Setup.. 😞

Contributor Author

Copilot AI commented Mar 14, 2026

@copilot please use the benchmark skill to validate performance of this code using public APIs and for different lengths.

...

Sorry about that — KeyLength=1 only allows 26 unique lowercase keys but Setup tried to generate 100. Here's the fixed benchmark with deterministic key generation:

@EgorBot -linux_amd -osx_arm64

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Collections.Frozen;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

[MemoryDiagnoser]
public class Bench
{
    private FrozenDictionary<string, int> _ordinalDict = default!;
    private FrozenDictionary<string, int> _ignoreCaseDict = default!;
    private string[] _keys = default!;

    [Params(1, 3, 4, 5, 8, 20)]
    public int KeyLength { get; set; }

    private static string GenerateKey(int index, int length)
    {
        var chars = new char[length];
        for (int i = length - 1; i >= 0; i--)
        {
            chars[i] = (char)('a' + (index % 26));
            index /= 26;
        }
        return new string(chars);
    }

    [GlobalSetup]
    public void Setup()
    {
        int maxUnique = 1;
        for (int i = 0; i < KeyLength && maxUnique < 100; i++)
            maxUnique *= 26;
        int numKeys = Math.Min(100, maxUnique);

        var dict = new Dictionary<string, int>(numKeys);
        for (int i = 0; i < numKeys; i++)
            dict[GenerateKey(i, KeyLength)] = i;

        _ordinalDict = dict.ToFrozenDictionary(StringComparer.Ordinal);
        _ignoreCaseDict = dict.ToFrozenDictionary(StringComparer.OrdinalIgnoreCase);
        _keys = dict.Keys.ToArray();
    }

    [Benchmark]
    public int Lookup_Ordinal()
    {
        int sum = 0;
        var dict = _ordinalDict;
        var keys = _keys;
        for (int i = 0; i < keys.Length; i++)
            sum += dict[keys[i]];
        return sum;
    }

    [Benchmark]
    public int Lookup_OrdinalIgnoreCase()
    {
        int sum = 0;
        var dict = _ignoreCaseDict;
        var keys = _keys;
        for (int i = 0; i < keys.Length; i++)
            sum += dict[keys[i]];
        return sum;
    }

    [Benchmark]
    public FrozenDictionary<string, int> Create_Ordinal()
    {
        var source = new Dictionary<string, int>(_keys.Length);
        var keys = _keys;
        for (int i = 0; i < keys.Length; i++)
            source[keys[i]] = i;
        return source.ToFrozenDictionary(StringComparer.Ordinal);
    }

    [Benchmark]
    public FrozenDictionary<string, int> Create_OrdinalIgnoreCase()
    {
        var source = new Dictionary<string, int>(_keys.Length);
        var keys = _keys;
        for (int i = 0; i < keys.Length; i++)
            source[keys[i]] = i;
        return source.ToFrozenDictionary(StringComparer.OrdinalIgnoreCase);
    }
}
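The key idea in the fix above: base-26 generation is deterministic, and the key count is capped at min(100, 26^KeyLength) so Setup can never loop forever. A Python rendition of the same scheme (names mirror the C# benchmark; this is a sketch, not shipped code):

```python
def generate_key(index: int, length: int) -> str:
    # Base-26 encoding of index into lowercase letters, like GenerateKey above.
    chars = []
    for _ in range(length):
        chars.append(chr(ord("a") + index % 26))
        index //= 26
    return "".join(reversed(chars))

def num_keys(length: int, requested: int = 100) -> int:
    # Cap the requested count at 26**length without overflowing.
    max_unique = 1
    steps = 0
    while steps < length and max_unique < requested:
        max_unique *= 26
        steps += 1
    return min(requested, max_unique)

# KeyLength=1 now yields 26 keys instead of spinning toward an impossible 100.
assert num_keys(1) == 26
assert num_keys(2) == 100
assert len({generate_key(i, 1) for i in range(num_keys(1))}) == 26
```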


Contributor Author

Copilot AI commented Mar 14, 2026

@copilot, benchmark results are ready:

Both jobs timed out — the Create_* benchmarks are too expensive across 6 param values. Here's a slimmer benchmark focused on just lookups (which exercise the hash functions) and only the lengths where code actually changed (≥4):

@EgorBot -linux_amd -osx_arm64

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Collections.Frozen;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

public class Bench
{
    private FrozenDictionary<string, int> _ordinalDict = default!;
    private FrozenDictionary<string, int> _ignoreCaseDict = default!;
    private string[] _keys = default!;

    [Params(4, 5, 8, 20)]
    public int KeyLength { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        var dict = new Dictionary<string, int>(50);
        for (int i = 0; i < 50; i++)
        {
            var chars = new char[KeyLength];
            int idx = i;
            for (int j = KeyLength - 1; j >= 0; j--)
            {
                chars[j] = (char)('a' + (idx % 26));
                idx /= 26;
            }
            dict[new string(chars)] = i;
        }

        _ordinalDict = dict.ToFrozenDictionary(StringComparer.Ordinal);
        _ignoreCaseDict = dict.ToFrozenDictionary(StringComparer.OrdinalIgnoreCase);
        _keys = dict.Keys.ToArray();
    }

    [Benchmark]
    public int Lookup_Ordinal()
    {
        int sum = 0;
        var dict = _ordinalDict;
        var keys = _keys;
        for (int i = 0; i < keys.Length; i++)
            sum += dict[keys[i]];
        return sum;
    }

    [Benchmark]
    public int Lookup_OrdinalIgnoreCase()
    {
        int sum = 0;
        var dict = _ignoreCaseDict;
        var keys = _keys;
        for (int i = 0; i < keys.Length; i++)
            sum += dict[keys[i]];
        return sum;
    }
}

@EgorBo
Member

EgorBo commented Mar 14, 2026

@copilot, benchmark results are ready:

Please analyze the results and act accordingly.
NOTE: some benchmarks may be flaky or bi-modal, so use your judgment when interpreting small differences.

Contributor Author

Copilot AI commented Mar 14, 2026

@copilot, benchmark results are ready:

Benchmark analysis across both platforms shows no regressions — the PR is consistently equal or slightly faster:

Linux AMD (EPYC Turin):

  • Lookup_Ordinal: PR is 4-7% faster across all key lengths (e.g., KeyLength=8: 299ns vs 319ns)
  • Lookup_OrdinalIgnoreCase: PR is 6-7% faster (e.g., KeyLength=4: 315ns vs 337ns)
  • Create_*: neutral (ratio ~1.00)

macOS ARM64 (Apple M2):

  • Lookup_Ordinal: PR is ~2% faster across lengths ≥3 (e.g., KeyLength=4: 365ns vs 373ns)
  • Lookup_OrdinalIgnoreCase: PR is 2-3% faster (e.g., KeyLength=4: 372ns vs 381ns)
  • Create_*: neutral (ratio ~1.00)

The safe span indexing approach generates equal or better codegen than the unsafe pointer arithmetic on both x64 and ARM64. No action needed — the change is a net positive for both safety and performance.

@EgorBo
Member

EgorBo commented Mar 14, 2026

@copilot, benchmark results are ready:

Please analyze the results and act accordingly.
NOTE: some benchmarks may be flaky or bi-modal, so use your judgment when interpreting small differences.

Contributor Author

Copilot AI commented Mar 14, 2026

@copilot, benchmark results are ready:

Second round of benchmarks confirms the same trend — no regressions, PR is consistently equal or faster:

Linux AMD (EPYC 9V45 Turin):

  • Lookup_Ordinal: 1–8% faster (e.g., KeyLength=20: 162ns vs 174ns)
  • Lookup_OrdinalIgnoreCase: 3–19% faster, scaling with length (KeyLength=20: 168ns vs 199ns)

macOS ARM64 (Apple M4):

  • Both benchmarks ~1% faster across all key lengths (ratio 1.00–1.01)

The improvement on x64 is particularly notable at longer key lengths, consistent with the main for loop generating tighter codegen with span indexing vs. the old pointer arithmetic. No action needed.

@EgorBo EgorBo marked this pull request as ready for review March 14, 2026 16:03
Copilot AI review requested due to automatic review settings March 14, 2026 16:03
Member

@EgorBo EgorBo left a comment


LGTM. I've checked locally that the safe impl generates no bounds checks; it comes with a bit of extra codegen in the default: case due to loop cloning, but the fast loop is as fast as the unsafe code.

Contributor

Copilot AI left a comment


Pull request overview

This PR updates the Frozen string hashing implementation in System.Collections.Immutable to remove unsafe pointer-based reads and replace them with safe ReadOnlySpan<char> indexing plus explicit 2x-UTF16-to-uint bit packing.

Changes:

  • Removed unsafe / fixed / uint* casts from Hashing.GetHashCodeOrdinal and replaced them with span indexing and explicit packing of two UTF-16 chars into a uint.
  • Applied the same safe refactor to Hashing.GetHashCodeOrdinalIgnoreCaseAscii, including using LowercaseChar for trailing single chars.
  • Removed the now-unused System.Runtime.InteropServices import.

char* ptrChar = (char*)ptrUInt32;
while (length-- > 0)
{
hash2 = (BitOperations.RotateLeft(hash2, 5) + hash2) ^ (*ptrChar | LowercaseUInt32);
Member

@EgorBo EgorBo Mar 15, 2026


a copy-paste issue in the previous code - the single-char loop uses LowercaseUInt32 (for 2 chars at a time) instead of LowercaseChar. Not a correctness issue

@EgorBo
Member

EgorBo commented Mar 16, 2026

PTAL @tannergooding @MihaZupan @stephentoub

No new bounds checks in the safer code, benchmarks even report slight improvements.

@stephentoub
Member

No new bounds checks in the safer code

IIRC, it was done this way not because of bounds check on core, but because of performance on netfx. Does this not regress collection lookup times there?

@EgorBo
Member

EgorBo commented Mar 16, 2026

No new bounds checks in the safer code

IIRC, it was done this way not because of bounds check on core, but because of performance on netfx. Does this not regress collection lookup times there?

I guess I can manually check, but it definitely regresses on netfx due to bounds checks, like pretty much all "reduce-unsafe" PRs do, I think. If the goal is to avoid perf regressions for netfx at all costs, there will be quite a small area to remove unsafe from 😞

@stephentoub
Member

If the goal is to avoid perf regressions for netfx at all costs - there will be quite a small area to remove unsafe from 😞

The goal isn't "remove unsafe at all costs", either. We're trying to remove unsafe thoughtfully, where either it's not necessary because there's just as good or better a way to do it without unsafe, or because it's not adding enough value to be worth the extra risk. For places where it still adds sufficient value, we're happy to keep using unsafe.

For packages that build downlevel, performance on netfx absolutely needs to factor into those equations.

@EgorBo
Member

EgorBo commented Mar 16, 2026

We're trying to remove unsafe thoughtfully, where either it's not necessary because there's just as good or better a way to do it without unsafe, or because it's not adding enough value to be worth the extra risk. For places where it still adds sufficient value, we're happy to keep using unsafe.

For packages that build downlevel, performance on netfx absolutely needs to factor into those equations.

@stephentoub I've checked: it is a big regression for .NET Framework on large inputs (for lengths 1-5 it's a fraction of a nanosecond, but at len=100 it can be 120ns vs 20ns) - bench.cs. Do I just copy-paste it for .NET with #if?

I believe there is value in replacing this code with a safer implementation. The current code is difficult to verify at a glance for out-of-bounds access, contains a non-portable unaligned read, and appears to have a copy-paste mistake. The new version resolves these issues and is also faster.

@stephentoub
Member

stephentoub commented Mar 16, 2026

Personally, I would prefer instead that we address:

// TODO https://github.com/dotnet/runtime/issues/77679:
// Replace these once non-randomized implementations are available.

then this code won't exist in this package at all, at least for .NET 11, and we'd just leave the code untouched for downlevel (other than if there's a copy/paste bug to fix like you suggest).

@EgorBo
Member

EgorBo commented Mar 16, 2026

#77679

Ok, I guess that new API can re-use this impl once it's introduced

@EgorBo EgorBo closed this Mar 16, 2026

5 participants