Implement sorting by hash by aneubeck · Pull Request #112 · github/rust-gems

aneubeck · 2026-05-08T07:57:55Z

This PR

adds iterators
adds sort_by_hash feature that data is iterated according to their hash
adds benchmarks for sort/merge use cases
Note: hashmap remains intact after sorting now.

Copilot

Pull request overview

This PR extends hash-sorted-map to support deterministic hash-ordered iteration by introducing explicit iterator types and a sort_by_hash operation that reorders entries in-place (within hash-prefix group chains) before returning an owning HashSortedContainer. It also updates documentation around performance trade-offs and adds new Criterion benchmarks for iter/sort/merge scenarios.

Changes:

Refactors core storage into HashSortedContainer + Group and re-exports the container.
Adds Iter / IterMut / IntoIter for both HashSortedContainer and HashSortedMap, including IntoIterator impls.
Adds HashSortedMap::sort_by_hash() plus additional benchmarks and updated optimization notes.

Show a summary per file

File	Description
crates/hash-sorted-map/src/lib.rs	Wires up new modules and re-exports container + iterator types.
crates/hash-sorted-map/src/iter.rs	Implements shared cursor-based iteration for container/map (immutable, mutable, and owning).
crates/hash-sorted-map/src/hash_sorted_map.rs	Refactors map to wrap `HashSortedContainer`, adds `sort_by_hash`, and updates insertion/lookup paths accordingly.
crates/hash-sorted-map/src/group.rs	Extracts `Group` struct and `NO_OVERFLOW` constant.
crates/hash-sorted-map/src/container.rs	Adds `HashSortedContainer` with allocation helpers and Drop that owns entry destruction.
crates/hash-sorted-map/OPTIMIZATIONS.md	Updates design notes and publishes new benchmark results/interpretation.
crates/hash-sorted-map/Cargo.toml	Adds `smallvec` dependency for sorting scratch storage.
crates/hash-sorted-map/benchmarks/performance.rs	Adds new benchmarks for iter/sort/merge, including k-way merge comparison.
crates/hash-sorted-map/benchmarks/Cargo.toml	Adds `itertools` dependency for k-way merge benchmark.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comments suppressed due to low confidence (1)

crates/hash-sorted-map/src/hash_sorted_map.rs:946

Same issue as above: this test assumes hashes are strictly increasing (h > prev_hash), but hash collisions can occur (especially with strings), so the correct property is non-decreasing order. Update the assertion to tolerate equal hashes or validate ordering with a deterministic tie-breaker.

        let mut prev_hash = 0u64;
        let mut first = true;
        for (k, _) in &container {
            let h = hasher.hash_one(k);
            if !first {
                assert!(h > prev_hash, "hash order violated: {prev_hash:#x} > {h:#x}");
            }

Files reviewed: 9/9 changed files
Comments generated: 3

…into aneubeck/merge

jorendorff · 2026-05-11T21:54:13Z


-hashbrown does **not** have this optimization — it always does a full SIMD
-group scan. The reason why the performance is different is probably due to the different overflow strategies and the different load factors.
+**Experimental finding**: This scalar check **hurts performance** on random


This makes so much sense. And removing it from lookup paths explains why we can sort the map and it's still a map. Pretty great outcome!

How valuable is it for resizing? Even if it usually hits, surely it's equally fast to find the first empty slot in the group with SIMD, and that will hit even more often.

jorendorff · 2026-05-11T22:03:45Z

+
+| Scenario                                     | HashSortedMap | Comparison                             | Result      |
+| :------------------------------------------- | ------------: | :------------------------------------- | :---------- |
+| Insert 1000 trigrams (pre-sized)             |       7.34 µs | hashbrown::HashMap: 12.88 µs           | ~43% faster |


This doesn't agree with the other file, which puts us 32% slower than hashbrown on the same microbenchmark. Different architecture?

It would be best to have all the benchmarks in one place and up to date, and preferably all on Intel if that's what we think most cloud servers have.

jorendorff · 2026-05-11T22:23:29Z

 fn insert_after_grow<K: Hash + Eq, V, S: BuildHasher>(
    map: &mut HashSortedMap<K, V, S>,
-    hash: u64,
+    _hash: u64,


Clean this up?

aneubeck added 9 commits May 6, 2026 14:43

add normal iterator

dbbc00e

sorting + cleanup

f6f4c21

sorting benchmark

540b308

more benches

0efe8e0

Update hash_sorted_map.rs

34dad1a

remove slots mostly and update optimizations file

170870e

Update performance.rs

e73f642

unsorted

b02de16

cleanup code a bit

2f2c7a3

Copilot AI review requested due to automatic review settings May 8, 2026 07:57

aneubeck requested a review from a team as a code owner May 8, 2026 07:57

Copilot started reviewing on behalf of aneubeck May 8, 2026 07:58 View session

Copilot AI reviewed May 8, 2026

View reviewed changes

Comment thread crates/hash-sorted-map/src/hash_sorted_map.rs Outdated

Comment thread crates/hash-sorted-map/src/hash_sorted_map.rs

Comment thread crates/hash-sorted-map/src/hash_sorted_map.rs Outdated

aneubeck added 12 commits May 8, 2026 10:14

without slot_hint, we sort keeps the structure intact

c12cac8

fix build warnings

0a9dd24

Update table with intel results

21359c1

solve tiebreaker

9e61374

remove smallvec

c11ba84

Merge branch 'aneubeck/merge' of https://github.com/github/rust-gems …

beab417

…into aneubeck/merge

fix clippy

7c9c89b

Update hash_sorted_map.rs

7e3931d

Update performance.rs

280aa74

fix bug

1a35afa

Update hash_sorted_map.rs

ca37af7

Update hash_sorted_map.rs

f969990

jorendorff approved these changes May 11, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement sorting by hash#112

Implement sorting by hash#112
aneubeck wants to merge 21 commits into
mainfrom
aneubeck/merge

aneubeck commented May 8, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jorendorff May 11, 2026

Uh oh!

jorendorff May 11, 2026

Uh oh!

jorendorff May 11, 2026

Uh oh!

jorendorff May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

aneubeck commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jorendorff May 11, 2026

Choose a reason for hiding this comment

Uh oh!

jorendorff May 11, 2026

Choose a reason for hiding this comment

Uh oh!

jorendorff May 11, 2026

Choose a reason for hiding this comment

Uh oh!

jorendorff May 11, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

aneubeck commented May 8, 2026 •

edited

Loading