Implement sorting by hash#112
Conversation
There was a problem hiding this comment.
Pull request overview
This PR extends hash-sorted-map to support deterministic hash-ordered iteration by introducing explicit iterator types and a sort_by_hash operation that reorders entries in-place (within hash-prefix group chains) before returning an owning HashSortedContainer. It also updates documentation around performance trade-offs and adds new Criterion benchmarks for iter/sort/merge scenarios.
Changes:
- Refactors core storage into
HashSortedContainer+Groupand re-exports the container. - Adds
Iter/IterMut/IntoIterfor bothHashSortedContainerandHashSortedMap, includingIntoIteratorimpls. - Adds
HashSortedMap::sort_by_hash()plus additional benchmarks and updated optimization notes.
Show a summary per file
| File | Description |
|---|---|
| crates/hash-sorted-map/src/lib.rs | Wires up new modules and re-exports container + iterator types. |
| crates/hash-sorted-map/src/iter.rs | Implements shared cursor-based iteration for container/map (immutable, mutable, and owning). |
| crates/hash-sorted-map/src/hash_sorted_map.rs | Refactors map to wrap HashSortedContainer, adds sort_by_hash, and updates insertion/lookup paths accordingly. |
| crates/hash-sorted-map/src/group.rs | Extracts Group struct and NO_OVERFLOW constant. |
| crates/hash-sorted-map/src/container.rs | Adds HashSortedContainer with allocation helpers and Drop that owns entry destruction. |
| crates/hash-sorted-map/OPTIMIZATIONS.md | Updates design notes and publishes new benchmark results/interpretation. |
| crates/hash-sorted-map/Cargo.toml | Adds smallvec dependency for sorting scratch storage. |
| crates/hash-sorted-map/benchmarks/performance.rs | Adds new benchmarks for iter/sort/merge, including k-way merge comparison. |
| crates/hash-sorted-map/benchmarks/Cargo.toml | Adds itertools dependency for k-way merge benchmark. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comments suppressed due to low confidence (1)
crates/hash-sorted-map/src/hash_sorted_map.rs:946
- Same issue as above: this test assumes hashes are strictly increasing (
h > prev_hash), but hash collisions can occur (especially with strings), so the correct property is non-decreasing order. Update the assertion to tolerate equal hashes or validate ordering with a deterministic tie-breaker.
let mut prev_hash = 0u64;
let mut first = true;
for (k, _) in &container {
let h = hasher.hash_one(k);
if !first {
assert!(h > prev_hash, "hash order violated: {prev_hash:#x} > {h:#x}");
}
- Files reviewed: 9/9 changed files
- Comments generated: 3
…into aneubeck/merge
|
|
||
| hashbrown does **not** have this optimization — it always does a full SIMD | ||
| group scan. The reason why the performance is different is probably due to the different overflow strategies and the different load factors. | ||
| **Experimental finding**: This scalar check **hurts performance** on random |
There was a problem hiding this comment.
This makes so much sense. And removing it from lookup paths explains why we can sort the map and it's still a map. Pretty great outcome!
How valuable is it for resizing? Even if it usually hits, surely it's equally fast to find the first empty slot in the group with SIMD, and that will hit even more often.
|
|
||
| | Scenario | HashSortedMap | Comparison | Result | | ||
| | :------------------------------------------- | ------------: | :------------------------------------- | :---------- | | ||
| | Insert 1000 trigrams (pre-sized) | 7.34 µs | hashbrown::HashMap: 12.88 µs | ~43% faster | |
There was a problem hiding this comment.
This doesn't agree with the other file, which puts us 32% slower than hashbrown on the same microbenchmark. Different architecture?
There was a problem hiding this comment.
It would be best to have all the benchmarks in one place and up to date, and preferably all on Intel if that's what we think most cloud servers have.
| fn insert_after_grow<K: Hash + Eq, V, S: BuildHasher>( | ||
| map: &mut HashSortedMap<K, V, S>, | ||
| hash: u64, | ||
| _hash: u64, |
This PR
Note: hashmap remains intact after sorting now.