feat(cache): add in flight deduping#4459
feat(cache): add in flight deduping#4459MasterPtato wants to merge 1 commit into03-18-fix_cache_clean_up_libfrom
Conversation
|
🚅 Deployed to the rivet-pr-4459 environment in rivet-frontend
|
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
175706a to
17d21f5
Compare
75e1e38 to
bc5d6d3
Compare
|
Code Review: feat(cache): add in-flight deduplication Good overall direction. Deduplicating concurrent cache fetches is a useful optimization that prevents thundering-herd on cold keys. Test suite is comprehensive and the HashMap refactor of GetterCtx is clean. CRITICAL 1: Leases never cleaned up on getter error. In req_config.rs, if tokio::try_join! propagates an error, execution jumps past the lease-cleanup loop. The scc::HashMap entries remain permanently because the local broadcast_tx drops but a clone still lives inside the map. All future requests for those keys find an occupied entry, subscribe to a channel that never signals, and wait IN_FLIGHT_TIMEOUT (5s). Fix: use a scopeguard or Drop guard to remove entries on scope exit. CRITICAL 2: Broadcast not sent when getter resolves no keys. broadcast_tx.send(()) is inside the if !entries_values.is_empty() block. If the getter succeeds but resolves zero keys (genuine misses), waiters block for the full 5 seconds. The signal should be sent unconditionally after the getter completes. MODERATE 3: Getter must now implement Clone (breaking API change). getter.clone() is called in the hot path. This silently requires every callers closure to implement Clone and will fail to compile otherwise. Should be documented in the PR description. MODERATE 4: buffer_unordered(1024) hardcoded. The concurrency cap for waiting subscribers is arbitrary. Consider making it configurable or deriving it from the actual number of waiting keys. MINOR 5: timeout_falls_back_to_getter test waits for the full 5-second IN_FLIGHT_TIMEOUT, slowing CI. Consider a shorter constant under cfg(test). MINOR 6: GetterCtx dedup behavior change. Old code used first-occurrence wins for duplicates; new HashMap::collect uses last-occurrence wins. Functionally equivalent since all entries start as None. MINOR 7: into_values ordering is now non-deterministic. GetterCtx::into_values returns entries in arbitrary HashMap order. If any caller relied on output order matching input keys, this is a silent behavior change. Positive: Vec->HashMap refactor is clean (O(1) lookups, natural dedup). get(keys: &[RawCacheKey]) API is better than Vec. Test coverage is solid (dedup, multi-waiter, independent keys, mixed cached+in-flight, timeout fallback). scc::HashMap is appropriate (lock-free). Sending broadcast after cache write is correct. |
bc5d6d3 to
b395fae
Compare
17d21f5 to
de18421
Compare

Description
Please include a summary of the changes and the related issue. Please also include relevant motivation and context.
Type of change
How Has This Been Tested?
Please describe the tests that you ran to verify your changes.
Checklist: