perf(ring): cache ShuffleShardWithLookback subrings #7628

Open
afhassan wants to merge 1 commit into cortexproject:master from afhassan:shuffle-shard-cache

@afhassan afhassan commented Jun 18, 2026

Contributor

What this PR does

Add caching for ShuffleShardWithLookback subrings. Previously, the lookback variant of
shuffle sharding was recomputed on every call because its membership is time-dependent
and could not be represented by the existing topology-keyed cache.

New ShuffleShardWithLookback cache

A lookback subring's membership can only change for one of two reasons, and each is
covered by an invalidation path:

  1. Topology change (instance added, removed, tokens changed, or a state transition
    to/from READONLY) — invalidates the entire shuffledSubringCache via
    lastTopologyChange. EqualButReadOnly is treated as a topology change for this
    purpose, so READONLY-driven inclusion changes are covered here.
  2. An included-by-lookback instance ages out of the window — handled here.
    shuffleShard records shuffleShardExpiry = min(RegisteredTimestamp + lookbackPeriod)
    across all lookback-included instances on the subring. getCachedShuffledSubring
    treats the entry as a miss once now >= shuffleShardExpiry, forcing a recompute.

Implementation notes

  • subringCacheKey gains lookbackPeriod so lookback subrings don't collide with
    plain ones.
  • Subrings with no lookback-included instances get shuffleShardExpiry == 0 and
    behave like plain subrings (only topology invalidates).

Tests

Added TestShuffleShardWithLookbackCaching in pkg/ring/ring_test.go, covering:

  • Repeated calls before expiry return the same cached subring, with instance
    timestamps refreshed on hit.
  • Advancing now past RegisteredTimestamp + lookbackPeriod forces a recompute and
    the new subring shrinks (instances fall out of the window).
  • A subring with no lookback-included instances has no time-based expiry and stays
    cached.
  • Plain and lookback subrings for the same (user, size) are cached under separate
    keys.
  • Topology change (stopping instances) invalidates the cached lookback subring.
  • Different shard sizes get separate cache entries; CleanupShuffleShardCache
    evicts the entry.
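
The scenarios above can be pictured with a toy cache driven by an explicit clock value. This is only a sketch of the behaviour being tested, not TestShuffleShardWithLookbackCaching itself: `subringCache`, `entry`, `computeAt`, and `invalidateAll` are hypothetical names, and a recompute counter stands in for comparing subring pointers.

```go
package main

import "fmt"

// entry is a hypothetical cached lookback subring.
type entry struct {
	members []string
	expiry  int64 // Unix seconds; 0 disables time-based expiry
}

// subringCache is a toy cache used only to illustrate the tested behaviour.
type subringCache struct {
	entries    map[string]entry
	recomputes int
}

func newSubringCache() *subringCache {
	return &subringCache{entries: map[string]entry{}}
}

// get serves the cached entry unless it is missing or now >= expiry,
// in which case it recomputes and stores a new entry.
func (c *subringCache) get(key string, now int64, compute func(now int64) entry) entry {
	if e, ok := c.entries[key]; ok && (e.expiry == 0 || now < e.expiry) {
		return e
	}
	c.recomputes++
	e := compute(now)
	c.entries[key] = e
	return e
}

// invalidateAll models a topology change dropping every cached subring.
func (c *subringCache) invalidateAll() {
	c.entries = map[string]entry{}
}

// computeAt simulates a subring in which instance "b" is lookback-included
// until t=150; after that only "a" remains and no time-based expiry applies.
func computeAt(now int64) entry {
	if now < 150 {
		return entry{members: []string{"a", "b"}, expiry: 150}
	}
	return entry{members: []string{"a"}}
}

func main() {
	c := newSubringCache()

	c.get("user-1", 100, computeAt)
	c.get("user-1", 120, computeAt)
	fmt.Println(c.recomputes) // 1: second call is a cache hit

	shrunk := c.get("user-1", 150, computeAt)
	fmt.Println(c.recomputes, len(shrunk.members)) // 2 1: expiry forced a recompute

	c.get("user-1", 1000000, computeAt)
	fmt.Println(c.recomputes) // 2: no time-based expiry once expiry == 0

	c.invalidateAll()
	c.get("user-1", 1000001, computeAt)
	fmt.Println(c.recomputes) // 3: topology change invalidates the entry
}
```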

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
  • docs/configuration/v1-guarantees.md updated if this PR introduces experimental flags

@dosubot (bot) added the component/ring, go, and type/performance labels Jun 18, 2026
Signed-off-by: Ahmed Hassan <afayekhassan@gmail.com>
@afhassan force-pushed the shuffle-shard-cache branch from 34d6007 to 2d60675 June 18, 2026 09:52
Labels

component/ring, go, size/L, type/performance
