Is your feature request related to a problem?
In OpenSearch 3.3, the sparse ANN feature with the Seismic algorithm was released, dramatically improving sparse vector retrieval efficiency (see the RFC and blog for details). However, under high-concurrency workloads (8+ concurrent query threads), we have identified a significant performance bottleneck in the LRU cache implementation used by the sparse ANN algorithm.
The current AbstractLruCache implementation uses Collections.synchronizedMap(LinkedHashMap) to track access recency. This design creates severe lock contention: every cache access, including a read, must acquire a global lock to update the access order in the LinkedHashMap. Under concurrent load, query threads spend significant time waiting for this lock, which limits throughput scaling and increases query latency.
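To make the contention concrete, here is a minimal sketch of the pattern described above. This is not the actual AbstractLruCache source; it only reproduces the Collections.synchronizedMap(LinkedHashMap) design, where a LinkedHashMap in access order turns every get() into a structural write that must happen under the map's single mutex:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of the described design (class name is illustrative,
// not the real AbstractLruCache): a LinkedHashMap with accessOrder=true
// wrapped in Collections.synchronizedMap.
public class SynchronizedLruSketch<K, V> {
    private final Map<K, V> cache;

    public SynchronizedLruSketch(int capacity) {
        // accessOrder = true means get() reorders the entry list,
        // so even cache hits are writes that need the global lock.
        this.cache = Collections.synchronizedMap(
            new LinkedHashMap<K, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                    return size() > capacity;
                }
            });
    }

    public V get(K key) {
        // Every concurrent reader serializes on the same mutex here.
        return cache.get(key);
    }

    public void put(K key, V value) {
        cache.put(key, value);
    }
}
```

With 8+ query threads, all hits funnel through that one mutex, which matches the latency degradation reported below.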
What solution would you like?
Replace the current Collections.synchronizedMap(LinkedHashMap)-based implementation with a more concurrency-friendly data structure such as ConcurrentHashMap. At a minimum, benchmark results should be provided with the PR.
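One possible direction, sketched here only to illustrate the idea (this is not a proposed final design, and none of these names exist in the codebase): keep entries in a ConcurrentHashMap and track recency with a per-entry counter, so cache hits never take a global lock; eviction then approximates LRU by scanning for the oldest access stamps:

```java
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: lock-free reads, approximate LRU eviction.
public class ConcurrentLruSketch<K, V> {
    private static final class Entry<V> {
        final V value;
        final AtomicLong lastAccess = new AtomicLong();
        Entry(V value) { this.value = value; }
    }

    private final ConcurrentHashMap<K, Entry<V>> cache = new ConcurrentHashMap<>();
    private final AtomicLong clock = new AtomicLong();
    private final int capacity;

    public ConcurrentLruSketch(int capacity) { this.capacity = capacity; }

    public V get(K key) {
        Entry<V> e = cache.get(key);
        if (e == null) return null;
        // Recency update is a lock-free atomic store, not a list reorder.
        e.lastAccess.set(clock.incrementAndGet());
        return e.value;
    }

    public void put(K key, V value) {
        Entry<V> e = new Entry<>(value);
        e.lastAccess.set(clock.incrementAndGet());
        cache.put(key, e);
        evictIfNeeded();
    }

    private void evictIfNeeded() {
        // Approximate LRU: evict the entry with the oldest access stamp.
        while (cache.size() > capacity) {
            cache.entrySet().stream()
                .min(Comparator.comparingLong(
                    (Map.Entry<K, Entry<V>> en) -> en.getValue().lastAccess.get()))
                .ifPresent(en -> cache.remove(en.getKey()));
        }
    }
}
```

The trade-off is exactness: eviction order is approximate rather than strict LRU, and the eviction scan is O(n), which is why benchmark results should accompany any concrete PR.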
What alternatives have you considered?
- Allow users to disable access-order tracking in the cache.
- Implement a cache watermark to support batch eviction.
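The watermark alternative above could look roughly like the following sketch (class name and thresholds are illustrative, not from the actual codebase): let the cache grow to a high watermark, then evict a batch of the oldest entries down to a low watermark, so eviction cost is amortized across many inserts instead of being paid on every operation:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;

// Illustrative sketch of watermark-based batch eviction.
public class WatermarkCacheSketch<K, V> {
    private static final class Entry<V> {
        final V value;
        volatile long lastAccess;
        Entry(V value, long stamp) { this.value = value; this.lastAccess = stamp; }
    }

    private final ConcurrentHashMap<K, Entry<V>> cache = new ConcurrentHashMap<>();
    private final AtomicLong clock = new AtomicLong();
    private final int highWatermark;
    private final int lowWatermark;

    public WatermarkCacheSketch(int highWatermark, int lowWatermark) {
        this.highWatermark = highWatermark;
        this.lowWatermark = lowWatermark;
    }

    public V get(K key) {
        Entry<V> e = cache.get(key);
        if (e == null) return null;
        e.lastAccess = clock.incrementAndGet();
        return e.value;
    }

    public void put(K key, V value) {
        cache.put(key, new Entry<>(value, clock.incrementAndGet()));
        if (cache.size() > highWatermark) {
            evictBatch();
        }
    }

    // One batch eviction down to the low watermark amortizes the cost
    // across many inserts.
    private synchronized void evictBatch() {
        int toRemove = cache.size() - lowWatermark;
        if (toRemove <= 0) return;
        List<K> oldest = cache.entrySet().stream()
            .sorted(Comparator.comparingLong(
                (Map.Entry<K, Entry<V>> en) -> en.getValue().lastAccess))
            .limit(toRemove)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
        oldest.forEach(cache::remove);
    }

    public int size() { return cache.size(); }
}
```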
Do you have any additional context?
OpenSearch `took` time statistics, single thread vs. 50 concurrent threads (the 50-thread run is slowed down by the access-order updates):

| Metric  | 1 thread | 50 threads |
| ------- | -------- | ---------- |
| Average | 2.36 ms  | 36.81 ms   |
| Min     | 1 ms     | 1 ms       |
| Max     | 9 ms     | 205 ms     |
| Median  | 2 ms     | 33 ms      |
| P95     | 4 ms     | 79 ms      |
| P99     | 5 ms     | 107 ms     |