Improve memory estimates in cloud storage#29773
Improve memory estimates in cloud storage#29773ballard26 wants to merge 3 commits intoredpanda-data:devfrom
Conversation
The memory semaphore that limits materialized remote segments was significantly underestimating per-segment memory usage. The projection only accounted for sizeof(remote_segment) + 4KB for the index, missing coroutine frames, heap-allocated subobjects (paths, coarse index, segment_chunks), and ~40KB per segment from chunked_fifo's free chunk caching policy in the expiring_fifo wait lists. This caused the system to materialize far more segments than actual memory could support, leading to OOM under workloads with many small segments and prefetch enabled.
There was a problem hiding this comment.
Pull request overview
This PR improves memory estimates for materialized remote segments in cloud storage to prevent OOM in workloads with many small segments. The previous estimate of sizeof(remote_segment) + 4KB significantly underestimated actual usage, primarily missing the ~40KB overhead from expiring_fifo retained free chunks in both segment and chunk hydration wait lists.
Changes:
- Added
estimate_wait_list_overhead()static method and improvedestimate_memory_use()to account for heap-allocated paths, coarse index, segment chunks, and waiter FIFO overhead. - Updated
projected_remote_segment_memory_usage()inmaterialized_resources.ccto include subobject, coroutine, and waiter overhead (~56KB total projected overhead vs previous ~4KB). - Fixed two typos ("extimate" → "estimate") in log messages and a minor code ordering cleanup in
remote_partition.cc.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/v/cloud_storage/remote_segment.h |
Added estimate_wait_list_overhead() static method declaration |
src/v/cloud_storage/remote_segment.cc |
Implemented estimate_wait_list_overhead() and enhanced estimate_memory_use() with heap path, index, chunk, and waiter overhead accounting |
src/v/cloud_storage/materialized_resources.cc |
Updated projected memory estimate with subobject, coroutine, and waiter overhead; fixed two "extimate" → "estimate" typos |
src/v/cloud_storage/remote_partition.cc |
Moved _ts_probe.segment_materialized() after log statement for consistency |
| // retry chain, etc.) | ||
| res += sizeof(segment_chunks); |
There was a problem hiding this comment.
sizeof(remote_segment) already includes sizeof(std::optional<segment_chunks>), which provides inline storage for the entire segment_chunks struct. Adding sizeof(segment_chunks) again when _chunks_api is engaged double-counts the inline struct size.
If the intent is to account for heap allocations from segment_chunks's internal data structures (e.g., the _chunks btree_map root node, _prefetches chunked_vector buffer, etc.), consider estimating that heap overhead directly instead of using sizeof(segment_chunks).
For a memory-estimation function aimed at preventing OOM, over-counting is safer than under-counting, so this isn't critical, but it does make the estimate less accurate than intended.
| // retry chain, etc.) | |
| res += sizeof(segment_chunks); | |
| // retry chain, etc.). Account approximately for per-chunk overhead. |
CI test resultstest results on build#81488
|
|
|
|
There is a small chance that this new memory accounting logic limits concurrency so the test naturally becomes slower. But this is an old test and I don't think it stresses memory bad enough. |
Previously the per-segment memory estimate used was
sizeof(remote_segment) + 4KB. This significantly underestimated actual usage. The biggest missing contributor was the hydration wait lists in both the segment and chunks. Each can cache one free chunk(~20KB) even when empty. Assuming that there is likely to be at least one segment wait list and one chunked wait list this results in ~40KB of memory usage unaccounted for. In workloads with many small segments this allowed RP to materialize far more segments than memory allowed resulting in an OOM.Backports Required
Release Notes