Skip to content

Cache splitOffsets list to avoid repeated allocations#15625

Closed
kuldeep27396 wants to merge 4 commits intoapache:mainfrom
kuldeep27396:fix/splitoffsets-caching
Closed

Cache splitOffsets list to avoid repeated allocations#15625
kuldeep27396 wants to merge 4 commits intoapache:mainfrom
kuldeep27396:fix/splitoffsets-caching

Conversation

@kuldeep27396
Copy link
Copy Markdown

Why this change

Resolves #15622

Problem: BaseFile.splitOffsets() was creating a new List wrapper on every invocation via ArrayUtil.toUnmodifiableLongList(). When file metadata is read and rewritten (e.g., during manifest rewriting or format conversion), each entry needlessly allocates a list that is immediately consumed and discarded.

Solution: Cache the List representation after the first conversion, similar to how other fields like partitionData are handled. The cache is transient so it won't be serialized.

Changes

  • Added private transient List splitOffsetsList field to cache the converted list
  • Modified splitOffsets() to return the cached list instead of creating a new one on each call
  • The cached list is created lazily on first access

Performance Impact

This eliminates unnecessary object allocations when:

  • Reading and rewriting manifest files
  • Format conversions
  • Any scenario where splitOffsets() is called multiple times on the same BaseFile instance

The change is minimal and focused, following the existing pattern used for other cached fields in the class.

BaseFile.splitOffsets() was creating a new List<Long> wrapper on every
invocation via ArrayUtil.toUnmodifiableLongList(). When file metadata is
read and rewritten (e.g., during manifest rewriting or format conversion),
each entry needlessly allocates a list that is immediately consumed and
discarded.

This change caches the List<Long> representation after the first conversion,
similar to how other fields like partitionData are handled. The cache is
transient so it won't be serialized.

Fixes apache#15622
@github-actions github-actions bot added the core label Mar 13, 2026
@github-actions
Copy link
Copy Markdown

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Apr 13, 2026
@github-actions
Copy link
Copy Markdown

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Apr 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BaseFile.splitOffsets() allocates a new List on every call

1 participant