
Java compression codecs do not release compressed ArrowBuf in decompress, causing allocator leaks #1037

@loserwang1024

Description


bug

When using Arrow Java's compression codecs (e.g. ZSTD) from downstream projects, we observed that if decompression fails with OutOfMemoryError: Direct buffer memory, the associated BufferAllocator is left in a leaked state: its allocated memory never returns to zero after the failing operation.

In our case this surfaced in Apache Fluss (see apache/fluss#2646), but after investigation the root cause appears to be in Arrow Java's compression codec implementation.
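The title suggests that decompress should release the compressed input even when it fails. Below is a minimal sketch of that ownership rule, assuming decompress takes ownership of its input. Allocator and Buf are hypothetical stand-ins for Arrow's BufferAllocator and ArrowBuf (not Arrow's API), and an OutOfMemoryError is simulated after a fixed number of allocations.

```java
/** Model of a decompress(...) that owns its compressed input. Allocator/Buf are hypothetical. */
final class DecompressOwnershipSketch {

  /** Hypothetical allocator: tracks outstanding bytes and simulates OOM after N allocations. */
  static final class Allocator {
    long allocatedBytes;
    private final int allocationsBeforeOom;
    private int allocations;

    Allocator(int allocationsBeforeOom) {
      this.allocationsBeforeOom = allocationsBeforeOom;
    }

    Buf allocate(long size) {
      if (allocations++ >= allocationsBeforeOom) {
        throw new OutOfMemoryError("Direct buffer memory"); // simulated failure
      }
      allocatedBytes += size;
      return new Buf(this, size);
    }
  }

  /** Hypothetical reference-counted buffer: memory is returned when the count reaches 0. */
  static final class Buf implements AutoCloseable {
    private final Allocator owner;
    private final long size;
    private int refCount = 1;

    Buf(Allocator owner, long size) {
      this.owner = owner;
      this.size = size;
    }

    void retain() {
      refCount++;
    }

    @Override
    public void close() {
      if (refCount <= 0) {
        throw new IllegalStateException("buffer already released");
      }
      if (--refCount == 0) {
        owner.allocatedBytes -= size;
      }
    }
  }

  /** Takes ownership of compressed: it is released whether or not decompression succeeds. */
  static Buf decompress(Allocator allocator, Buf compressed, long uncompressedSize) {
    try {
      Buf out = allocator.allocate(uncompressedSize); // may throw OutOfMemoryError
      // A real codec would inflate the compressed bytes into `out` here.
      return out;
    } finally {
      compressed.close(); // runs on success and on failure
    }
  }

  /** Bytes still allocated after the output allocation fails. */
  static long bytesLeftAfterFailedDecompress() {
    Allocator allocator = new Allocator(1); // compressed buffer succeeds, output allocation fails
    Buf compressed = allocator.allocate(32);
    try {
      decompress(allocator, compressed, 128);
    } catch (OutOfMemoryError expected) {
      // swallow the simulated OOM for the demo
    }
    return allocator.allocatedBytes;
  }

  public static void main(String[] args) {
    System.out.println(bytesLeftAfterFailedDecompress()); // 0: the compressed input was released
  }
}
```

Closing in finally keeps a single ownership rule for callers: after decompress returns or throws, the caller never has to close the compressed buffer itself.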


version

18.3.0

Solution

In the original Arrow implementation, the decompression loop runs outside the try-finally block that guards loadFieldBuffers. If decompression succeeds for the first N buffers of a field but fails on the (N+1)-th buffer, the exception escapes before the try block is entered. The finally clause therefore never runs, and the already-decompressed buffers in ownBuffers are never closed, leaking direct memory.
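A minimal model of that ordering shows the leak. It assumes the same hypothetical Allocator/Buf stand-ins (not Arrow's API). allocator.allocate(64) plays the role of codec.decompress(...), and loadFieldBuffers here is a hypothetical helper that only retains each buffer. The third "decompression" is made to throw.

```java
import java.util.ArrayList;
import java.util.List;

/** Model of the original ordering: decompression runs before (outside) the try/finally. */
final class LeakyLoadSketch {

  /** Hypothetical allocator: tracks outstanding bytes and simulates OOM after N allocations. */
  static final class Allocator {
    long allocatedBytes;
    private final int allocationsBeforeOom;
    private int allocations;

    Allocator(int allocationsBeforeOom) {
      this.allocationsBeforeOom = allocationsBeforeOom;
    }

    Buf allocate(long size) {
      if (allocations++ >= allocationsBeforeOom) {
        throw new OutOfMemoryError("Direct buffer memory"); // simulated failure
      }
      allocatedBytes += size;
      return new Buf(this, size);
    }
  }

  /** Hypothetical reference-counted buffer. */
  static final class Buf implements AutoCloseable {
    private final Allocator owner;
    private final long size;
    private int refCount = 1;

    Buf(Allocator owner, long size) {
      this.owner = owner;
      this.size = size;
    }

    void retain() {
      refCount++;
    }

    @Override
    public void close() {
      if (--refCount == 0) {
        owner.allocatedBytes -= size;
      }
    }
  }

  /** Hypothetical stand-in for loadFieldBuffers: the vector takes its own reference. */
  static void loadFieldBuffers(List<Buf> buffers) {
    for (Buf b : buffers) {
      b.retain();
    }
  }

  static void loadBuffersLeaky(Allocator allocator, int bufferCount) {
    List<Buf> ownBuffers = new ArrayList<>();
    for (int i = 0; i < bufferCount; i++) {
      ownBuffers.add(allocator.allocate(64)); // stands in for codec.decompress(...); may throw
    }
    try {
      loadFieldBuffers(ownBuffers);
    } finally {
      for (Buf b : ownBuffers) {
        b.close(); // never reached if the loop above threw
      }
    }
  }

  static long bytesLeakedWhenThirdBufferFails() {
    Allocator allocator = new Allocator(2); // buffers 1 and 2 succeed, buffer 3 throws
    try {
      loadBuffersLeaky(allocator, 3);
    } catch (OutOfMemoryError expected) {
      // the error escaped before the try block was entered
    }
    return allocator.allocatedBytes;
  }

  public static void main(String[] args) {
    System.out.println(bytesLeakedWhenThirdBufferFails()); // 128: two 64-byte buffers leaked
  }
}
```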

To fix it, move the decompression loop inside the try block so that the finally clause always closes every buffer in ownBuffers, regardless of whether the load succeeds or fails:

  • Success path: loadFieldBuffers retains each buffer (ref count +1), then the finally close decrements it back (ref count -1). The field vector still holds its own reference, so the buffer stays alive.
  • Error path: the finally close decrements each already-decompressed buffer's ref count to 0, immediately freeing its direct memory.
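The reordered loop and both paths above can be sketched with the same kind of model. Allocator, Buf and Vector are hypothetical stand-ins for Arrow's BufferAllocator, ArrowBuf and FieldVector (not Arrow's API), and allocator.allocate(64) stands in for codec.decompress(...). Note that finally runs for Errors such as OutOfMemoryError as well as for exceptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Model of the proposed fix: decompression runs inside the try, so finally always closes. */
final class FixedLoadSketch {

  /** Hypothetical allocator: tracks outstanding bytes and simulates OOM after N allocations. */
  static final class Allocator {
    long allocatedBytes;
    private final int allocationsBeforeOom;
    private int allocations;

    Allocator(int allocationsBeforeOom) {
      this.allocationsBeforeOom = allocationsBeforeOom;
    }

    Buf allocate(long size) {
      if (allocations++ >= allocationsBeforeOom) {
        throw new OutOfMemoryError("Direct buffer memory"); // simulated failure
      }
      allocatedBytes += size;
      return new Buf(this, size);
    }
  }

  /** Hypothetical reference-counted buffer. */
  static final class Buf implements AutoCloseable {
    private final Allocator owner;
    private final long size;
    private int refCount = 1;

    Buf(Allocator owner, long size) {
      this.owner = owner;
      this.size = size;
    }

    void retain() {
      refCount++;
    }

    @Override
    public void close() {
      if (--refCount == 0) {
        owner.allocatedBytes -= size;
      }
    }
  }

  /** Hypothetical field vector: loadFieldBuffers retains each buffer it keeps. */
  static final class Vector {
    final List<Buf> held = new ArrayList<>();

    void loadFieldBuffers(List<Buf> buffers) {
      for (Buf b : buffers) {
        b.retain(); // ref count +1, owned by the vector
        held.add(b);
      }
    }

    void clear() {
      for (Buf b : held) {
        b.close();
      }
      held.clear();
    }
  }

  static void loadBuffersFixed(Allocator allocator, Vector vector, int bufferCount) {
    List<Buf> ownBuffers = new ArrayList<>();
    try {
      for (int i = 0; i < bufferCount; i++) {
        ownBuffers.add(allocator.allocate(64)); // stands in for codec.decompress(...); may throw
      }
      vector.loadFieldBuffers(ownBuffers);
    } finally {
      for (Buf b : ownBuffers) {
        b.close(); // success: 2 -> 1 (vector keeps it); error: 1 -> 0 (memory freed)
      }
    }
  }

  static long bytesAfterFailure() {
    Allocator allocator = new Allocator(2); // the third buffer fails
    try {
      loadBuffersFixed(allocator, new Vector(), 3);
    } catch (OutOfMemoryError expected) {
      // the finally clause has already released the two decompressed buffers
    }
    return allocator.allocatedBytes;
  }

  public static void main(String[] args) {
    System.out.println(bytesAfterFailure()); // 0: nothing leaked on the error path
  }
}
```

On the success path the vector ends up as the sole owner, so the memory stays allocated until the vector itself is cleared.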

Metadata


Assignees

No one assigned

    Labels

    Type: bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
