feat: zero-copy columnar conversion for ArrowColumnVector-backed batches #3520

Draft
tokoko wants to merge 1 commit into apache:main from tokoko:spark-zero-copy

@tokoko tokoko commented Feb 14, 2026

Closes #3518

What changes are included in this PR?

  • Introduces a new tryZeroCopyConvert method in CometArrowConverters. It accepts a ColumnarBatch of any type and returns a ColumnarBatch of CometVector objects if the input is composed of ArrowColumnVector objects, and None otherwise.
  • The columnar conversion path in CometSparkToColumnarExec always tries tryZeroCopyConvert first and falls back to the current flow if zero-copy conversion is not possible.
  • The implementation ignores the batchSize configuration, since honoring it with zero-copy would be considerably more involved. I think zero-copy is more important in this case, especially if you assume that whatever operator produces the input has a similar configuration of its own. Happy to change the implementation if you disagree, though.
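
The try-first, fall-back-otherwise flow described above can be sketched roughly as follows. This is a minimal illustration only, not the code in this PR: `Vec`, `ArrowVec`, `OnHeapVec`, `CometVec`, `Batch`, `CometBatch`, `copyConvert` and `convert` are hypothetical stand-ins for the Spark and Comet classes, and plain `Array[Int]` buffers stand in for Arrow memory.

```scala
// Hypothetical stand-ins for Spark's ColumnVector hierarchy; NOT the real classes.
sealed trait Vec
final case class ArrowVec(buffer: Array[Int]) extends Vec    // plays the role of ArrowColumnVector
final case class OnHeapVec(values: Vector[Int]) extends Vec  // any non-Arrow column vector
final case class CometVec(buffer: Array[Int])                // plays the role of CometVector

final case class Batch(columns: Seq[Vec], numRows: Int)
final case class CometBatch(columns: Seq[CometVec], numRows: Int)

object ZeroCopySketch {
  // Returns Some only when every column is Arrow-backed.
  // The underlying buffers are shared with the input, not copied.
  def tryZeroCopyConvert(batch: Batch): Option[CometBatch] = {
    val wrapped = batch.columns.collect { case ArrowVec(buf) => CometVec(buf) }
    if (wrapped.length == batch.columns.length) Some(CometBatch(wrapped, batch.numRows))
    else None
  }

  // Stand-in for the existing copying conversion (assumed behavior for illustration).
  def copyConvert(batch: Batch): CometBatch = {
    val copied = batch.columns.map {
      case ArrowVec(buf)     => CometVec(buf.clone())
      case OnHeapVec(values) => CometVec(values.toArray)
    }
    CometBatch(copied, batch.numRows)
  }

  // Always try zero-copy first; fall back to copying when it is not possible.
  // The input batch is passed through whole, so no batchSize re-chunking happens here.
  def convert(batch: Batch): CometBatch =
    tryZeroCopyConvert(batch).getOrElse(copyConvert(batch))
}
```

In this sketch a batch made only of `ArrowVec` columns comes back wrapping the very same buffers, while a batch with any other column type takes the copying path.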

How are these changes tested?

  • Added tests that convert hand-crafted ColumnarBatch objects, since no out-of-the-box data source in Spark produces a ColumnarBatch of ArrowColumnVector objects.

@tokoko tokoko marked this pull request as draft February 15, 2026 19:50

Development

Successfully merging this pull request may close these issues.

ZeroCopy Conversion from Spark ColumnarBatch