feat(kernel): copy-into-cage Arrow result handoff (KERNEL_FETCH_MODE=copycage) #444
Draft
msrathore-db wants to merge 1 commit into
…copycage)
Adds an opt-in result-handoff mode for the kernel path that avoids the
Arrow IPC re-encode. Instead of decoding the kernel's per-batch Arrow
IPC bytes via RecordBatchReader, the kernel hands over each Arrow buffer
as a V8-owned (in-cage) ArrayBuffer with the bytes copied in, plus a
descriptor; the driver rebuilds the RecordBatch via apache-arrow
makeData (KernelArrowImport.ts) and feeds the existing
ArrowResultConverter unchanged.
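To make the handoff shape concrete, here is a minimal sketch of reading one Int32 column from copied-in buffers plus a descriptor. The `ColumnDescriptor` shape, `copyIntoArrayBuffer`, and `readInt32Column` are hypothetical names for illustration only; they are not the binding's actual descriptor or the code in KernelArrowImport.ts, which goes through apache-arrow makeData. The sketch relies only on the Arrow layout rules that validity bitmaps are LSB-first with a set bit meaning "valid", and that Int32 values are 4-byte little-endian.

```typescript
// Hypothetical descriptor shape; the real binding's descriptor may differ.
interface ColumnDescriptor {
  length: number; // number of logical rows
  nullCount: number; // number of null rows
  validity: ArrayBuffer | null; // Arrow validity bitmap, or null when there are no nulls
  values: ArrayBuffer; // Int32 values, 4 bytes each, little-endian
}

// Stand-in for the "bytes copied in" step: copy the source bytes into a
// fresh ArrayBuffer so the result does not alias the source memory.
function copyIntoArrayBuffer(src: Uint8Array): ArrayBuffer {
  const dst = new ArrayBuffer(src.byteLength);
  new Uint8Array(dst).set(src);
  return dst;
}

// Arrow validity bitmaps are LSB-first: row i lives in bit (i % 8) of byte (i >> 3).
function isValid(bitmap: Uint8Array, i: number): boolean {
  return ((bitmap[i >> 3] ?? 0) & (1 << (i & 7))) !== 0;
}

// Rebuild the logical column (with nulls) from the raw buffers.
function readInt32Column(desc: ColumnDescriptor): Array<number | null> {
  const view = new DataView(desc.values);
  const bitmap = desc.validity ? new Uint8Array(desc.validity) : null;
  const out: Array<number | null> = [];
  for (let i = 0; i < desc.length; i++) {
    const valid = bitmap === null || isValid(bitmap, i);
    out.push(valid ? view.getInt32(i * 4, true) : null);
  }
  return out;
}
```

The point of the sketch is that no IPC framing is parsed: the descriptor carries the length and null count, and the buffers are interpreted in place.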
- KERNEL_FETCH_MODE in {ipc (default), copycage}; double-gated on the
binding exposing fetchNextBatchCopycage AND the schema being supported
(dictionary/union/Large* fall back to IPC).
- Wired on both the sync metadata fetch handle and the async
AsyncResultHandle (the main executeStatement path).
- New importZeroCopyBatch importer + KernelArrowImport unit test.
- ArrowBatch gains an optional pre-decoded `recordBatches` the converter
consumes when present (IPC path unchanged).
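The double gate described in the list above can be sketched roughly as follows. Only `KERNEL_FETCH_MODE`, the two mode names, and `fetchNextBatchCopycage` come from this PR; `KernelBindingLike`, `parseFetchMode`, `isSchemaSupported`, `resolveFetchMode`, and the idea of passing the schema as a flat list of Arrow type names are assumptions made for illustration. Treating an unset or unrecognized value as `ipc` is also an assumption.

```typescript
type FetchMode = "ipc" | "copycage";

// Hypothetical minimal view of the native binding; only the method name comes from the PR.
interface KernelBindingLike {
  fetchNextBatchCopycage?: unknown;
}

// Assumption: anything other than the exact string "copycage" resolves to the default.
function parseFetchMode(raw: string | undefined): FetchMode {
  return raw === "copycage" ? "copycage" : "ipc";
}

// Assumption: the schema is flattened to Arrow type names (including nested children).
// Dictionary, union, and Large* types are the ones the PR says fall back to IPC.
function isSchemaSupported(typeNames: string[]): boolean {
  return typeNames.every((name) => {
    const n = name.toLowerCase();
    return n !== "dictionary" && !n.includes("union") && !n.startsWith("large");
  });
}

// copycage engages only if it was requested, the binding exposes the method,
// and every type in the schema is supported; otherwise the whole result uses IPC.
function resolveFetchMode(
  requested: FetchMode,
  binding: KernelBindingLike,
  typeNames: string[],
): FetchMode {
  if (requested !== "copycage") return "ipc";
  if (typeof binding.fetchNextBatchCopycage !== "function") return "ipc";
  if (!isSchemaSupported(typeNames)) return "ipc";
  return "copycage";
}
```

In a Node process the first argument would typically come from `parseFetchMode(process.env.KERNEL_FETCH_MODE)`. Deciding once per result, rather than per batch, keeps a single result from mixing the two decode paths.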
Verified byte-identical to the IPC path across all 21 Databricks type
families + edge cases (empty/all-null/empty-array/map/empty-string,
NaN/+-Inf, decimal/interval/variant/deeply-nested), multi-batch 5M
integrity, and concurrency.
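A comparison of this kind needs an equality that handles the listed edge cases, since plain `===` treats NaN as unequal to itself. The following is a sketch of one possible row comparer, not the harness actually used for this PR; `sameValue` and `firstMismatch` are hypothetical names, and the sketch covers only primitives, Date, Uint8Array, arrays, and plain objects.

```typescript
// Strict structural equality: NaN equals NaN, +0 differs from -0 (via Object.is).
function sameValue(a: unknown, b: unknown): boolean {
  if (Object.is(a, b)) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    return false;
  }
  if (a instanceof Date || b instanceof Date) {
    return a instanceof Date && b instanceof Date && a.getTime() === b.getTime();
  }
  if (a instanceof Uint8Array || b instanceof Uint8Array) {
    if (!(a instanceof Uint8Array) || !(b instanceof Uint8Array)) return false;
    if (a.length !== b.length) return false;
    for (let i = 0; i < a.length; i++) if (a[i] !== b[i]) return false;
    return true;
  }
  if (Array.isArray(a) || Array.isArray(b)) {
    if (!Array.isArray(a) || !Array.isArray(b) || a.length !== b.length) return false;
    for (let i = 0; i < a.length; i++) if (!sameValue(a[i], b[i])) return false;
    return true;
  }
  // Plain objects: same own keys, same values.
  const ra = a as Record<string, unknown>;
  const rb = b as Record<string, unknown>;
  const keys = Object.keys(ra);
  if (keys.length !== Object.keys(rb).length) return false;
  for (const k of keys) {
    if (!Object.prototype.hasOwnProperty.call(rb, k)) return false;
    if (!sameValue(ra[k], rb[k])) return false;
  }
  return true;
}

// Index of the first row that differs between the two paths, or -1 if all rows match.
function firstMismatch(ipcRows: unknown[], copycageRows: unknown[]): number {
  const n = Math.max(ipcRows.length, copycageRows.length);
  for (let i = 0; i < n; i++) {
    if (i >= ipcRows.length || i >= copycageRows.length) return i;
    if (!sameValue(ipcRows[i], copycageRows[i])) return i;
  }
  return -1;
}
```

Returning the first mismatching index, rather than a boolean, makes a multi-batch failure easier to localize.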
NOTE: depends on the kernel napi change adding fetchNextBatchCopycage
(databricks-sql-kernel) — the driver consumes it via the published
@databricks/databricks-sql-kernel-* native package, so this must land
after that kernel change is released and the native dependency bumped.
Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
What
Adds an opt-in result-handoff mode for the kernel path that avoids the Arrow IPC re-encode/decode: KERNEL_FETCH_MODE=copycage.
Instead of decoding the kernel's per-batch Arrow IPC bytes via RecordBatchReader, the kernel hands over each Arrow buffer as a V8-owned (in-cage) ArrayBuffer with the bytes copied in, plus a descriptor. The driver rebuilds the RecordBatch via apache-arrow makeData (lib/kernel/KernelArrowImport.ts) and feeds the existing ArrowResultConverter unchanged.
Changes
- KERNEL_FETCH_MODE ∈ { ipc (default), copycage }, resolved in KernelOperationBackend. Double-gated: it only engages when the binding exposes fetchNextBatchCopycage and the schema is supported (dictionary / union / Large* types fall back to IPC for the whole result).
- Wired on both the sync metadata fetch handle and the async AsyncResultHandle (the main executeStatement path); verified via native call counters that the driver actually invokes fetchNextBatchCopycage and does not silently fall back to IPC.
- New importZeroCopyBatch importer + tests/unit/kernel/KernelArrowImport.test.ts (a pinned layout-compat test, so an arrow-rs / apache-arrow layout drift fails loudly).
- ArrowBatch gains an optional pre-decoded recordBatches field that the converter consumes when present; the IPC path is unchanged.
Verification
- Verified byte-identical to the IPC path (… useKernel) across all 21 type families + edge cases, including the empty-array / all-null / empty-string / map cases.
- tsc + result/kernel unit suites green.
Depends on the kernel napi change adding fetchNextBatchCopycage (databricks-sql-kernel PR). The driver consumes the kernel via the published @databricks/databricks-sql-kernel-* native package, so this must merge after that kernel change is released and the native dependency / KERNEL_REV is bumped here. Opened as a draft until then.
This pull request and its description were written by Isaac.