Improve DLPack support for external tensor consumption #6261
JanuszL wants to merge 22 commits into NVIDIA:main from
Conversation
895265b to e432f38
!build
CI MESSAGE: [46421794]: BUILD STARTED
Greptile Summary
This PR improves DLPack support across DALI's dynamic mode by adding C++ bulk constructors (from_dlpack_list on TensorListGPU and TensorListCPU) together with native and DLPack fast paths in Batch.__init__.
Confidence Score: 3/5
Safe to merge after fixing the GPU DLPack dtype-check ordering; all other paths are correctly implemented. One confirmed P1 bug: GPU DLPack capsules are consumed by from_dlpack_list before the dtype compatibility check, so a dtype mismatch silently falls through to the slow path with exhausted capsules. This relies on PyTorch-specific behaviour and breaks for any strict DLPack producer. Score is 3 (below the P1 ceiling of 4) because the affected code path is exercised by the new test suite but the test itself relies on the same PyTorch leniency that hides the bug.
dali/python/nvidia/dali/experimental/dynamic/_batch.py — GPU DLPack fast path dtype-guard ordering (lines 534–553)
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["Batch.__init__(tensors=list)"] --> B{list empty?}
B -- yes --> SLOW
B -- no --> C{First elem is ndd.Tensor with _storage?}
C -- yes --> D["Native DALI fast path"]
D --> D1{Same type and device?}
D1 -- yes --> D2["fast_path_used=True"]
D1 -- no --> E
C -- no --> E{Has __dlpack_device__?}
E -- yes --> F{dev_type == GPU?}
F -- yes --> G["GPU DLPack: from_dlpack_list — capsules consumed"]
G --> G1{"dtype match? checked AFTER consume"}
G1 -- yes --> G2["fast_path_used=True"]
G1 -- no --> SLOW
F -- no --> H{dev_type == CPU?}
H -- yes --> I["CPU DLPack: from_dlpack_list"]
I --> I1{success?}
I1 -- yes --> I2["fast_path_used=True"]
I1 -- no --> SLOW
E -- no --> SLOW
H -- no --> SLOW
SLOW["Slow path: per-sample Tensor loop"]
SLOW --> J["Result: Batch"]
D2 --> J
G2 --> J
I2 --> J
e432f38 to 4b868b5
@greptileai - can you re-review?
d1c2614 to c9793e7
!build
CI MESSAGE: [46425910]: BUILD STARTED
c9793e7 to 50843aa
@greptileai - can you re-review?
CI MESSAGE: [46425910]: BUILD PASSED
61611fe to b7a90c8
!build
CI MESSAGE: [46436927]: BUILD STARTED
@greptileai - can you re-review?
CI MESSAGE: [46436927]: BUILD PASSED
!build
CI MESSAGE: [48976609]: BUILD STARTED
CI MESSAGE: [48976609]: BUILD FAILED
!build
CI MESSAGE: [49016406]: BUILD STARTED
CI MESSAGE: [49016406]: BUILD PASSED
- Adds C++ bulk DLPack constructor for TensorListGPU: accepts a Python
list of DLPack-compatible objects (e.g. PyTorch GPU tensors) and
builds the TensorList in a single pass, recording a CUDA event on
the provided stream. Passes `stream` as a keyword argument to
`__dlpack__()` for compatibility with NumPy ≥ 1.22 and JAX which
define `def __dlpack__(self, *, stream=None)`.
- Adds native DALI fast path in Batch.__init__: when given a list of
already-evaluated ndd.Tensor objects, pass their _storage objects
directly to TensorListGPU/CPU constructors, preserving all DALI
metadata (layout, enum types) without going through DLPack.
- Adds GPU DLPack fast path in Batch.__init__: when given a list of
external GPU tensors (e.g. PyTorch) that support DLPack, use the
new C++ bulk constructor to avoid per-tensor Python overhead.
- Adds DLPack fallback for CPU read-only arrays in Tensor.__init__:
catch BufferError from __dlpack__() and fall back to __array__
interface. This makes the following work:
arr = np.array([1, 2, 3])
arr.flags.writeable = False
ndd.as_tensor(arr) # previously raised BufferError
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
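As a minimal sketch of the read-only fallback described in the last bullet above: the helper name cpu_buffer_for_dali is hypothetical, the real logic lives in Tensor.__init__, and a later commit in this PR hardens this fallback to copy read-only buffers.

import numpy as np

def cpu_buffer_for_dali(data):
    # Hypothetical helper, not DALI's actual Tensor.__init__ code.
    try:
        return data.__dlpack__()        # preferred zero-copy DLPack handshake
    except BufferError:
        # Read-only arrays (arr.flags.writeable = False) refuse DLPack;
        # fall back to the __array__ interface so construction still succeeds.
        return np.asarray(data)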
- Rename TensorListGPU DLPack list constructor to from_dlpack_list static method to avoid ambiguity with the list-of-TensorGPU constructor
- Add TensorListFromListOfDLPackObjectsCPU and TensorListCPU.from_dlpack_list for CPU DLPack tensors; extend _batch.py fast path to kDLCPU
- Fix stream_handle assignment to only fall back to raw stream object when it is an integer, avoiding forwarding non-integer stream wrappers to __dlpack__
- Add GPU DLPack fallback to __cuda_array_interface__ in _tensor.py when __dlpack__ raises BufferError or TypeError
- Drop underscore prefix from all fast-path locals; rename _backend_type to BackendType and _stream to cuda_stream
- Remove overly conservative dtype is None guard from both fast-path outer conditions; check dtype per-tensor in the native loop, post-call in DLPack
- Add per-tensor GPU device-ID consistency check inside the native fast-path loop using first_dev_id; break early on device mismatch
- Replace _native_valid boolean flag with for-else idiom
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
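The last two bullets combine into the pattern sketched below; this is a hedged illustration only, with tensors, expected_dtype, device_id and the _storage access modelled on the description above rather than copied from _batch.py.

def try_native_fast_path(tensors, expected_dtype):
    # Hedged sketch of the for-else idiom; not the actual _batch.py code.
    storages = []
    first_dev_id = tensors[0].device_id
    for t in tensors:
        if t.dtype != expected_dtype or t.device_id != first_dev_id:
            break                       # any mismatch abandons the fast path
        storages.append(t._storage)
    else:
        return storages                 # loop finished: fast path is valid
    return None                         # caller falls back to the slow path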
Catch ValueError (mixed-device C++ error from from_dlpack_list) and BufferError (failed __dlpack__ handshake) in addition to TypeError, so both GPU and CPU DLPack fast paths fall back to per-sample Tensor construction instead of propagating errors to the caller.
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
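Schematically, the batch-level fallback now behaves like the hedged sketch below; from_dlpack_list stands in for TensorListGPU.from_dlpack_list / TensorListCPU.from_dlpack_list (whose exact argument lists may differ) and build_slow_path for the per-sample Tensor loop.

def build_batch_fast(from_dlpack_list, objs, build_slow_path):
    # Hedged sketch of the control flow only, not the actual Batch.__init__ code.
    try:
        return from_dlpack_list(objs), True     # fast_path_used = True
    except (TypeError, ValueError, BufferError):
        # TypeError: producer lacks (usable) DLPack support;
        # ValueError: mixed devices reported by the C++ constructor;
        # BufferError: failed __dlpack__ handshake.
        # A later commit in this PR removes BufferError from the GPU clause.
        return build_slow_path(objs), False     # fall back to the slow path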
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
- Native fast path: matching dtype keeps fast path; mismatched dtype converts via slow path
- GPU DLPack BufferError in batch-level from_dlpack_list falls back to slow path using __cuda_array_interface__ per tensor
- GPU Tensor construction falls back to __cuda_array_interface__ when __dlpack__ raises BufferError
- Mixed-GPU DLPack batch: ValueError from from_dlpack_list falls back to slow path (tagged multi_gpu, skipped when < 2 GPUs)
- Native fast path mixed-GPU: device-ID mismatch check triggers slow path fallback (tagged multi_gpu, skipped when < 2 GPUs)
Multi-GPU tests are tagged @attr("multi_gpu") / @attr("pytorch","multi_gpu") so they are picked up by TL0_multigpu/test_body.sh without shell changes.
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
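For reference, the tagging mechanism mentioned above looks roughly like the sketch below; the test name and body are illustrative placeholders, not the actual tests.

from nose.plugins.attrib import attr

@attr("pytorch", "multi_gpu")
def test_mixed_gpu_dlpack_falls_back_to_slow_path():
    # Illustrative placeholder: the real tests live in the extended
    # test_tensor.py and skip themselves when fewer than 2 GPUs are present.
    pass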
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Stream handshake (blocking): Derive the DLPack consumer-stream handle from copy_order instead of inspecting the Python stream object's .handle attribute. copy_order is computed via AccessOrderFromPythonStreamObj() which handles all known stream wrapper types; reusing it for the __dlpack__(stream=...) argument guarantees the producer always synchronises with the exact stream DALI will use for the copy.
Read-only CPU buffer (blocking): When __dlpack__() raises BufferError on a CPU tensor (e.g. read-only NumPy array), the fallback now copies the data via np.array(..., copy=True) instead of zero-copying. This prevents DALI from holding a mutable alias of memory the producer explicitly marked as immutable. TypeError (older DLPack, unsupported dtype) still falls back without copying.
GPU DLPack CAI fallback (medium): The GPU Tensor constructor no longer catches BufferError before trying __cuda_array_interface__. A BufferError on a GPU DLPack call may signal a synchronisation or ownership constraint; silently switching to CAI (which has no consumer-stream handshake) could cause stale reads. Only TypeError (protocol mismatch, safe to fall back) is caught.
Update tests: _BrokenDlpack -> _TypeErrorDlpack (raises TypeError) to remain consistent with the tightened GPU except clause.
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
…allback
- Add set_order(copy_order) to both non-contiguous and contiguous return paths of TensorListFromListOfDLPackObjects so that AccessOrder metadata is propagated identically to the single-tensor path
- Retry __dlpack__() without stream kwarg before falling back to CAI; a TypeError from the stream-aware call means the producer refused the handoff, not that DLPack is unavailable entirely
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
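The retry from the second bullet can be sketched as follows; gpu_dlpack_capsule is a hypothetical helper, not the actual _tensor.py code.

def gpu_dlpack_capsule(data, stream_handle):
    # BufferError is intentionally not caught on the GPU path: it may signal a
    # synchronisation or ownership constraint that the CAI fallback (which has
    # no consumer-stream handshake) could not honour.
    try:
        return data.__dlpack__(stream=stream_handle)
    except TypeError:
        # The producer refused the stream-aware handoff (e.g. no stream kwarg);
        # retry plain __dlpack__() before the caller resorts to
        # __cuda_array_interface__.
        return data.__dlpack__()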
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
… and device check
- Add a preflight loop that calls __dlpack_device__() for every object before any __dlpack__() call: validates device homogeneity without consuming the capsule, and resolves the UserStream for the batch device (via UserStream::Get()->GetStream(device_id)) so stream_handle is correct for all tensors including the first one
- When __dlpack_device__ is absent, fall back to the per-tensor UserStream after i==0 (existing behavior) and recompute stream_handle so tensors 1..N still receive the correct consumer-stream handle
- Remove the redundant per-element __dlpack__ device check (now covered by the preflight); retain the post-FillTensorFromDlPack check as a safety net for objects that lack __dlpack_device__
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
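Below is a Python-level sketch of the ordering this preflight enforces (device check before any capsule is consumed); it is illustrative only and does not reproduce the actual backend_impl.cc code or the UserStream resolution.

def preflight_device_check(objs):
    # Query __dlpack_device__() for every element first; unlike __dlpack__(),
    # this does not consume the single-use capsule.
    first = None
    for o in objs:
        dev = o.__dlpack_device__()          # (device_type, device_id)
        if first is None:
            first = dev
        elif dev != first:
            raise ValueError("all tensors must reside on the same device")
    # first is then used to resolve the consumer stream for the whole batch,
    # so even tensor 0 receives the correct stream handle in __dlpack__().
    return first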
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
TensorList::VerifySampleShareCompatibility enforced that every sample shared into a TensorList must have the same is_pinned() value as the batch. For CPU this is correct: host pinned and non-pinned allocations live in different memory spaces and must not be mixed.
For GPU the constraint is unnecessarily strict: SetSample is only ever called in non-contiguous mode (it calls MakeNoncontiguous() first), so each sample owns its own independent allocation. A GPU tensor backed by CUDA pinned host memory (is_pinned()==true) and a regular device tensor (is_pinned()==false) can coexist in the same non-contiguous TensorList because downstream per-sample access respects each sample's own is_pinned flag.
The strict check caused a regression in the dynamic-API cross-GPU batch slow path: copying a tensor from GPU N to GPU 0 via DALI's copy operator routes through a pinned staging buffer, yielding a TensorGPU with is_pinned()==true. Assembling this together with non-pinned tensors from GPU 0 into a non-contiguous TensorListGPU then failed the pinned uniformity check with "Sample must have the same pinned status as target batch".
Fix: guard the pinned check with if constexpr (CPUBackend only), leaving the device_id check intact for both backends.
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Two P2 issues found in code review:
1. backend_impl.cc (TensorListFromListOfDLPackObjects): When no input object implements __dlpack_device__, the preflight loop left copy_order == AccessOrder::host() and stream_handle == None. The main loop then called __dlpack__(stream=None) for tensor 0, meaning the DLPack producer was not informed of the DALI consumer stream and could race against DALI's copy if it had in-flight GPU work. Fix: resolve copy_order via cudaGetDevice() after the preflight loop so all __dlpack__() calls, including tensor 0, receive the same concrete stream handle. Remove the now-dead per-tensor fallback that only fired after tensor 0 was already consumed.
2. _tensor.py (GPU DLPack path in Tensor.__init__): The outer try/except TypeError wrapped both data.__dlpack__(**args) and _backend.TensorGPU(dlpack_capsule, ...). If __dlpack__(**args) succeeded but TensorGPU raised TypeError (e.g. unsupported dtype), the capsule was already consumed; the except block then called data.__dlpack__() a second time. DLPack capsules are single-use, so the second call is undefined behaviour. Fix: separate capsule acquisition from TensorGPU construction. Only the __dlpack__() call itself is inside the outer try; once a capsule is obtained, TensorGPU construction is outside any retry block.
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
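A hedged sketch of the corrected ordering from point 2; construct and construct_from_cai are placeholders for _backend.TensorGPU construction and the __cuda_array_interface__ path, and the earlier retry without the stream kwarg is omitted for brevity.

def acquire_and_construct(data, stream_handle, construct, construct_from_cai):
    # Only the __dlpack__() call sits inside the retry block.
    try:
        capsule = data.__dlpack__(stream=stream_handle)
    except TypeError:
        # No usable DLPack handshake at all: use __cuda_array_interface__.
        return construct_from_cai(data.__cuda_array_interface__)
    # The capsule is single-use; from here on, construction errors must
    # propagate rather than trigger a second __dlpack__() call.
    return construct(capsule)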
VerifySampleShareCompatibility no longer enforces is_pinned() uniformity for GPUBackend: pinned (host-accessible) and regular device tensors can coexist in a non-contiguous TensorListGPU because each sample owns its own independent allocation.
SetRequiredSetters now conditionally includes the "pinned" entry only for CPUBackend, matching the enforcement in VerifySampleShareCompatibility. GPU TensorList permutation iterations no longer expect an exception when the "pinned" setter is excluded.
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Copy(non_contiguous, copy_order) queues GPU work on copy_order, but set_order(copy_order) was never called on the returned TensorList. This left it with AccessOrder::host(), causing downstream consumers to use a host barrier instead of the correct GPU stream, allowing races with the in-flight copy. Mirrors the existing set_order calls in TensorListFromListOfDLPackObjects. Both the contiguous and non-contiguous GPU paths are fixed.
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
wait_order was always AccessOrder::host() and never updated, making both copy_order.wait(wait_order) calls inert. Remove the variable and the two dead wait calls, matching the DLPack path which omits them entirely.
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Relax FillTensorFromDlPack<CPUBackend> to accept kDLCUDAHost in addition to kDLCPU - the function body already handled the pinned-memory case (is_pinned flag, device_id) but the DALI_ENFORCE blocked it from being reached. Extend the DLPack fast path in _batch.py to match, so pinned CPU tensors (e.g. PyTorch tensor.pin_memory()) use the fast path instead of falling through to the slow path.
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
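A usage illustration of the pinned fast path, assuming PyTorch is available, the dynamic API is imported as ndd (as in the snippets above), and as_batch is the entry point referenced elsewhere in this PR.

import torch
import nvidia.dali.experimental.dynamic as ndd

# Pinned host tensors report a host-accessible DLPack device; with this change
# they are handed to TensorListCPU.from_dlpack_list instead of the slow path.
pinned = [torch.arange(4, dtype=torch.float32).pin_memory() for _ in range(8)]
batch = ndd.as_batch(pinned)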
Remove BufferError from the GPU DLPack fast-path except clause in _batch.py — consistent with _tensor.py's documented intent that BufferError must not be caught on GPU, and prevents a 'capsule already consumed' error if the slow path is entered after partial consumption.
Add a preflight loop to TensorListFromListOfDLPackObjectsCPU mirroring the GPU counterpart: checks __dlpack_device__() on every element before any __dlpack__() call, throwing py::value_error (caught by the Python fast-path except clause) for non-CPU device types instead of leaking a RuntimeError from DALI_ENFORCE after capsules are partially consumed.
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
720086b to c553469
Category:
Bug fix (non-breaking change which fixes an issue)
Description:
- Adds C++ bulk DLPack constructor for TensorListGPU: accepts a Python list of DLPack-compatible objects (e.g. PyTorch GPU tensors) and builds the TensorList in a single pass, recording a CUDA event on the provided stream. Passes `stream` as a keyword argument to `__dlpack__()` for compatibility with NumPy ≥ 1.22 and JAX, which define `def __dlpack__(self, *, stream=None)`.
- Adds native DALI fast path in Batch.__init__: when given a list of already-evaluated ndd.Tensor objects, pass their _storage objects directly to TensorListGPU/CPU constructors, preserving all DALI metadata (layout, enum types) without going through DLPack.
- Adds GPU DLPack fast path in Batch.__init__: when given a list of external GPU tensors (e.g. PyTorch) that support DLPack, use the new C++ bulk constructor to avoid per-tensor Python overhead.
- Adds DLPack fallback for CPU read-only arrays in Tensor.__init__: catch BufferError from `__dlpack__()` and fall back to the `__array__` interface. This makes the following work:
  arr = np.array([1, 2, 3])
  arr.flags.writeable = False
  ndd.as_tensor(arr)  # previously raised BufferError
- Fixes pinned-memory mismatch in the cross-GPU batch slow path: TensorList::VerifySampleShareCompatibility no longer enforces is_pinned() uniformity for GPUBackend (only CPUBackend). DALI's copy operator routes cross-GPU transfers through a pinned staging buffer, producing a TensorGPU with is_pinned()==true. Combining such a tensor with non-pinned samples in a non-contiguous TensorListGPU previously raised "Sample must have the same pinned status as target batch", causing as_batch() to fail for mixed-GPU inputs. For GPU, each sample in a non-contiguous TensorList owns its own allocation, so mixed pinned/non-pinned is safe.
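To make the intended use concrete, a hedged end-to-end example of the GPU bulk fast path; it assumes PyTorch, the dynamic API imported as ndd, and that as_batch routes a list of external GPU tensors through the Batch.__init__ fast path described above.

import torch
import nvidia.dali.experimental.dynamic as ndd

tensors = [torch.randn(3, 224, 224, device="cuda") for _ in range(16)]
batch = ndd.as_batch(tensors)
# All 16 tensors are handed over in a single from_dlpack_list call on the C++
# side, with one CUDA event recorded on the consumer stream, instead of 16
# per-tensor Python round-trips.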
Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
- extended test_tensor.py
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: DALI-4580