[TRTLLM-9526][feat] optimize host perf for python cache transceiver#12273

Merged
chuangz0 merged 6 commits into NVIDIA:main from chuangz0:py_cache_transceiver_host_optimize on Apr 1, 2026

Conversation

Collaborator

@chuangz0 chuangz0 commented Mar 17, 2026

Summary by CodeRabbit

Release Notes

  • New Features
    • Added NumPy array support for memory descriptor construction in cache transmission
    • Enhanced memory pointer and block ID handling for improved cache transfer efficiency in disaggregated operations

Description

Together with #12490, this PR optimizes host-side performance for the Python KV cache transfer path.

model and config:

  • Model: Qwen3-235B-A22B-FP4
  • Config: ISL=8192, OSL=1024, ctx2_tep4_gen1_dep16
  • opt1: commit 4e505e2
  • opt2: commit b10c86d
KV transfer perf

| Metric | Stat | Baseline | Opt1 | Opt2 | Opt1 vs B | Opt2 vs B | Opt2 vs Opt1 |
|---|---|---|---|---|---|---|---|
| prepare_args (ms) | avg | 10.331 | 0.916 | 0.424 | -91.1% | -95.9% | -53.7% |
| prepare_args (ms) | median | 8.110 | 0.815 | 0.385 | -90.0% | -95.3% | -52.8% |
| prepare_args (ms) | p99 | 20.995 | 1.887 | 0.866 | -91.0% | -95.9% | -54.1% |
| queue (ms) | avg | 16.813 | 20.932 | 14.830 | +24.5% | -11.8% | -29.1% |
| queue (ms) | p99 | 54.054 | 383.160 | 47.484 | +608.9% | -12.2% | -87.6% |
| transfer (ms) | median | 10.809 | 10.632 | 10.668 | ~flat | ~flat | ~flat |
| task (ms) | avg | 46.367 | 35.590 | 29.044 | -23.2% | -37.4% | -18.4% |
| task (ms) | median | 39.865 | 23.645 | 22.383 | -40.7% | -43.9% | -5.3% |
| task (ms) | p99 | 82.510 | 394.621 | 59.794 | +378.3% | -27.5% | -84.8% |
| throughput (MB/s) | median | 17,392 | 17,683 | 17,623 | ~flat | ~flat | ~flat |
E2E

Concurrency = 1127

| Metric | Baseline | Opt1 | Opt2 | Opt1 vs B | Opt2 vs B | Opt2 vs Opt1 |
|---|---|---|---|---|---|---|
| Req Throughput (req/s) | 15.12 | 15.96 | 15.80 | +5.5% | +4.5% | -1.0% |
| Output Throughput (tok/s) | 15487.7 | 16343.4 | 16184.0 | +5.5% | +4.5% | -1.0% |
| Out Thpt/GPU (tok/s) | 645.3 | 681.0 | 674.3 | +5.5% | +4.5% | -1.0% |
| tput_per_user (tok/s) | 62.17 | 61.71 | 60.68 | -0.7% | -2.4% | -1.7% |
| Mean TTFT (ms) | 53189.7 | 49926.0 | 50262.5 | -6.1% | -5.5% | +0.7% |
| Median TTFT (ms) | 56917.3 | 51837.0 | 52309.1 | -8.9% | -8.1% | +0.9% |
| P99 TTFT (ms) | 69361.5 | 65017.9 | 65715.0 | -6.3% | -5.3% | +1.1% |
| Mean TPOT (ms) | 16.085 | 16.205 | 16.480 | +0.7% | +2.5% | +1.7% |
| Median TPOT (ms) | 16.204 | 16.261 | 16.272 | +0.4% | +0.4% | +0.1% |
| P99 TPOT (ms) | 17.961 | 16.750 | 22.793 | -6.7% | +26.9% | +36.1% |
| Mean ITL (ms) | 316.4 | 318.8 | 324.2 | +0.7% | +2.5% | +1.7% |
| Mean E2EL (ms) | 69644.5 | 66504.0 | 67121.9 | -4.5% | -3.6% | +0.9% |
| Median E2EL (ms) | 73577.8 | 68473.5 | 68936.0 | -6.9% | -6.3% | +0.7% |
| P99 E2EL (ms) | 85457.6 | 81946.3 | 82993.1 | -4.1% | -2.9% | +1.3% |
| Duration (s) | 596.1 | 564.9 | 570.5 | -5.2% | -4.3% | +1.0% |
Concurrency = 1229

| Metric | Baseline | Opt1 | Opt2 | Opt1 vs B | Opt2 vs B | Opt2 vs Opt1 |
|---|---|---|---|---|---|---|
| Req Throughput (req/s) | 15.29 | 15.98 | 15.89 | +4.5% | +3.9% | -0.6% |
| Output Throughput (tok/s) | 15657.6 | 16367.0 | 16272.1 | +4.5% | +3.9% | -0.6% |
| Out Thpt/GPU (tok/s) | 652.4 | 682.0 | 678.0 | +4.5% | +3.9% | -0.6% |
| tput_per_user (tok/s) | 65.95 | 61.97 | 61.74 | -6.0% | -6.4% | -0.4% |
| Mean TTFT (ms) | 59627.4 | 55921.3 | 56130.1 | -6.2% | -5.9% | +0.4% |
| Median TTFT (ms) | 60737.4 | 58095.9 | 58289.0 | -4.3% | -4.0% | +0.3% |
| P99 TTFT (ms) | 74162.3 | 71548.6 | 71898.7 | -3.5% | -3.1% | +0.5% |
| Mean TPOT (ms) | 15.164 | 16.137 | 16.196 | +6.4% | +6.8% | +0.4% |
| Median TPOT (ms) | 15.024 | 16.265 | 16.263 | +8.3% | +8.2% | -0.0% |
| P99 TPOT (ms) | 16.340 | 16.471 | 16.393 | +0.8% | +0.3% | -0.5% |
| Mean ITL (ms) | 298.3 | 317.5 | 318.6 | +6.4% | +6.8% | +0.4% |
| Mean E2EL (ms) | 75140.2 | 72429.7 | 72698.3 | -3.6% | -3.2% | +0.4% |
| Median E2EL (ms) | 76071.8 | 74734.0 | 74940.4 | -1.8% | -1.5% | +0.3% |
| P99 E2EL (ms) | 89549.2 | 88584.7 | 88575.5 | -1.1% | -1.1% | -0.0% |
| Duration (s) | 643.0 | 615.1 | 618.7 | -4.3% | -3.8% | +0.6% |

If KV transmission speeds up, the batch size on the generation side grows faster and the number of iterations drops slightly, but TPOT increases.

config 2: DeepSeek

DeepSeek R1, ctx4_dep4_gen1_dep8, ISL=8k / OSL=1k

Baseline is the Python cache transceiver without host optimization

1. Output Token Throughput (tok/s)

| Concurrency | Baseline | Host Opt | Delta |
|---|---|---|---|
| 1024 | 40,707 | 40,644 | -0.16% |
| 1076 | 40,629 | 40,950 | +0.79% |
| 1127 | 40,947 | 40,978 | +0.08% |
| 1229 | 41,068 | 41,013 | -0.13% |

Output throughput is essentially identical between the two configurations (within +/-0.8% noise).

2. User Token Throughput (tok/s)

| Concurrency | Baseline | Host Opt | Delta |
|---|---|---|---|
| 1024 | 41.34 | 41.19 | -0.36% |
| 1076 | 39.20 | 39.54 | +0.87% |
| 1127 | 37.83 | 37.85 | +0.05% |
| 1229 | 34.95 | 34.92 | -0.09% |

KV transfer

| Metric | Baseline | Host Opt | Delta |
|---|---|---|---|
| KVSendTask latency mean | 2.52 ms | 2.36 ms | -6.5% |
| KVSendTask latency p99 | 5.68 ms | 4.69 ms | -17.4% |
| KVRecvTask latency mean | 3.16 ms | 2.94 ms | -7.0% |
| KVRecvTask latency p99 | 6.52 ms | 5.44 ms | -16.5% |
| prepare_args latency mean | 0.078 ms | 0.055 ms | -29.5% |
| Transfer throughput mean | 171,233 MB/s | 178,121 MB/s | +4.0% |

config 3: DeepSeek

DeepSeek R1, ctx4_dep4_gen1_dep8, ISL=8k / OSL=1k

Baseline is the C++ cache transceiver

1. Output Token Throughput (tok/s)

| Concurrency | Baseline (C++) | Py+HostOpt | Delta |
|---|---|---|---|
| 1024 | 40,323 | 40,644 | +0.80% |
| 1076 | 40,467 | 40,950 | +1.19% |
| 1127 | 40,536 | 40,978 | +1.09% |
| 1229 | 40,699 | 41,013 | +0.77% |

2. User Token Throughput (tok/s)

| Concurrency | Baseline (C++) | Py+HostOpt | Delta |
|---|---|---|---|
| 1024 | 40.87 | 41.19 | +0.78% |
| 1076 | 39.04 | 39.54 | +1.28% |
| 1127 | 37.33 | 37.85 | +1.39% |
| 1229 | 34.31 | 34.92 | +1.78% |
3. TTFT (ms)

| Stat | Concurrency | Baseline (C++) | Py+HostOpt | Delta |
|---|---|---|---|---|
| mean | 1024 | 2,650 | 2,507 | -5.4% |
| mean | 1076 | 3,730 | 3,377 | -9.5% |
| mean | 1127 | 4,781 | 4,405 | -7.9% |
| mean | 1229 | 6,891 | 6,586 | -4.4% |
| median | 1024 | 1,650 | 1,496 | -9.3% |
| median | 1076 | 3,031 | 2,819 | -7.0% |
| median | 1127 | 4,118 | 4,074 | -1.1% |
| median | 1229 | 6,322 | 6,801 | +7.6% |
| p99 | 1024 | 20,572 | 20,570 | ~0% |
| p99 | 1076 | 21,671 | 21,509 | -0.7% |
| p99 | 1127 | 22,705 | 22,493 | -0.9% |
| p99 | 1229 | 24,674 | 24,378 | -1.2% |

The C++ cache transceiver has better transfer performance, but worse context-forward performance.

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@chuangz0 chuangz0 requested a review from a team as a code owner March 17, 2026 06:44
@chuangz0 chuangz0 requested review from lfr-0531 and yilin-void March 17, 2026 06:44
@chuangz0 chuangz0 requested a review from Shixiaowei02 March 17, 2026 06:44
@chuangz0 chuangz0 force-pushed the py_cache_transceiver_host_optimize branch from 46e84f8 to a271356 on March 17, 2026 06:44
Contributor

coderabbitai bot commented Mar 17, 2026

📝 Walkthrough

Walkthrough

This pull request systematically converts Python list-based data structures to NumPy arrays throughout the disaggregated cache transmission system, including updates to C++ bindings, base classes, implementations, serialization logic, and tests to support vectorized operations.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **C++ Bindings**<br>cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindings.cpp | Added an __init__ overload for MemoryDescs to construct from NumPy arrays (addrs, sizes, device_ids), including nanobind ndarray header support. |
| **Base Class Refactoring**<br>tensorrt_llm/_torch/disaggregation/base/agent.py, tensorrt_llm/_torch/disaggregation/base/region.py, tensorrt_llm/_torch/disaggregation/base/transfer.py | Converted MemoryDescs from a dataclass to a custom class with dual initialization modes; updated MemRegionGroup.ptrs, RegionExtractorBase.extract, and KVSlice.block_ids_per_layer_groups type signatures from lists to NumPy arrays. |
| **Native Auxiliary & Metadata**<br>tensorrt_llm/_torch/disaggregation/native/auxiliary.py, tensorrt_llm/_torch/disaggregation/native/py_cache_transceiver.py | Converted AuxBufferMeta fields and block ID collections to NumPy arrays with serialization/deserialization logic; replaced list comprehensions with np.fromiter and np.asarray operations. |
| **Memory Mapper Vectorization**<br>tensorrt_llm/_torch/disaggregation/native/mixers/attention/peer.py, tensorrt_llm/_torch/disaggregation/native/mixers/ssm/peer.py | Replaced pointer list construction with vectorized NumPy operations (arithmetic, broadcasting, ravel); updated assertions to use .size for array comparisons. |
| **Transfer & Serialization**<br>tensorrt_llm/_torch/disaggregation/native/transfer.py | Comprehensive refactoring of WriteMeta and RecvReqInfo to use NumPy arrays for pointers/sizes; updated serialization via tobytes()/frombuffer(); replaced list aggregation with NumPy concatenation and size-aware filtering. |
| **KV Extractor**<br>tensorrt_llm/_torch/disaggregation/resource/kv_extractor.py | Updated KVRegionExtractorV1.extract signature to accept np.ndarray; replaced list iteration with vectorized boolean masking for pointer computation. |
| **Test Updates**<br>tests/unittest/disaggregated/region/test_block.py, tests/unittest/disaggregated/test_extractor.py, tests/unittest/disaggregated/test_kv_transfer.py, tests/unittest/disaggregated/test_kv_transfer_mp.py | Updated test assertions to use numpy.testing.assert_array_equal; converted region_ids and pointer constructions to NumPy int64 arrays; adjusted type checks for array-based data structures. |
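The vectorized pointer construction and byte-level serialization summarized above can be sketched roughly as follows. This is a standalone, illustrative example only: pool_base, block_stride, layer_stride, and block_ids are invented values, not the PR's actual identifiers.

```python
import numpy as np

# Hypothetical cache layout: pointer = pool base + block_id * block_stride,
# replicated for each layer at a fixed per-layer offset.
pool_base = 0x7F0000000000
block_stride = 4096
layer_stride = 1 << 20
num_layers = 4

block_ids = np.asarray([3, 7, 12], dtype=np.int64)

# Vectorized construction: broadcast a (num_layers, 1) offset column against a
# (num_blocks,) row of block pointers, then flatten with ravel(). This replaces
# a nested Python-level loop over layers and blocks.
layer_offsets = (np.arange(num_layers, dtype=np.int64) * layer_stride)[:, None]
ptrs = (pool_base + block_ids * block_stride + layer_offsets).ravel()
assert ptrs.size == num_layers * block_ids.size  # .size rather than len()

# Serialization round-trip via tobytes()/frombuffer(): far cheaper than
# serializing a Python list of ints element by element.
payload = ptrs.tobytes()
restored = np.frombuffer(payload, dtype=np.int64)
assert np.array_equal(ptrs, restored)
```

Note that np.frombuffer returns a read-only view over the bytes object, so the receive side incurs no copy, which is a large part of why this style of serialization reduces host time.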

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 17.14%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ⚠️ Warning | PR description is incomplete and uses template placeholders. Critical sections like 'Description' and 'Test Coverage' lack substantive content. | Add a clear summary of changes, explain the optimization rationale, and list specific test cases covering the modifications. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly describes the main change: optimizing host performance for the Python cache transceiver, with proper JIRA ticket reference and feature type. |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
tensorrt_llm/_torch/disaggregation/base/agent.py (1)

41-48: Consider adding strict=True to zip() for safety.

The zip() call on line 45 silently truncates if descs_or_addrs, sizes, and device_ids have different lengths. While upstream code typically ensures equal lengths, adding strict=True (Python 3.10+) would catch mismatches early:

💡 Proposed fix
     def __init__(self, type, descs_or_addrs, sizes=None, device_ids=None):
         self.type = type
         if sizes is not None:
             self.descs = [
-                (int(a), int(s), int(d)) for a, s, d in zip(descs_or_addrs, sizes, device_ids)
+                (int(a), int(s), int(d)) for a, s, d in zip(descs_or_addrs, sizes, device_ids, strict=True)
             ]
         else:
             self.descs = descs_or_addrs
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/disaggregation/base/agent.py` around lines 41 - 48, In
the __init__ method of the agent (where self.descs is built), change the zip
call that combines descs_or_addrs, sizes, and device_ids to use zip(...,
strict=True) so mismatched input lengths raise immediately; update the
tuple-building logic around the zip in __init__ (referencing the __init__ method
and self.descs) to use strict=True to catch length mismatches early.
tensorrt_llm/_torch/disaggregation/native/transfer.py (1)

1640-1645: Minor: Consider using .size instead of len() for consistency.

While len() works on NumPy arrays, .size is the more idiomatic NumPy approach and would be consistent with the rest of this PR's changes.

💡 Proposed fix
     def _register_aux_buffer(self):
         aux_meta = self._aux_buffer.meta
-        ptr_num = len(aux_meta.ptrs)
+        ptr_num = aux_meta.ptrs.size
         ptr_descs = []
         for i in range(ptr_num):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/disaggregation/native/transfer.py` around lines 1640 -
1645, Replace the len() call with the NumPy .size property for consistency:
change the assignment to ptr_num so it uses aux_meta.ptrs.size (referencing the
local variable ptr_num and the attribute aux_meta.ptrs in transfer.py) and keep
the rest of the loop building ptr_descs unchanged.
tests/unittest/disaggregated/test_kv_transfer.py (1)

444-473: Return type annotation is outdated.

The function now returns List[np.ndarray] (each element is np.asarray(..., dtype=np.int64)), but the type hint still declares List[List[int]].

📝 Proposed fix for type annotation
 def get_block_ids_per_layer_groups(
     kv_cache_manager, transfer_worker, request_id: int, use_v2: bool, tokens_per_block: int
-) -> List[List[int]]:
+) -> List[np.ndarray]:
     """Get block_ids for each layer group with window_size filtering."""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/disaggregated/test_kv_transfer.py` around lines 444 - 473, The
return type annotation for get_block_ids_per_layer_groups is outdated (it
returns numpy arrays); update its signature to reflect List[np.ndarray] (or
Sequence[np.ndarray] if you prefer immutability) instead of List[List[int]].
Locate get_block_ids_per_layer_groups and change the annotation and any related
docstring/comment to List[np.ndarray]; ensure imports include numpy as np and
typing.List is used consistently with np.ndarray. Verify callers of
get_block_ids_per_layer_groups (e.g., uses of block_ids_per_layer_groups) still
work with numpy arrays and adjust any type checks if necessary.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindings.cpp`:
- Around line 87-108: The lambda __init__ for kvc::MemoryDescs reads n elements
from addrs, sizes, and deviceIds but only uses addrs.shape(0) for n; add
explicit validation at the start of that lambda: compute n = addrs.shape(0) then
assert sizes.shape(0) == n and deviceIds.shape(0) == n and if not, throw a clear
exception (e.g., std::invalid_argument or nb::value_error) indicating mismatched
array lengths; keep the rest of the logic unchanged so you avoid out-of-bounds
reads when constructing descs.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c636e8b5-bfb5-4228-a6c5-d26fe10c0491

📥 Commits

Reviewing files that changed from the base of the PR and between a064a9b and a271356.

📒 Files selected for processing (14)
  • cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindings.cpp
  • tensorrt_llm/_torch/disaggregation/base/agent.py
  • tensorrt_llm/_torch/disaggregation/base/region.py
  • tensorrt_llm/_torch/disaggregation/base/transfer.py
  • tensorrt_llm/_torch/disaggregation/native/auxiliary.py
  • tensorrt_llm/_torch/disaggregation/native/mixers/attention/peer.py
  • tensorrt_llm/_torch/disaggregation/native/mixers/ssm/peer.py
  • tensorrt_llm/_torch/disaggregation/native/py_cache_transceiver.py
  • tensorrt_llm/_torch/disaggregation/native/transfer.py
  • tensorrt_llm/_torch/disaggregation/resource/kv_extractor.py
  • tests/unittest/disaggregated/region/test_block.py
  • tests/unittest/disaggregated/test_extractor.py
  • tests/unittest/disaggregated/test_kv_transfer.py
  • tests/unittest/disaggregated/test_kv_transfer_mp.py

@chuangz0
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39355 [ run ] triggered by Bot. Commit: a271356 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39355 [ run ] completed with state SUCCESS. Commit: a271356
/LLM/main/L0_MergeRequest_PR pipeline #30599 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@chuangz0 chuangz0 force-pushed the py_cache_transceiver_host_optimize branch 2 times, most recently from 215aed1 to b10c86d on March 18, 2026 10:31
@chuangz0
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39444 [ run ] triggered by Bot. Commit: b10c86d Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39444 [ run ] completed with state SUCCESS. Commit: b10c86d
/LLM/main/L0_MergeRequest_PR pipeline #30671 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@chuangz0 chuangz0 force-pushed the py_cache_transceiver_host_optimize branch from 31748f2 to 83d0a29 on March 19, 2026 08:31
@chuangz0
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39575 [ run ] triggered by Bot. Commit: 83d0a29 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39575 [ run ] completed with state SUCCESS. Commit: 83d0a29
/LLM/main/L0_MergeRequest_PR pipeline #30789 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@chuangz0 chuangz0 force-pushed the py_cache_transceiver_host_optimize branch from 83d0a29 to 225c455 on March 20, 2026 03:30
@chuangz0
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39678 [ run ] triggered by Bot. Commit: 225c455 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39678 [ run ] completed with state SUCCESS. Commit: 225c455
/LLM/main/L0_MergeRequest_PR pipeline #30879 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@chuangz0 chuangz0 force-pushed the py_cache_transceiver_host_optimize branch from 225c455 to 68aba48 on March 20, 2026 06:48
@chuangz0
Collaborator Author

/bot help

@github-actions

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental) --high-priority]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

--high-priority (OPTIONAL) : Run the pipeline with high priority. This option is restricted to authorized users only and will route the job to a high-priority queue.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@chuangz0 chuangz0 force-pushed the py_cache_transceiver_host_optimize branch 2 times, most recently from 265b3eb to 93728ee on March 23, 2026 06:10
@chuangz0 chuangz0 force-pushed the py_cache_transceiver_host_optimize branch from eb1465f to 7062fcb on March 30, 2026 06:09
@tensorrt-cicd
Collaborator

PR_Github #40669 [ run ] triggered by Bot. Commit: 7062fcb Link to invocation

@chuangz0
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40675 [ run ] triggered by Bot. Commit: 98d4f41 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40669 [ run ] completed with state ABORTED. Commit: 7062fcb

Link to invocation

@chuangz0
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40679 [ run ] triggered by Bot. Commit: 98d4f41 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40675 [ run ] completed with state ABORTED. Commit: 98d4f41

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40679 [ run ] completed with state SUCCESS. Commit: 98d4f41
/LLM/main/L0_MergeRequest_PR pipeline #31709 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@chuangz0 chuangz0 force-pushed the py_cache_transceiver_host_optimize branch from 98d4f41 to 62ed37c on March 31, 2026 03:32
@chuangz0
Collaborator Author

/bot run --stage-list "A30-PyTorch-1,A30-PyTorch-2, DGX_B200-8_GPUs-PyTorch-1, GB200-8_GPUs-2_Nodes-PyTorch-2"

@tensorrt-cicd
Collaborator

PR_Github #40841 [ run ] triggered by Bot. Commit: 62ed37c Link to invocation

@chuangz0
Collaborator Author

/bot run --stage-list "A30-PyTorch-1, A30-PyTorch-2, DGX_B200-8_GPUs-PyTorch-1, GB200-8_GPUs-2_Nodes-PyTorch-2"

@tensorrt-cicd
Collaborator

PR_Github #40841 [ run ] completed with state SUCCESS. Commit: 62ed37c
/LLM/main/L0_MergeRequest_PR pipeline #31850 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@chuangz0
Collaborator Author

/bot run --stage-list "A30-PyTorch-1, DGX_B200-8_GPUs-PyTorch-1, GB200-8_GPUs-2_Nodes-PyTorch-2"

@tensorrt-cicd
Collaborator

PR_Github #40896 [ run ] triggered by Bot. Commit: 62ed37c Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40896 [ run ] completed with state SUCCESS. Commit: 62ed37c
/LLM/main/L0_MergeRequest_PR pipeline #31897 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

chuangz0 added 6 commits April 1, 2026 09:07
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
@chuangz0 chuangz0 force-pushed the py_cache_transceiver_host_optimize branch from 62ed37c to b220b62 on April 1, 2026 01:07
@chuangz0
Collaborator Author

chuangz0 commented Apr 1, 2026

/bot skip --comment "all test has passed"

@chuangz0 chuangz0 enabled auto-merge (squash) April 1, 2026 01:07
@tensorrt-cicd
Collaborator

PR_Github #41049 [ skip ] triggered by Bot. Commit: b220b62 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41049 [ skip ] completed with state SUCCESS. Commit: b220b62
Skipping testing for commit b220b62

Link to invocation

@chuangz0 chuangz0 merged commit 7a450b4 into NVIDIA:main Apr 1, 2026
5 checks passed
karen-sy pushed a commit to karen-sy/TensorRT-LLM that referenced this pull request Apr 7, 2026
…VIDIA#12273)

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
