Skip to content

[None][infra] Record CBTS decision to OpenSearch for CI-health monitoring#15210

Open
crazydemo wants to merge 1 commit into
NVIDIA:mainfrom
crazydemo:cbts-monitor
Open

[None][infra] Record CBTS decision to OpenSearch for CI-health monitoring#15210
crazydemo wants to merge 1 commit into
NVIDIA:mainfrom
crazydemo:cbts-monitor

Conversation

@crazydemo

@crazydemo crazydemo commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Summary by CodeRabbit

  • New Features

    • Added telemetry reporting for CBTS decision outcomes, including deferred, fallback, and final selections.
    • Improved test database distribution mechanism between build stages using tarball uploads.
  • Chores

    • Updated CI infrastructure documentation for cross-job data handling.
    • Extended OpenSearch integration to support CBTS decision tracking and monitoring.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

…ring

Emit one record per CBTS decision so the team can monitor CI health (fast +
stable) under fast iteration. Built entirely from what CBTS already produces —
no changes to cbts/main.py or blocks.py.

Recorded per /bot run from getCbtsResult (which has the repo + decision),
best-effort under withCredentials so telemetry never blocks CI. Each doc has:
- s_cbts_status: single outcome+mode field — pre_merge / post_merge (CBTS ran
  and narrowed), fallback (engaged but couldn't decide), deferred (a user
  stage-selection flag turned CBTS off)
- hit-stage count + hit stage list (CBTS-selected stages)
- case-level skip rate: entries in the narrowed cbts_test_db/ vs the source
  test-db for the touched YAML stems (0 when no Layer-3 narrowing)
- scope, reason, sanity/perfsanity flags, pipeline context

Stage-level skip rate is intentionally NOT computed: its denominator (the
no-CBTS baseline) is only known later in Groovy launchTestJobs, on an agent
without a repo checkout — out of scope for this minimal-footprint change.

- open_search_db.py: register the cbts_info index with read/write access.
- cbts/tools/report_cbts_decision.py: count cases (narrowed vs source) and
  post the typed-prefix doc; logs and exits 0 on any failure.
- L0_MergeRequest.groovy: record from getCbtsResult on the ran, fallback, and
  user-flag-deferred paths.
- cbts/README.md: sync the cross-job seed section to the current Artifactory
  tarball flow (was describing the removed base64 piggyback).

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
@crazydemo crazydemo requested review from a team as code owners June 10, 2026 08:02
@crazydemo crazydemo requested review from tburt-nv and yiqingy0 June 10, 2026 08:02
@crazydemo crazydemo requested a review from ZhanruiSunCh June 10, 2026 08:02
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds end-to-end telemetry reporting for CBTS decision outcomes. It introduces an OpenSearch project index for CBTS metrics, implements a reporting script that collects test case counts and posts structured documents, wires telemetry reporting into the Jenkins pipeline's CBTS decision flow at three points (deferred, fallback, success), and updates documentation to describe the new reporting tool and cross-job data handoff mechanism.

Changes

CBTS Telemetry Reporting

Layer / File(s) Summary
OpenSearch project index and access control
jenkins/scripts/open_search_db.py
New CBTS_PROJECT_NAME constant is defined and added to read/write access control lists for OpenSearch client authorization.
CBTS decision reporting script implementation
jenkins/scripts/cbts/tools/report_cbts_decision.py
New script counts test cases from YAML files in CBTS and source test-db directories, builds an OpenSearch document with timestamp, environment metadata, CBTS status/scope/reason, and computed test metrics, then posts the document non-blockingly.
Jenkins pipeline CBTS decision reporting
jenkins/L0_MergeRequest.groovy
Pipeline environment is extended with OpenSearch credentials; the CBTS decision flow instruments three outcomes (deferred, fallback, success) by invoking a new _cbtsReportDecision helper that shells out to the reporting script with status, optional reason/decision JSON, and test-db parameters.
Documentation of telemetry tooling and data handoff
jenkins/scripts/cbts/README.md
README documents the new report_cbts_decision.py tool in the file map, describes tarball-based cross-job delivery of the narrowed test-db from L0_MergeRequest to stage agents via Artifactory, and clarifies Layer 3 fallback conditions when tarball operations fail.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is essentially a template with only the checklist checkbox marked. All required sections (Description, Test Coverage) are empty or unfilled, and no explanation of the issue or solution is provided. Fill in the Description section explaining the issue and solution, and the Test Coverage section listing relevant tests that safeguard the changes.
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: adding CBTS decision recording to OpenSearch for CI-health monitoring. It is specific, concise, and directly related to the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
jenkins/scripts/cbts/README.md (1)

80-80: 💤 Low value

Clarify the description of report_cbts_decision.py.

The current phrasing "post the decision (hit-stage count, case-level skip rate, fallback)" lists "fallback" alongside metrics, but "fallback" is actually one of several possible status values (pre_merge, post_merge, fallback, deferred), not a separate metric being posted.

Consider rephrasing for clarity:

    └── report_cbts_decision.py  post CBTS decision (status, hit-stage count, case-level skip rate) to OpenSearch for CI-health monitoring

or

    └── report_cbts_decision.py  post decision status and test metrics to OpenSearch for CI-health monitoring
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@jenkins/scripts/cbts/README.md` at line 80, The README entry for
report_cbts_decision.py incorrectly lists "fallback" as a metric; update the
description to clarify it posts the decision status plus metrics to
OpenSearch—e.g., change the line for report_cbts_decision.py to "post CBTS
decision (status, hit-stage count, case-level skip rate) to OpenSearch for
CI-health monitoring" or "post decision status and test metrics to OpenSearch
for CI-health monitoring" so that report_cbts_decision.py is clearly documented
as sending a status and metrics rather than treating fallback as a metric.
jenkins/L0_MergeRequest.groovy (1)

807-829: 💤 Low value

Shell argument construction is adequate for controlled inputs.

Line 818 removes single quotes from reason to prevent shell injection, but doesn't escape other special characters. Since reason comes only from controlled sources (user flags or CBTS output), this is acceptable. However, for defense-in-depth, consider using proper shell escaping in the future.

🛡️ Optional: More robust shell escaping

If you want to be extra defensive, consider using Groovy's shell escaping or passing arguments more safely:

// Option 1: Use writeFile for reason too, pass as --reason-file
if (reason) {
    pipeline.writeFile(file: "${LLM_ROOT}/cbts_reason.txt", text: reason)
    args += " --reason-file cbts_reason.txt"
}

// Option 2: Use proper shell escaping (if available in your Groovy environment)
// This is more of a forward-looking suggestion
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@jenkins/L0_MergeRequest.groovy` around lines 807 - 829, The
_cbtsReportDecision function currently injects reason into the shell command by
stripping single quotes which is brittle; instead, write the reason to a file
and pass that filename to the Python script (similar to how decisionJson is
handled) to avoid shell-escaping issues: when reason is present, call
pipeline.writeFile(file: "${LLM_ROOT}/cbts_reason.txt", text: reason) and append
an argument like " --reason-file cbts_reason.txt" to args (and update the Python
script flag handling if needed); ensure the existing
writeFile/args/report_cbts_decision.py flow is used so no unescaped reason text
is interpolated into the sh invocation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@jenkins/scripts/cbts/tools/report_cbts_decision.py`:
- Around line 68-94: The build_document function currently omits the
sanity/perfsanity flags from the decision payload; extract
decision.get("sanity_required", False) and decision.get("perfsanity_required",
False) near the top of build_document and include them in the returned dict
(e.g., "sanity_required": sanity_required, "perfsanity_required":
perfsanity_required) so documents created by build_document reflect those flags
from the decision JSON.

---

Nitpick comments:
In `@jenkins/L0_MergeRequest.groovy`:
- Around line 807-829: The _cbtsReportDecision function currently injects reason
into the shell command by stripping single quotes which is brittle; instead,
write the reason to a file and pass that filename to the Python script (similar
to how decisionJson is handled) to avoid shell-escaping issues: when reason is
present, call pipeline.writeFile(file: "${LLM_ROOT}/cbts_reason.txt", text:
reason) and append an argument like " --reason-file cbts_reason.txt" to args
(and update the Python script flag handling if needed); ensure the existing
writeFile/args/report_cbts_decision.py flow is used so no unescaped reason text
is interpolated into the sh invocation.

In `@jenkins/scripts/cbts/README.md`:
- Line 80: The README entry for report_cbts_decision.py incorrectly lists
"fallback" as a metric; update the description to clarify it posts the decision
status plus metrics to OpenSearch—e.g., change the line for
report_cbts_decision.py to "post CBTS decision (status, hit-stage count,
case-level skip rate) to OpenSearch for CI-health monitoring" or "post decision
status and test metrics to OpenSearch for CI-health monitoring" so that
report_cbts_decision.py is clearly documented as sending a status and metrics
rather than treating fallback as a metric.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5530abf2-a60d-4ac5-ae91-db3ecb8463a3

📥 Commits

Reviewing files that changed from the base of the PR and between b206f68 and 0b2ced1.

📒 Files selected for processing (4)
  • jenkins/L0_MergeRequest.groovy
  • jenkins/scripts/cbts/README.md
  • jenkins/scripts/cbts/tools/report_cbts_decision.py
  • jenkins/scripts/open_search_db.py

Comment on lines +68 to +94
def build_document(
decision: dict, status: str, reason: str, cbts_cases: int, source_cases: int
) -> dict:
"""Build the typed OpenSearch doc (field prefixes: s_=str, l_=int, d_=float, flat_=dict)."""
scope = decision.get("scope")
affected = sorted(decision.get("affected_stages") or [])
# deferred has no decision; fall back to --reason.
if not reason:
reason = " | ".join(decision.get("reasons") or [])

case_skipped = max(source_cases - cbts_cases, 0)
case_skip_rate = (case_skipped / source_cases) if source_cases else 0.0

return {
"@timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
"s_commit": os.getenv("gitlabCommit", ""),
"s_pr_number": os.getenv("gitlabMergeRequestIid", ""),
"s_build_url": os.getenv("BUILD_URL", ""),
"s_cbts_status": status,
"s_scope": str(scope) if scope is not None else "",
"s_reason": reason,
"l_hit_stages": len(affected),
"l_case_source_cases": source_cases,
"l_case_cbts_cases": cbts_cases,
"d_case_skip_rate": round(case_skip_rate, 4),
"flat_detail": {"hit_stages": affected},
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Consider adding sanity/perfsanity flags to the document.

The PR objectives state that each document should include "sanity/perfsanity flags," but the function doesn't extract sanity_required or perfsanity_required from the decision dict. These fields should be available when a decision JSON is provided (as parsed from the CBTS output on line 108).

Consider adding these fields to the document:

📊 Proposed enhancement to capture sanity flags
     case_skipped = max(source_cases - cbts_cases, 0)
     case_skip_rate = (case_skipped / source_cases) if source_cases else 0.0
 
     return {
         "`@timestamp`": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
         "s_commit": os.getenv("gitlabCommit", ""),
         "s_pr_number": os.getenv("gitlabMergeRequestIid", ""),
         "s_build_url": os.getenv("BUILD_URL", ""),
         "s_cbts_status": status,
         "s_scope": str(scope) if scope is not None else "",
         "s_reason": reason,
         "l_hit_stages": len(affected),
         "l_case_source_cases": source_cases,
         "l_case_cbts_cases": cbts_cases,
         "d_case_skip_rate": round(case_skip_rate, 4),
+        "b_sanity_required": decision.get("sanity_required", True),
+        "b_perfsanity_required": decision.get("perfsanity_required", True),
         "flat_detail": {"hit_stages": affected},
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@jenkins/scripts/cbts/tools/report_cbts_decision.py` around lines 68 - 94, The
build_document function currently omits the sanity/perfsanity flags from the
decision payload; extract decision.get("sanity_required", False) and
decision.get("perfsanity_required", False) near the top of build_document and
include them in the returned dict (e.g., "sanity_required": sanity_required,
"perfsanity_required": perfsanity_required) so documents created by
build_document reflect those flags from the decision JSON.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant