Support OTLP runtime metrics with OTel-native naming #11318

Open
link04 wants to merge 5 commits into master from maximo/otlp-runtime-metrics

Conversation


@link04 link04 commented May 8, 2026

What Does This Do

Adds an OTLP runtime-metrics path that emits JVM runtime metrics with OTel semantic-convention names (jvm.*) through the agent's MeterProvider, instead of the proprietary DogStatsD names (jvm.heap_memory, jvm.thread_count, …).

When the three flags below are set together, JvmOtlpRuntimeMetrics.start() is invoked from Agent.installDatadogTracer() and registers 15 instruments backed by
java.lang.management MXBean callbacks. They flow through the existing OTLP exporter — no new transport, no JMXFetch.

Flag                         Required value   Default
DD_RUNTIME_METRICS_ENABLED   true             true
DD_METRICS_OTEL_ENABLED      true             false
DD_METRICS_OTEL_EXPORTER     otlp             unset
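The three-flag gate can be sketched as follows. This is a minimal stdlib-only illustration; the class, method, and parameter names are hypothetical and do not reflect the tracer's actual configuration API.

```java
// Hypothetical sketch of the three-flag gate described above; names are
// illustrative, not the tracer's actual configuration API.
final class OtlpRuntimeMetricsGate {

    // Returns true only when all three settings line up:
    //   DD_RUNTIME_METRICS_ENABLED (default true) must not be disabled,
    //   DD_METRICS_OTEL_ENABLED (default false) must be enabled,
    //   DD_METRICS_OTEL_EXPORTER (default unset) must be "otlp".
    static boolean shouldStart(String runtimeMetricsEnabled,
                               String otelMetricsEnabled,
                               String otelExporter) {
        boolean runtimeMetrics = !"false".equalsIgnoreCase(runtimeMetricsEnabled);
        boolean otelMetrics = "true".equalsIgnoreCase(otelMetricsEnabled);
        boolean otlpExporter = "otlp".equalsIgnoreCase(otelExporter);
        return runtimeMetrics && otelMetrics && otlpExporter;
    }
}
```

Passing null for an unset variable models the defaults: runtime metrics stay on unless explicitly disabled, while the OTel flags must be opted into.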

Instruments registered (15 total — Recommended + Development per the OTel JVM semconv):

  • Memory: jvm.memory.used, jvm.memory.committed, jvm.memory.limit, jvm.memory.init, jvm.memory.used_after_last_gc
  • Buffer pools: jvm.buffer.memory.used, jvm.buffer.memory.limit, jvm.buffer.count
  • Threads: jvm.thread.count
  • Class loading: jvm.class.loaded, jvm.class.count, jvm.class.unloaded
  • CPU: jvm.cpu.time, jvm.cpu.count, jvm.cpu.recent_utilization
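To illustrate where these values come from, here is a stdlib-only sketch of the data behind jvm.memory.used: each memory pool contributes its used bytes, with the jvm.memory.type attribute ("heap" or "non_heap") derived from the pool's MemoryType. The MeterProvider wiring is omitted, and the class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

// Stdlib-only sketch of the data behind jvm.memory.used. The actual
// instruments register callbacks through the OTel MeterProvider; that
// wiring is omitted here.
final class MemoryUsedSketch {

    // Sum of used bytes across all pools of the given type (HEAP or NON_HEAP).
    static long totalUsed(MemoryType type) {
        long used = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type && pool.getUsage() != null) {
                used += pool.getUsage().getUsed();
            }
        }
        return used;
    }

    // Attribute value for the jvm.memory.type key, per the OTel JVM semconv.
    static String typeAttribute(MemoryType type) {
        return type == MemoryType.HEAP ? "heap" : "non_heap";
    }
}
```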

jvm.gc.duration is intentionally deferred. The spec requires a Histogram of per-collection pause durations, but GarbageCollectorMXBean only exposes cumulative collection time. Populating the histogram requires either subscribing to GarbageCollectionNotificationInfo via JMX (blocked by the bootstrap-class-loading constraints in docs/bootstrap_design_guidelines.md) or consuming JFR GarbageCollection events. Tracked as a follow-up.
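For context, this stdlib-only sketch shows what GarbageCollectorMXBean does expose: cumulative totals per collector, from which the per-collection pause durations the histogram needs cannot be recovered. The class name is illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// GarbageCollectorMXBean exposes only cumulative counts and total elapsed
// time per collector, so individual pause durations (what the jvm.gc.duration
// histogram requires) cannot be derived from it.
final class GcTotalsSketch {

    // Total elapsed collection time across all collectors, in milliseconds.
    static long totalCollectionTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if unsupported
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }
}
```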

Related PR enabling the corresponding system tests: DataDog/system-tests#6800

Motivation

Customers running with DD_METRICS_OTEL_EXPORTER=otlp route their telemetry to an OTel collector — there may not be a Datadog Agent on the path, and therefore nothing listening on the DogStatsD socket. Today the tracer's runtime metrics still emit through DogStatsD with proprietary names (jvm.heap_memory, …), so in those deployments runtime metrics silently go nowhere.

This change emits the same runtime metric data as OTLP instruments with OTel semantic-convention names through the OTel MeterProvider, so it travels the same OTLP pipeline the customer already configured. Customers who haven't opted into OTLP metrics see no change — the existing DogStatsD path is untouched.

Additional Notes

  • The DogStatsD runtime-metrics path is unmodified. The two paths can run independently; opting into OTLP doesn't disable DogStatsD.
  • start() is single-shot: an AtomicBoolean CAS guards against re-entry from re-init, and on failure we log and stop (partial registration is worse than a silent retry).
  • Uses only java.lang.management.* plus com.sun.management.OperatingSystemMXBean for CPU. CPU instruments are skipped at registration time on JVMs where the com.sun bean isn't present. No javax.management.* is touched, keeping the constraints in docs/bootstrap_design_guidelines.md intact.
  • JvmOtlpRuntimeMetrics is registered in META-INF/native-image/.../reflect-config.json (using its post-shadow-relocation FQN, datadog.trace.bootstrap.otel.shim.metrics.JvmOtlpRuntimeMetrics) so AOT/native-image builds can resolve it reflectively from Agent.java.
  • Tests: JvmOtlpRuntimeMetricsTest (JUnit 5) covers instrument surface, attribute keys (jvm.memory.type=heap|non_heap), positive values for live metrics (jvm.memory.used, jvm.thread.count), and idempotency of repeated start() calls.
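The single-shot guard from the notes above can be sketched like this (illustrative names and bodies, not the actual JvmOtlpRuntimeMetrics source):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch of the single-shot start() guard described above.
final class SingleShotStart {
    private static final AtomicBoolean STARTED = new AtomicBoolean(false);

    // First caller wins; later calls (e.g. from re-initialization) are no-ops.
    static boolean start() {
        if (!STARTED.compareAndSet(false, true)) {
            return false; // already started, or a previous attempt failed
        }
        // ... register the instruments here; on failure, log and return
        // without clearing STARTED, so a partial registration is never
        // silently retried.
        return true;
    }
}
```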

@link04 link04 added type: enhancement Enhancements and improvements comp: metrics Metrics inst: opentelemetry OpenTelemetry instrumentation tag: ai generated Largely based on code generated by an AI or LLM labels May 8, 2026
@link04 link04 marked this pull request as ready for review May 9, 2026 11:02
@link04 link04 requested review from a team as code owners May 9, 2026 11:02
@link04 link04 requested review from mcculls and removed request for a team May 9, 2026 11:02

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 953c8710a6

Comment on lines +68 to +72
meter
.upDownCounterBuilder("jvm.memory.used")
.setDescription("Measure of memory used.")
.setUnit("By")
.buildWithCallback(

P1: Fix runtime metrics accumulating on every export

When OTLP runtime metrics are enabled (DD_RUNTIME_METRICS_ENABLED=true, DD_METRICS_OTEL_ENABLED=true, DD_METRICS_OTEL_EXPORTER=otlp), these MXBean callbacks record point-in-time values through observable sum instruments. In the current shim, OtelMetricStorage.shouldResetOnCollect leaves OBSERVABLE_UP_DOWN_COUNTER non-resetting and OtelLongSum adds each observation, so every OTLP collection adds the current heap/thread/etc. value to the previous export instead of replacing it. The same pattern affects observable counters such as class counts and CPU time under the existing temporality handling. As a result, exported JVM metrics inflate monotonically after the first flush: either the callbacks need value/last-observation semantics, or the observable storage needs to handle async sum instruments correctly.
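A minimal illustration of the semantics the review asks for (a hypothetical class, not the shim's actual storage): the point backing an observable instrument should replace its value on each callback rather than accumulate it.

```java
// Hypothetical illustration of the required semantics: storage backing an
// observable (async) instrument must treat each callback observation as the
// current value, replacing the previous one, not adding to it.
final class AsyncSumPoint {
    private long value;

    void record(long observation) {
        value = observation; // replace, not "value += observation"
    }

    long collect() {
        return value;
    }
}
```

With accumulating semantics the second collection below would report 220 instead of the actual 120 currently in use.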
