feat(data-pipeline)!: add stdout log trace exporter#2074
Conversation
📚 Documentation Check Results📦
|
Clippy Allow Annotation ReportComparing clippy allow annotations between branches:
Summary by Rule
Annotation Counts by File
Annotation Stats by Crate
About This ReportThis report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality. |
🔒 Cargo Deny Results📦
|
🎉 All green!🧪 All tests passed 🎯 Code Coverage (details) 🔗 Commit SHA: c71fb77 | Docs | Datadog PR Page | Give us feedback! |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2074 +/- ##
==========================================
+ Coverage 73.44% 73.68% +0.24%
==========================================
Files 465 478 +13
Lines 77949 79613 +1664
==========================================
+ Hits 57248 58664 +1416
- Misses 20701 20949 +248
🚀 New features to boost your workflow:
|
ca7c877 to
a090331
Compare
Artifact Size Benchmark Reportaarch64-alpine-linux-musl
aarch64-unknown-linux-gnu
libdatadog-x64-windows
libdatadog-x86-windows
x86_64-alpine-linux-musl
x86_64-unknown-linux-gnu
|
a090331 to
cb8bad8
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cb8bad815b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const AGENT_ENDPOINT_ENV_VARS: [&str; 3] = [ | ||
| "DD_AGENT_HOST", | ||
| "DD_TRACE_AGENT_URL", | ||
| "DATADOG_TRACE_AGENT_HOSTNAME", | ||
| ]; |
There was a problem hiding this comment.
Include DD_TRACE_AGENT_PORT in endpoint detection
When this helper is used in Lambda, a deployment that sets only DD_TRACE_AGENT_PORT to point at a non-default local/extension agent is still classified as having no explicit agent endpoint, so recommended_log_output() returns true and callers may switch to stdout instead of the configured agent. Other libdatadog config paths treat DD_TRACE_AGENT_PORT as part of trace-agent endpoint resolution (for example telemetry/crashtracker read it alongside DD_AGENT_HOST and DD_TRACE_AGENT_URL), so this list should include it to avoid misrouting traces in that configuration.
Useful? React with 👍 / 👎.
| fn has_explicit_agent_endpoint() -> bool { | ||
| AGENT_ENDPOINT_ENV_VARS | ||
| .iter() | ||
| .any(|var| matches!(env::var(var), Ok(value) if !value.is_empty())) |
| /// which serializes spans directly to stdout. | ||
| #[cfg(not(target_arch = "wasm32"))] | ||
| pub fn send_trace_chunks<T: TraceData>( | ||
| pub fn send_trace_chunks<T: TraceData + serde::Serialize>( |
There was a problem hiding this comment.
@iunanua (asking you since you've been working on the semver checks recently) - I'm curious why semver checks aren't failing here? Shouldn't this be considered a breaking change?
There was a problem hiding this comment.
It is clearly a breaking change, if the parent modules of this one are public
|
|
||
| /// Create a writer targeting a custom sink (used in tests). | ||
| #[cfg(test)] | ||
| pub(crate) fn with_sink(out: Box<dyn Write + Send>, max_line_size: usize) -> Self { |
There was a problem hiding this comment.
This is currently gated to #[cfg(test)]. Isn't this going to be needed as part of the public API in order for this to work with wasm? I don't think libdatadog can write to stdout when wasm is the target?
And speaking of wasm, should we be leveraging the capabilities crate for the log writer?
| // until the page/module is torn down). | ||
| #[cfg(target_arch = "wasm32")] | ||
| let _ = info_fetcher_handle; | ||
| let _ = &info_fetcher_handle; |
There was a problem hiding this comment.
I feel this is useless, because instead of dropping the info_fetcher_handle, we drop a reference to it, which doesn't do much. Was this change intentional?
| /// flush. On a multi-threaded async runtime this can stall the reactor, so log | ||
| /// mode is intended for serverless / current-thread runtimes (e.g. AWS Lambda). | ||
| pub(crate) struct LogTraceWriter { | ||
| out: Mutex<Box<dyn Write + Send>>, |
There was a problem hiding this comment.
I think stdout() is already using a mutex internally. Since it's Send + Sync and &Stdout: Write, I believe you can just use Stdout here without an outer mutex, if we always send to Stdout.
|
|
||
| /// Create a writer targeting a custom sink (used in tests). | ||
| #[cfg(test)] | ||
| pub(crate) fn with_sink(out: Box<dyn Write + Send>, max_line_size: usize) -> Self { |
There was a problem hiding this comment.
Ah, you need a more generic form for tests... we could define enum Sink { Stdout(Stdout), Other(Mutex<Box<dyn Write + Send>>). Not sure it's worth it, if the mutex is never contended and just here to please the compiler, though.
| /// which serializes spans directly to stdout. | ||
| #[cfg(not(target_arch = "wasm32"))] | ||
| pub fn send_trace_chunks<T: TraceData>( | ||
| pub fn send_trace_chunks<T: TraceData + serde::Serialize>( |
There was a problem hiding this comment.
It is clearly a breaking change, if the parent modules of this one are public
| "Wrote traces to log exporter" | ||
| ); | ||
| return Ok(AgentResponse::Unchanged); | ||
| } |
There was a problem hiding this comment.
nitpick: since this is a totally different codepath that returns early, might be clearer to move that logic into its own synchronous helper (rustc is pretty good at inlining local functions, so this should be free).
There was a problem hiding this comment.
nitpick: since this is a totally different codepath that returns early, might be clearer to move that logic into its own synchronous helper (rustc is pretty good at inlining local functions, so this should be free).
Up this nitpick (although not blocking, I'll let you decide);
|
|
||
| /// Shared in-memory sink for asserting bytes written in log-export mode. | ||
| #[derive(Clone, Default)] | ||
| struct SharedSink(std::sync::Arc<std::sync::Mutex<Vec<u8>>>); |
There was a problem hiding this comment.
I think Arc is imported already? At least in the parent module?
| struct SharedSink(std::sync::Arc<std::sync::Mutex<Vec<u8>>>); | |
| struct SharedSink(Arc<std::sync::Mutex<Vec<u8>>>); |
| /// deliberately not exposed over the FFI: host SDKs typically already have their | ||
| /// own serverless/agent detection and may implement their own recommendation. | ||
| pub fn recommended_log_output() -> bool { | ||
| recommended_log_output_inner(Path::new(DATADOG_EXTENSION_PATH)) |
There was a problem hiding this comment.
Nitpick: should probably inline this function here. It's not like the outer function adds much logic/wrapping.
| if has_type { | ||
| len += 1; | ||
| } | ||
| if !span.meta.is_empty() { | ||
| len += 1; | ||
| } | ||
| if !span.metrics.is_empty() { | ||
| len += 1; | ||
| } | ||
| // `meta_struct` is intentionally NOT emitted: it holds raw msgpack bytes | ||
| // that would serialize as a JSON number array the intake cannot interpret. | ||
| // The reference log exporters (JS/Go/Py/Java) omit it as well. | ||
| if !span.span_links.is_empty() { | ||
| len += 1; | ||
| } | ||
| if !span.span_events.is_empty() { | ||
| len += 1; | ||
| } |
There was a problem hiding this comment.
| if has_type { | |
| len += 1; | |
| } | |
| if !span.meta.is_empty() { | |
| len += 1; | |
| } | |
| if !span.metrics.is_empty() { | |
| len += 1; | |
| } | |
| // `meta_struct` is intentionally NOT emitted: it holds raw msgpack bytes | |
| // that would serialize as a JSON number array the intake cannot interpret. | |
| // The reference log exporters (JS/Go/Py/Java) omit it as well. | |
| if !span.span_links.is_empty() { | |
| len += 1; | |
| } | |
| if !span.span_events.is_empty() { | |
| len += 1; | |
| } | |
| len += has_type as usize; | |
| len += (!span.meta.is_empty()) as usize; | |
| len += (!span.metrics.is_empty()) as usize; | |
| // `meta_struct` is intentionally NOT emitted: it holds raw msgpack bytes | |
| // that would serialize as a JSON number array the intake cannot interpret. | |
| // The reference log exporters (JS/Go/Py/Java) omit it as well. | |
| len += (!span.span_links.is_empty()) as usize; | |
| len += (!span.span_events.is_empty()) as usize; |
cb8bad8 to
bc7d64d
Compare
b7ac109 to
7a39e74
Compare
| /// synchronous/blocking, so this mode targets single-threaded serverless | ||
| /// runtimes where blocking stdout writes do not stall a shared async reactor. | ||
| /// | ||
| /// Takes precedence over an OTLP endpoint: if both this and `set_otlp_endpoint` |
There was a problem hiding this comment.
I think this is non-blocking and can be addressed in a separate PR...
I'm not a fan of the exporter containing precedence logic for configuration. I left a similar comment on @paullegranddc's agentless PR. I believe config logic should live in the SDKs as much as possible, and if the builder receives conflicting config settings, we should return an error. If we want to support precedence to avoid setup friction for customers that should be handled by the SDKs. I don't think there is any harm in merging it as is for now. Let's just capture it in a Jira ticket.
| // The handle is currently only tracked for shutdown on native; on wasm | ||
| // it is dropped here (the worker keeps running on the JS event loop | ||
| // until the page/module is torn down). | ||
| // In log-export mode there is no agent to poll; skip spawning the worker |
There was a problem hiding this comment.
I left a comment about agent info and telemetry workers being unnecessarily spawned in @paullegranddc's agentless PR. Beyond the scope of this PR, but after these two PRs land we should discuss refactoring the builder a little bit to handle this a bit more elegantly.
| assert!(out.is_empty()); | ||
| } | ||
|
|
||
| /// Mirrors the Datadog Forwarder `is_trace` detection contract (spec §6.1): |
There was a problem hiding this comment.
minor: Since these are technically public docs, if we are going to refer to specs in them by section, the spec should be public too. There is an rfc folder in the libdatadog workspace where you can place the spec as markdown and link to. Alternatively, you could just not mention the spec in the rustdoc comments and not worry about it.
There was a problem hiding this comment.
Oh, hold on, this is a test. There shouldn't be rustdoc comments at all. I'd suggest just changing the /// to //. I don't have any issue with the spec language in the comment if it's just a code comment and not a rustdoc comment.
ekump
left a comment
There was a problem hiding this comment.
A couple of minor comments, mostly calling out future work for the team. LGTM.
yannham
left a comment
There was a problem hiding this comment.
I like the log capability, even is slightly more verbose, since we get mocking cleanly. Beside one previous nitpick that still applies, LGTM;
| // event loop until the page/module is torn down). | ||
| #[cfg(target_arch = "wasm32")] | ||
| let _ = info_fetcher_handle; | ||
| drop(info_fetcher_handle); |
There was a problem hiding this comment.
Claude can't refrain from rewriting unrelated stuff with bad taste, can't he 😛 (I'm mostly joking, I don't care what syntax people use to drop value, although I do personally prefer the former)
| "Wrote traces to log exporter" | ||
| ); | ||
| return Ok(AgentResponse::Unchanged); | ||
| } |
There was a problem hiding this comment.
nitpick: since this is a totally different codepath that returns early, might be clearer to move that logic into its own synchronous helper (rustc is pretty good at inlining local functions, so this should be free).
Up this nitpick (although not blocking, I'll let you decide);
Add a "log exporter" transport that writes traces as newline-delimited
JSON to stdout in the format consumed by the Datadog Forwarder, instead
of sending them to an agent over HTTP. This is the serverless fallback
used in AWS Lambda when no Datadog agent or Lambda extension is
reachable, matching the log writers in dd-trace-js/py/go/java.
- json_log_encoder (libdd-trace-utils): encodes spans to
{"traces":[[...]]} lines with lowercase hex ids (incl. 128-bit),
integer error, ns start/duration, empty-field omission, and greedy
256 KiB line batching that drops oversize single spans.
- TraceExporterBuilder::set_output_to_log selects a stdout destination
that bypasses agent-info polling, client-side stats, V1 negotiation,
and telemetry. Exposed over FFI as
ddog_trace_exporter_config_set_output_to_log.
- The write goes through a new LogWriterCapability so the transport also
works on wasm (where the host, e.g. JavaScript, supplies the sink);
the native capability writes to stdout.
Writes are synchronous and intended for single-threaded serverless
runtimes. meta_struct is omitted (raw msgpack the log intake can't read).
BREAKING CHANGE: TraceExporter<C> (and TraceExporterBuilder::build /
build_async and trace_buffer::DefaultExport<C>) now require
C: LogWriterCapability. Callers using a custom capability bundle must
implement libdd_capabilities::LogWriterCapability; NativeCapabilities
already implements it, so callers using NativeCapabilities are unaffected.
7a39e74 to
c71fb77
Compare
What does this PR do?
Adds a stdout "log exporter" transport to libdatadog's trace pipeline. When enabled,
TraceExporterwrites traces as newline-delimited JSON ({"traces":[[...]]}) to stdout in the format consumed by the Datadog Forwarder, instead of sending them to an agent over HTTP.New pieces:
libdd-trace-utils::json_log_encoder— Forwarder JSON encoder: lowercase hex IDs (incl. 128-bit), integererror, nsstart/duration, empty-field omission, greedy 256 KiB line batching, oversize-span drop.TraceExporterBuilder::set_output_to_log(max_line_size)— selects a stdout destination that bypasses agent-info polling, client-side stats, V1 negotiation, and telemetry. Exposed over FFI asddog_trace_exporter_config_set_output_to_log.libdd_capabilities::LogWriterCapability— the write goes through a capability so the transport also works on wasm (where the host, e.g. JavaScript, supplies the sink). The native capability writes to stdout.Motivation
Serverless runtimes (primarily AWS Lambda) often have no reachable Datadog agent or Lambda extension. The established fallback is to write traces to stdout, where the Datadog Forwarder tails the logs and submits them to the trace intake. Today each tracer (dd-trace-js/py/go/java) reimplements this independently; this lands a single reusable implementation in libdatadog that native-backed tracers can adopt.
Additional Notes
TraceExporter<C>(andTraceExporterBuilder::build/build_async,trace_buffer::DefaultExport<C>) now requireC: LogWriterCapability. Callers using a custom capability bundle must implementlibdd_capabilities::LogWriterCapability;NativeCapabilitiesalready implements it, so callers usingNativeCapabilitiesare unaffected. PR title carries the!marker accordingly.recommended_log_output()was removed per review; reading env from libdatadog has caused production crashes). The SDK detects Lambda/agent/extension itself and callsset_output_to_log.meta_structis intentionally omitted (raw msgpack the log intake can't interpret), matching the reference exporters.serde_jsonalready in tree) → noCargo.lock/LICENSE-3rdparty.csvchurn.How to test the change?
Key tests:
json_log_encoder— golden bytes, Forwarderis_tracecontract, hex (incl. 128-bit), size-cap batching, oversize-drop, non-finite-metric → null, multi-trace flatten, span_links/span_events present-case.trace_exporter::tests::test_log_mode_send_writes_forwarder_json— fullsend(msgpack)→ decode → log branch → capability bytes (via a capturing test capability).trace_exporter::tests::test_log_mode_makes_no_agent_requests— zero agent calls +info_fetchernot spawned in log mode.log_writerhelper tests; FFIconfig_output_to_log_test.