Skip to content

feat: Implement TracingChannel Proposal#3195

Open
logaretm wants to merge 7 commits intoredis:masterfrom
logaretm:feat/tracing-channel
Open

feat: Implement TracingChannel Proposal#3195
logaretm wants to merge 7 commits intoredis:masterfrom
logaretm:feat/tracing-channel

Conversation

@logaretm
Copy link

@logaretm logaretm commented Mar 11, 2026

This PR implements tracing channels for command and connect operations. This allows APM libraries (OTEL, Datadog, Sentry) can instrument redis commands without monkey-patching.

It also enables users can setup their own custom instrumentations and analytics.

Tracing Channels

All channels in this PR use the Node.js TracingChannel API, which provides start, end, asyncStart, asyncEnd, and error sub-channels automatically.

TracingChannel Tracks Context fields
node-redis:command Individual Redis command execution from queue to reply (standalone, MULTI, and pipeline) command, args, database, serverAddress, serverPort, and optionally batchMode, batchSize for MULTI/pipeline commands
node-redis:connect Client connection lifecycle (initial connect only, not reconnections) serverAddress, serverPort

Feel free to suggest different names but most conventions follow the lib name, or they append a namespace before it as well like redis:node-redis:connect, so LMK which one feels suitable.

Context Properties

I have closely followed what OTEL currently extracts as attributes in their instrumentations and made it available in the event message payload.

The payload sent in the trace events includes the following:

Field Source OTEL attribute it enables
command args[0] (uppercased) db.operation.name
args full RedisArgument[] db.query.text (APM serializes/sanitizes)
database this.#selectedDB db.namespace
serverAddress socket options host server.address
serverPort socket options port server.port

For batch commands we include these properties:

Field Source OTEL attribute it enables
batchMode 'MULTI' or 'PIPELINE' db.operation.name prefix
batchSize commands.length db.operation.batch.size

Backward Compatibility

Zero-cost when no subscribers are registered. Silently skipped on Node.js versions where TracingChannel is unavailable (e.g. Node 18). This is possible with process.getBuiltinModule.

const dc = ('getBuiltinModule' in process)
  ? process.getBuiltinModule('node:diagnostics_channel')
  : require('node:diagnostics_channel');

closes #2590


Note

Medium Risk
Touches core client command/connection paths by wrapping connect, sendCommand, MULTI, and pipeline execution in tracing helpers; while intended to be no-op without subscribers, it could still affect performance or edge-case behavior across Node versions.

Overview
Adds Node.js diagnostics_channel TracingChannel instrumentation for Redis client connect and command execution.

connect() and all command enqueue paths (standalone sendCommand, MULTI, and pipeline execution) are wrapped to emit node-redis:connect and node-redis:command tracing events with context (command, args, database, serverAddress, serverPort, plus batchMode/batchSize for batches), and the trace context types are exported from the package entrypoint.

Introduces lib/client/tracing.ts (feature-gated when TracingChannel is unavailable and optimized to skip work when there are no subscribers) and adds coverage in tracing.spec.ts for success, error, and batch scenarios.

Written by Cursor Bugbot for commit ce03e84. This will update automatically on new commits. Configure here.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 11, 2026 20:08
@jit-ci
Copy link

jit-ci bot commented Mar 11, 2026

Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset.

In case there are security findings, they will be communicated to you as a comment inside the PR.

Hope you’ll enjoy using Jit.

Questions? Comments? Want to learn more? Get in touch with us.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements Node.js diagnostics_channel TracingChannel hooks to expose trace events for Redis command execution and client connect operations, enabling APM instrumentation without monkey-patching.

Changes:

  • Added traceCommand/traceConnect helpers and trace context types for node-redis:command and node-redis:connect.
  • Wrapped connect(), sendCommand(), pipeline, and MULTI execution paths with tracing.
  • Added specs validating emitted tracing events and batch metadata (MULTI / pipeline).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
packages/client/lib/client/tracing.ts Introduces tracing channel setup and helper wrappers for promise-based spans.
packages/client/lib/client/index.ts Instruments connect + command execution paths, and adds trace context construction.
packages/client/lib/client/tracing.spec.ts Adds tests that subscribe to tracing sub-channels and assert emitted contexts.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +177 to +182
const client = RedisClient.create({
socket: {
port: 6379,
host: '127.0.0.1'
}
});
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test also hard-codes 127.0.0.1:6379. In this repo, Redis connection details are typically provided by testUtils/GLOBAL.SERVERS.*; using a fixed port/host is likely to break when tests run against the dockerized server. Consider structuring this as a testUtils.testWithClient with disableClientSetup: true, so you can create the client with the test server’s socket options and still assert connect() tracing events.

Copilot uses AI. Check for mistakes.
Comment on lines +963 to +969
const socketOptions = this._self.#options.socket;
await traceConnect(
() => this._self.#socket.connect(),
{
serverAddress: (socketOptions as any)?.host ?? 'localhost',
serverPort: (socketOptions as any)?.port ?? 6379
}
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

serverAddress/serverPort are derived from socketOptions.host/socketOptions.port with any casts and defaults. RedisSocketOptions also supports IPC sockets (e.g. path) where host/port are not meaningful; emitting localhost:6379 in those cases would be incorrect. Consider narrowing the socket options type (e.g. handle 'path' in socketOptions separately) and making the emitted port optional/nullable when connecting via IPC.

Copilot uses AI. Check for mistakes.
Comment on lines +1115 to +1117
return {
command: String(args[0]).toUpperCase(),
args,
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#commandTraceContext assigns args by reference. Some call paths reuse and mutate the same args array (e.g. script fallback that rewrites args[0] from EVALSHA to EVAL), which can make the tracing payload reflect the mutated args instead of what was actually sent for the traced command. To keep trace metadata accurate, consider snapshotting the args (shallow copy) for the tracing context or ensuring callers don’t mutate arrays after passing them into sendCommand.

Suggested change
return {
command: String(args[0]).toUpperCase(),
args,
const argsSnapshot = Array.from(args);
return {
command: String(argsSnapshot[0]).toUpperCase(),
args: argsSnapshot,

Copilot uses AI. Check for mistakes.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might not be the case here, but it's actually intentional that diagnostics_channel be able to modify the given data in some cases. The main use case is adding distributed tracing headers to requests, but may be desired for other things too. Unless you have a compelling reason to do so, you can generally pass mutable data directly into it.

Copy link
Author

@logaretm logaretm Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't have a strong opinion here, most of the tracing channels instrumentations I worked on didn't require mutable contexts but could be good to have.

EDIT: Misunderstood the review 🤦‍♂️

this._self.#scheduleWrite();
return promise;
},
this._self.#commandTraceContext(args)
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this._self.#commandTraceContext(args) is evaluated eagerly, so the context object (and the String(args[0]).toUpperCase() work) is created for every command even when there are no tracing subscribers. This undercuts the “zero-cost when no subscribers are registered” goal. Consider making traceCommand accept a lazy context factory, and only building the context when tracing is actually enabled (e.g. when the tracing channel’s sub-channels have subscribers).

Suggested change
this._self.#commandTraceContext(args)
() => this._self.#commandTraceContext(args)

Copilot uses AI. Check for mistakes.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you may want to pass a context builder in so you can use hasSubscribers to only build the context when it would be published.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, addressed it. Now the trace helpers accept a factory for context building.

@tlhunter
Copy link
Contributor

FYI @Qard

logaretm and others added 3 commits March 11, 2026 17:40
Context objects are now only built when there are active subscribers,
ensuring true zero-cost when no APM is listening.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
IPC sockets emit the path as serverAddress with undefined serverPort,
instead of misleadingly reporting localhost:6379.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hardcoded 127.0.0.1:6379 with testUtils.testWithClient and
disableClientSetup so tests work in CI with Docker-assigned ports.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@Qard Qard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Copilot already caught most of what I would have had to say about it. Might want to modify the traceConnect and traceCommand functions to take builder functions to the context objects and then use if (channel.hasSubscribers) { ... } to skip the context build and publish when there are no subscribers.

Comment on lines +1321 to +1325
{
...this._self.#commandTraceContext(args),
batchMode: 'PIPELINE',
batchSize
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want a context builder function for this too.

Comment on lines +1376 to +1380
{
...this._self.#commandTraceContext(args),
batchMode: 'MULTI',
batchSize
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context builder here also.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I addressed that in latest commits.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, looks like you were addressing it while I was reviewing the prior code at the same time. 😅

I've reviewed again. All looks good to me now. 🙂

@logaretm
Copy link
Author

@tlhunter @Qard Thank you folks for taking a look, do you think this should be ported over to ioredis as well?

@Qard
Copy link

Qard commented Mar 11, 2026

Feel free to port it there too. It's valuable to have diagnostics_channel built in to all the libraries. Thanks for working on this! ❤️

Exports CommandTraceContext, BatchCommandTraceContext, and
ConnectTraceContext from @redis/client so APM libraries can
type their channel subscribers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

assert.equal(errors[0].serverPort, 1);
} finally {
dc.unsubscribe('tracing:node-redis:connect:error', onError);
await badClient.destroy();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test destroys already-closed client in finally block

Low Severity

In the connection error test, await badClient.destroy() in the finally block will throw ClientClosedError because the client is already closed after connect() fails with reconnectStrategy: false. The socket
's #isOpen is set to false by #shouldReconnect when the strategy returns false, so calling destroy() on an already-closed client throws. The other connect test correctly guards with if (client.isOpen) before destroying, but this test does not.

Fix in Cursor Fix in Web

? (process as any).getBuiltinModule('node:diagnostics_channel')
: require('node:diagnostics_channel');

const hasTracingChannel = typeof dc.tracingChannel === 'function';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use the dc-polyfill package from npm. We built it to backport modern diagnostics channel functionality to older versions of Node.js. That may alleviate the checks you have for the presence of tracing channel. But of course that adds a new dependency that node-redis might not want.

https://www.npmjs.com/package/dc-polyfill

Copy link
Author

@logaretm logaretm Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tlhunter zero dependency is what makes this attractive for 3rd party authors I think.

Not sure if it works but we could potentially override getBuiltinModule and intercept it, kinda like IITM except it wouldn't need module hooks and can live on the APM's SDK side:

something like this:

process.getBuiltinModule = function(id) {
  // Check for both the prefixed and unprefixed module ID
  if (id === 'node:diagnostics_channel' || id === 'diagnostics_channel') {
    return dcPolyfill;
  }

  // Fall back to original behavior for all other modules
    return originalGetBuiltin.call(process, id);
}

Would that work? it kinda imposes them to use getBuiltInModule to load in tracing channels but I found that to be common across several libraries like this one due to compatibility concerns with both ESM and older node releases, pg is one example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you currently have is better

Gracefully handle environments where diagnostics_channel is
unavailable instead of throwing at module load.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
On older Node.js versions that support TracingChannel but lack the
hasSubscribers property, the previous truthiness check would silently
skip tracing. Now we check explicitly for `false` so that tracing
is performed unconditionally when hasSubscribers is unavailable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@PavelPashov
Copy link
Contributor

Hi @logaretm & all, thank you for the PR.

Have you had a chance to look at the OTel metrics PR? It should be merged into master very soon. We are currently considering adding native OpenTelemetry tracing support as well, ideally building on top of the same OTel integration.

From a maintenance perspective, maintaining separate implementations for metrics and tracing would increase the complexity of the driver. For that reason, our current preference would be to extend the OTEL integration so that tracing builds on top of the upcoming metrics work, rather than introducing a separate diagnostics_channel–based approach.

If native OTel tracing support were available, would that address your use case? As far as we know, most APM platforms already support OTLP.

@Qard
Copy link

Qard commented Mar 23, 2026

Hi there, creator of diagnostics_channel (and subsequently TracingChannel) here. 👋🏻

It was actually created specifically for this purpose. Many libraries, and Node.js core itself, did not want to pull in a large dependency on OpenTelemetry or to be tied to its API model. I created diagnostics_channel specifically to enable decoupling the sharing of Observability data from specific consumer APIs so libraries could easily add tracing without additional dependencies.

While much of the market has adopted OpenTelemetry just on hype and the resulting market pressure produced by that hype, diagnostics_channel has also been adopted by most Node.js vendors, and without any hype pushing that along artificially. It's built to be extremely low-cost, being zero cost unless there are actually active subscribers, while the OpenTelemetry API often has actually quite a bit of overhead. 🙈

A lot of performance-focused libraries like fastify and undici have gone for diagnostics_channel-based instrumentations instead and building OpenTelemetry plugins on top of that as it performs a lot better and is a lot less invasive of a change.

@logaretm
Copy link
Author

logaretm commented Mar 23, 2026

Hey @PavelPashov, thanks for the response. I totally understand the concern about maintaining two separate systems. Building on what @Qard said above, I wanted to show how TracingChannel doesn't compete with OTel, it actually reduces the OTel maintenance burden on your team.

The command duration measurement in #3110 already follows a start/end pattern that TracingChannel handles natively:

// Current approach in #3110 — manual start/end wired into every callsite:
const recordOperation = OTelMetrics.instance.commandMetrics
    .createRecordOperationDuration(args, clientId);
// ...execute...
recordOperation(err);

// TracingChannel — single instrumentation point:
tracePromise(commandChannel, { command, args, serverAddress, ... }, promise);
// start/asyncEnd/error lifecycle managed by Node.js

If node-redis emits lifecycle events via TracingChannel:

  1. OTel metrics can subscribe to those events instead of being wired into every callsite, fewer touchpoints in core, less coupling.
  2. OTel tracing (spans) comes almost for free, just subscribe to the same channels and create spans. No additional surgery to core code.
  3. Non-OTel consumers get access too. Sentry recently shipped a light mode that drops OTel entirely and instruments via diagnostics_channel directly. These vendors currently have to monkey-patch node-redis, TracingChannel gives them a clean subscription target without any work from your team.

This is the same architecture node:http uses today, OTel's own @opentelemetry/instrumentation-http subscribes to diagnostics_channel events rather than monkey-patching. As @Qard mentioned, fastify and undici have taken the same approach.

Worth noting: OTel JS SDK v3 is planning to raise the minimum Node.js version to 22. Users who want to actually use node-redis's OTel metrics or tracing need SDK packages (sdk-metrics, sdk-trace-base), not just the API, so they'd inherit that Node version constraint. This is already forcing us at Sentry to drop Node 18 and 20 support in our next major, against our usual support philosophy. TracingChannel, as a Node.js built-in, has no such constraint.

I'm happy to rework this PR to sit underneath the metrics layer and power it, and show how it can be used to set up the tracing integration for the future. What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Proposal: Tracing Channel / Diagnostics Channel support

5 participants