Add FFE evaluation metrics by leoromanovsky · Pull Request #3911 · DataDog/dd-trace-php

leoromanovsky · 2026-05-22T20:28:30Z

Motivation

PHP FFE evaluation metrics need to follow the native runtime architecture from the shared design doc: https://docs.google.com/document/d/1NvMfTpZWLBlFmEFNjdnlMyeVpy5l7KD8qujGFco6w2w/edit?tab=t.0

#3906 gives PHP a real Remote Config-backed evaluator. This PR adds metric delivery without PHP OTLP writer, encoder, or transport code. PHP records one typed final evaluation result and native/libdatadog sidecar code owns buffering, aggregation, encoding, and OTLP submission.

Changes

This PR is stacked on #3906 and now uses current libdatadog main (a84923e5ec), which includes the merged metric runtime from DataDog/libdatadog#2052. Native DDTrace\ffe_evaluate records evaluation metrics by default; the PHP 8 OpenFeature path disables automatic native recording for the raw evaluation call and then records the final OpenFeature-aware outcome once through the typed EvaluationMetricRecorder adapter.

The result FFI shape was simplified per review feedback: Rust exposes nullable raw zend_string* pointers for FfeResult strings. C can move those strings directly into PHP result properties and metric recording without the extra result accessor/cleanup machinery.

Codex review follow-up: when no explicit OTEL_EXPORTER_OTLP_METRICS_ENDPOINT or OTEL_EXPORTER_OTLP_ENDPOINT is configured, the native FFE metrics resolver now derives the OTLP HTTP metrics endpoint from DD_TRACE_AGENT_URL / DD_AGENT_HOST, matching the PHP OpenTelemetry resolver before defaulting to localhost.

Decisions

Metric delivery remains separate from exposure logging. This PR intentionally expects OTLP evaluation metrics and no EVP exposure stream; #3910 owns exposure delivery and DataDog/libdatadog#2026 owns the exposure-only sidecar work.

The PHP layer is a typed adapter only. It does not aggregate metrics, encode OTLP, or submit HTTP payloads.

Validation

Focused PHP tests on this refreshed branch after merging latest master and bumping libdatadog to a84923e5ec:

cd /Users/leo.romanovsky/go/src/github.com/DataDog/dd-trace-php-ffe-metrics-restack
MAX_TEST_PARALLELISM=1 make test_c TESTS=tests/ext/ffe/evaluation_metrics_native.phpt
MAX_TEST_PARALLELISM=1 make test_c TESTS=tests/ext/ffe/native_bridge_evaluate.phpt

Result: both tests passed locally on PHP 8.5.6. The first command rebuilt the extension against the bumped libdatadog and Rust 1.87 toolchain.

Earlier ffe-dogfooding validation for this PR shape showed PHP OTLP metrics working end to end with no EVP exposure stream expected for this metrics PR:

OTLP metrics OK: php7=5 php8-openfeature=5 flag=ffe-dogfooding-string-flag
EVP exposures not observed: php7=0 php8-openfeature=0 flag=ffe-dogfooding-string-flag subject=php-metric-validator-1780007640

Local system-tests validation through DataDog/system-tests#7033:

./tooling/bin/build-debug-artifact gnu-aarch64-8.2-nts \
  /Users/leo.romanovsky/go/src/github.com/DataDog/system-tests-pr7033/binaries

DOCKER_CONFIG=/tmp/system-tests-docker-config-nocredsstore \
SYSTEM_TEST_BUILD_TIMEOUT=1800 \
./build.sh --library php --weblog-variant php-fpm-8.2

DOCKER_CONFIG=/tmp/system-tests-docker-config-nocredsstore \
TEST_LIBRARY=php \
WEBLOG_VARIANT=php-fpm-8.2 \
./run.sh +l php FEATURE_FLAGGING_AND_EXPERIMENTATION tests/ffe/test_flag_eval_metrics.py

Result: 17 passed in 81.28s.

Related PRs: #3906, DataDog/libdatadog#2052, DataDog/system-tests#7033.

datadog-official · 2026-05-22T20:36:48Z

Tests

✨ Fix all issues with BitsAI

⚠️ Warnings

🚦 8 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-php | min install tests

🔧 Fix in code (Fix with Cursor).
3 tests failed: get_loaded_remote_configs, get_agent_info, get_agent_sampling_config return incorrect content in shm_data_internal_fns.phpt.

DataDog/apm-reliability/dd-trace-php | test_extension_ci: [7.4]

🔧 Fix in code (Fix with Cursor).
Panic during test execution in crashtracker_collect_all_threads.phpt: Job terminated with exit code 2.

DataDog/apm-reliability/dd-trace-php | test_extension_ci: [8.0]

🔧 Fix in code (Fix with Cursor).
Test failure in crashtracker_collect_all_threads.phpt: Crashtracker collects all threads when collect_all_threads is enabled.

View all 8 failed jobs.

ℹ️ Info

No other issues found (see more)

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
• Patch Coverage: 100.00%
• Overall Coverage: 54.12% (-0.04%)

Useful? React with 👍 / 👎

_{This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 5c5729c | Docs | Datadog PR Page | Give us feedback!}

pr-commenter · 2026-05-22T21:47:53Z

Benchmarks [ tracer ]

Benchmark execution time: 2026-06-02 07:12:38

Comparing candidate commit 5c5729c in PR branch leo.romanovsky/m3-ffe-evaluation-metrics with baseline commit cd4e9ba in branch master.

Found 0 performance improvements and 1 performance regressions! Performance is the same for 193 metrics, 0 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

🟩 = significantly better candidate vs. baseline
🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

scenario:MessagePackSerializationBench/benchMessagePackSerialization-opcache

🟥 execution_time [+2.807µs; +5.673µs] or [+2.665%; +5.386%]

The canonical fixture PHPT explicitly enumerates the FFE classes it requires before instantiating the Datadog FeatureFlags client. The shared evaluation-completed envelope/hook added on this branch made Client::createWithDependencies() reference NoopEvaluationCompletedHook, EvaluationCompletedHook, and EvaluationCompleted, but the test helper had not been updated, so the packaged/extension PHPT failed with "Class DDTrace\FeatureFlags\Internal\NoopEvaluationCompletedHook not found" before any fixture case ran. Add the three new files to require_feature_flag_api so the PHPT matches the runtime class graph used by Client::evaluate().

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f021234c39

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Adds a parallel pathway for PHP feature-flag evaluation metrics mirroring the FfeExposures forwarder. dd-trace-php encodes `feature_flag.evaluations` counters as OTLP/protobuf in PHP (via its existing PHP 7-safe `OtlpMetricEncoder`) and ships the encoded bytes to the sidecar, which POSTs them to the user-configured OTLP HTTP metrics intake. Why a sibling action instead of reusing FfeExposures: - The OTLP collector is not the Datadog Agent. It's user-configurable via OTEL_EXPORTER_OTLP_METRICS_ENDPOINT (default http://localhost:4318/v1/metrics), so the endpoint travels with the payload rather than being derived from the sidecar session's agent base URL. - Content type differs (application/x-protobuf vs application/json). - No EVP subdomain header. - The payload is binary protobuf, not a JSON string. dd-trace-php side (PR DataDog/dd-trace-php#3911) will refactor its existing `OtlpHttpMetricTransport` (which currently does PHP-side HTTP I/O, violating the architectural rule "no I/O outside the sidecar") to call this new FFI. Validation: - `cargo test -p datadog-sidecar ffe` passes 7 tests (3 exposures + 4 metrics). - `cargo check -p datadog-sidecar-ffi` clean.

Adds Mermaid sources and rendered PNGs for the hook (this) PR plus a README documenting the regeneration workflow. - `docs/php-ffe-stack/stack-pr3909.mmd` + `.png` — 4-PR stack with this PR highlighted (M1 done; EVP and metrics as siblings to come). - `docs/php-ffe-stack/system-pr3909.mmd` + `.png` — target system architecture; this PR contributes the EvaluationCompletedHook + OpenFeature provider hook surface. All downstream nodes (writers, sidecar FFI, sidecar process, backends) marked future. - `docs/php-ffe-stack/README.md` — npx invocation for regenerating PNGs locally; PR-by-PR diagram table; architectural rule note. The architectural rule encoded in the system diagram (all I/O via the libdatadog sidecar) is the same rule Bob applied to PR #3910. See DataDog/libdatadog#2026 for the sidecar-side support.

Per Bob's PR review (2026-05-22), the tracer extension must perform no I/O outside the sidecar. Replaces the raw-socket `AgentExposureTransport` with `SidecarExposureTransport`, which forwards exposure batches to the libdatadog sidecar via a new native PHP function `\DDTrace\send_ffe_exposures` that calls the `ddog_sidecar_send_ffe_exposures` FFI added in DataDog/libdatadog#2026. PHP side: - Delete `Internal/Exposure/AgentExposureTransport.php` (raw socket POST to the Agent EVP proxy). - Add `Internal/Exposure/SidecarExposureTransport.php` that JSON-encodes the batch and calls `\DDTrace\send_ffe_exposures()`. Fire-and-forget; the sidecar handles retries. - Update `ExposureWriter::createDefault()` to instantiate the sidecar transport. - Drop the obsolete `testAgentTransportBuildsAgentEvpRequest` PHPUnit test (HTTP construction now lives in libdatadog, covered by `cargo test -p datadog-sidecar ffe_flusher`). - Add `Internal/DefaultEvaluationCompletedHook` and `Internal/CompositeEvaluationCompletedHook` so production callers go through a composite hook factory. In this PR the composite contains only `ExposureHook`; the metrics PR (#3911) contributes `EvaluationMetricHook` and the file conflict at merge resolves by combining both. Update `Client::create()` to call `DefaultEvaluationCompletedHook::create()`. C/Rust bridge: - Declare `ddog_ByteSlice` (and underlying `ddog_Slice_U8`) in `components-rs/common.h` for the metrics path; declare both `ddog_sidecar_send_ffe_exposures` and `ddog_sidecar_send_ffe_metrics` in `components-rs/sidecar.h`. - Add C wrappers `ddtrace_sidecar_send_ffe_exposures(zend_string *)` and `ddtrace_sidecar_send_ffe_metrics(zend_string *endpoint, zend_string *payload_bytes)` in `ext/sidecar.{h,c}` that call the FFI with the current sidecar transport + instance id + queue id. - Declare native PHP functions `\DDTrace\send_ffe_exposures(string): bool` and `\DDTrace\send_ffe_metrics(string, string): bool` in `ext/ddtrace.stub.php`; add corresponding arginfo entries and `ZEND_FUNCTION` registrations in `ext/ddtrace_arginfo.h`; implement `PHP_FUNCTION(DDTrace_send_ffe_exposures)` and `PHP_FUNCTION(DDTrace_send_ffe_metrics)` in `ext/ddtrace.c`. - Bump `libdatadog` submodule to FFE branch tip `29762335c` (which provides both FFIs). The submodule will be bumped to the libdatadog main commit once #2026 merges. Docs: - Add `docs/php-ffe-stack/{stack,system}-pr3910.{mmd,png}` for this PR. Validation: - `php vendor/bin/phpunit --config phpunit.xml tests/api/Unit/FeatureFlags` → 41 tests, 174 assertions, OK. - libdatadog sidecar tests (`cargo test -p datadog-sidecar ffe_flusher`) → 3 passed, on the pinned submodule commit. - Mermaid PNGs regenerate via `npx @mermaid-js/mermaid-cli`. `make test_featureflags` and `make test_c TESTS=tests/ext/ffe/...` will run in CI; running them locally requires rebuilding the extension which is gated behind libdatadog #2026 merging.

Adds the M3 evaluation-metrics layer on top of the hook PR (#3909) as a sibling of the EVP exposures PR (#3910). Records `feature_flag.evaluations` for both PHP 7 (DD Client hook) and PHP 8 (OpenFeature SDK hook); both paths share `EvaluationMetricHook::sharedWriter()` for unified aggregation. OTLP/protobuf payloads are encoded in PHP via the existing `OtlpMetricEncoder` and delivered to the user-configured OTLP HTTP metrics intake through the libdatadog sidecar (`ddog_sidecar_send_ffe_metrics` FFI added in DataDog/libdatadog#2026). This branch is force-pushed (user-authorized one-time exception to the no-force-push rule, 2026-05-23) to restructure history away from being linearly stacked on the M2 exposures PR (#3910). The PR now stacks directly on the hook PR (#3909) as a sibling of the EVP PR. PHP side: - Add `Internal/Metric/EvaluationMetricWriter` with bounded series aggregation, drop accounting, and shutdown flush. - Add `Internal/Metric/EvaluationMetricHook` (DD Client hook) and `OtlpMetricEncoder` (PHP 7-safe protobuf encoding). - Add `Internal/Metric/SidecarOtlpMetricsTransport` that calls `\DDTrace\send_ffe_metrics()` (FFI declared in #3910). Endpoint resolution: `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT`, falling back to `OTEL_EXPORTER_OTLP_ENDPOINT + /v1/metrics`, default `http://localhost:4318/v1/metrics`. - Add `DDTrace\OpenFeature\EvalMetricsHook` implementing `OpenFeature\interfaces\hooks\Hook` (after + error stages), registered on `DataDogProvider` via `setHooks()`. - `DataDogProvider` constructs its internal DD `Client` with `DefaultEvaluationCompletedHook::createWithoutMetric()` so the OpenFeature path records the metric via the OpenFeature hook (PR 3911 scope) and NOT via the DD Client hook — preventing double-counting. PHP 7 path keeps recording via the DD Client hook. - Add `Internal/CompositeEvaluationCompletedHook` and `Internal/DefaultEvaluationCompletedHook` (metric-only composite). This is the merge-conflict point with PR #3910's `[ExposureHook]` composite — second merge resolves by combining both hooks. - Update `Client::create()` to call `DefaultEvaluationCompletedHook::create()`. - Drop the obsolete `testOtlpTransportBuildsHttpProtobufRequest` PHPUnit test (HTTP construction now lives in libdatadog, covered by `cargo test -p datadog-sidecar ffe_metrics_flusher`). - Add `_files_openfeature.php` entry for `EvalMetricsHook.php`. C/Rust bridge: the `\DDTrace\send_ffe_metrics()` native function, its C wrapper `ddtrace_sidecar_send_ffe_metrics()`, and the `ddog_sidecar_send_ffe_metrics` FFI declaration in `components-rs/sidecar.h` were already added in #3910. This PR's branch picks up those changes once #3910 merges (or via the same libdatadog submodule pin during review). For development locally the libdatadog submodule is pinned to the FFE branch tip (`29762335c`). Docs: - Add `docs/php-ffe-stack/{stack,system}-pr3911.{mmd,png}` per the 4-PR documentation convention. Validation: - `php vendor/bin/phpunit --config phpunit.xml tests/api/Unit/FeatureFlags` → 40 tests, 160 assertions, OK. - Mermaid PNGs regenerate via `npx @mermaid-js/mermaid-cli`. `make test_featureflags`, OpenFeature PHPUnit, and ffe-dogfooding end-to-end validation will run in CI / are validated separately by FOLLOW-05 Steps 4–5.

The SidecarOtlpMetricsTransport::resolveEndpoint() default ("http://localhost:4318/v1/metrics") doesn't match the system-tests parametric setup, where the PHP test client receives DD_AGENT_HOST=<test_agent_container> but no OTEL_EXPORTER_OTLP_METRICS_ENDPOINT. The previous OtlpHttpMetricTransport (replaced by this transport) derived the OTLP endpoint from DD_AGENT_HOST + port 4318. Restore that fallback so system-tests test_php_ffe_evaluation_metric finds the test-agent OTLP intake. Resolution order (now matches the old transport): 1. OTEL_EXPORTER_OTLP_METRICS_ENDPOINT (explicit) 2. OTEL_EXPORTER_OTLP_ENDPOINT + /v1/metrics 3. DD_AGENT_HOST + :4318/v1/metrics 4. localhost:4318/v1/metrics

The C/Rust bridge for the new native PHP functions \DDTrace\send_ffe_exposures() and \DDTrace\send_ffe_metrics() lives on the M2 EVP exposures PR (#3910) because that PR introduced the bridge when refactoring the exposure transport. PR #3911 (this PR) needs the same bridge for its OTLP metrics transport — without it the SidecarOtlpMetricsTransport silently drops batches because function_exists('\\DDTrace\\send_ffe_metrics') is false. Adds the same bridge files here so the M3 branch is independently compilable. At merge time the two PRs will conflict at the file level on these bridge files; resolution is deduplication (the bridge is identical in both PRs by design). Files added/modified: - components-rs/sidecar.h: declares ddog_sidecar_send_ffe_exposures and ddog_sidecar_send_ffe_metrics FFIs. - components-rs/common.h: declares ddog_ByteSlice typedef for the metrics payload. - ext/sidecar.h, ext/sidecar.c: C wrappers ddtrace_sidecar_send_ffe_exposures() and ddtrace_sidecar_send_ffe_metrics(). - ext/ddtrace.stub.php, ext/ddtrace_arginfo.h, ext/ddtrace.c: declares the native PHP functions and the PHP_FUNCTION implementations.

Pulls in libdatadog commit `875ec8f0e` ("fix(sidecar): dispatch FFE actions before application-entry check"). Without this fix, the `SidecarOtlpMetricsTransport::send()` call from PHP would silently no-op for short-lived processes: the sidecar received the `FfeMetrics` action but dropped it because the `Entry::Occupied` gate on the application metadata had not yet fired. This unblocks the parametric system-test `Test_Feature_Flag_Parametric_Evaluation_Metrics::test_php_ffe_evaluation_metric` which exercises the full PHP -> sidecar -> OTLP-HTTP-intake path end-to-end. Local result: 26/27 FFE-scoped parametric tests pass (remaining failure is the EVP exposure test, which lives on the M2 PR #3910 branch).

Pulls in libdatadog commit `875ec8f0e` ("fix(sidecar): dispatch FFE actions before application-entry check") so the EVP exposure batch sent via `ddog_sidecar_send_ffe_exposures` is no longer silently dropped when the PHP runtime hasn't yet registered the application against the sidecar's `QueueId`. Same fix is on the sibling PR #3911 (`2a48c4987") for the OTLP metric path. Without this submodule bump, `Test_Feature_Flag_Parametric_Exposures::test_php_ffe_exposure_event` sees zero EVP POSTs at the test-agent because the sidecar's `enqueue_actions` handler discards the `FfeExposures` action under the `Entry::Occupied` gate.

…macOS+colima `build-debug-artifact` produces a `/output` bind mount via `-v ${TMP_OUT}:/output`. On macOS+colima only paths under $HOME are mounted into the Linux VM; the macOS default `/var/folders/...` temp dir is not, so writes from inside the container to `/output` land in the VM and never propagate back to the host. The script exits without error and the user's binaries dir is left with whatever stale artifact was there before. Pin TMP_OUT/TMP_PKG under $OUTPUT_DIR (which the user just passed as an absolute, usable destination) so the bind mount is always on a host-visible path. Inherits cleanup via the existing EXIT trap. The same fix is independently on the M3 branch (#3911) as part of e74b050; bringing it to the M2 branch (#3910) so both branches are buildable on macOS without a `TMPDIR=$HOME/...` override.

…gh rez Three fixes to the per-PR FFE diagrams: 1. **Titles were truncated to "PHP FFE 4-PR stack — current =" with the PR number missing.** Mermaid uses the YAML frontmatter for the title block, and YAML treats unquoted `#` as the start of a comment, so `title: PHP FFE 4-PR stack — current = #3911 (OTLP metrics)` got parsed as just `PHP FFE 4-PR stack — current =`. Quoting the title keeps the `#PR-number` portion intact. 2. **System diagrams switched from `flowchart LR` to `flowchart TD`.** LR forced the PHP-process / host / backend lanes into a single very-wide row that rendered as an unreadable horizontal strip on PR pages. TD stacks them vertically and keeps the per-lane subgraphs readable. 3. **Re-rendered at 2400×2400 with `--scale 3`** (~1800×2000 stack, ~3000×4500 system) instead of the default ~600px width. PR-page thumbnails render legibly and zoomed-in detail stays sharp. README's regeneration recipe updated with all three knobs and a note on why the `title:` quoting matters.

…gh rez Same three fixes as the parallel commit on the M3 (PR #3911) branch: 1. Quote the YAML `title:` so the `#PR-number` is not parsed as a comment (previous render had the title truncated at "current ="). 2. `flowchart LR` → `flowchart TD` on system diagrams so vertical PHP-process → host → backend lanes stack vertically instead of getting squeezed into one wide row. 3. Render with `-w 2400 -H 2400 --scale 3 -b white` (~1900×2000 stack, ~3000×4600 system) instead of ~600px default.

…gh rez Same three fixes as on the M2 (#3910) and M3 (#3911) sibling branches: 1. Quote the YAML `title:` so the `#PR-number` survives parsing (otherwise YAML treats the `#` as a comment and the title renders as "PHP FFE 4-PR stack — current =" with the rest missing). 2. `flowchart LR` → `flowchart TD` on the system diagram so the PHP-process / host-sidecar / backend lanes stack vertically. 3. Render at 2400×2400 `--scale 3` instead of ~600px default.

Brings the PHP FFE diagram convention to the M1 PR. Each subsequent PR in the stack (#3909, #3910, #3911) already carried its own stack + system diagram; #3906 was missing them. Mirrors the format used by the rest of the stack: - `stack-pr3906.mmd` — the 4-PR stack with #3906 badged as current and the downstream layers shown as "future". - `system-pr3906.mmd` — the target end-to-end architecture with M1's scope (UserCode, OpenFeature Client, DataDogProvider, DDTrace FeatureFlags Client, NativeEvaluator, Remote Config client) highlighted, and everything from the Hook layer onward dashed. All conventions match the other branches: quoted YAML titles (to keep `#PR-number` out of the YAML comment parser), `flowchart TD` orientation, rendered with `-w 2400 -H 2400 --scale 3 -b white`.

leoromanovsky · 2026-05-28T23:08:03Z

Known CI note: this PHP SDK PR currently has an expected CI failure from a Rust toolchain version mismatch between the newer libdatadog dependency and the dd-trace-php build environment. Please continue review on the code/design; this PR will not be merged until the trace team resolves the Rust version mismatch in the PHP build images/tooling.

leoromanovsky · 2026-05-28T23:26:28Z

Local validation note for the Rust blocker: ffe-dogfooding was validated through its local PHP runtime mode, not the dd-trace-php CI image. The local PHP dogfooding Dockerfiles install Rust with rustup, and scripts/build-local-php.sh builds the mounted dd-trace-php checkout with RUSTC_BOOTSTRAP=1 make -j$DD_TRACE_PHP_BUILD_JOBS. That local path gets around the stale Rust toolchain currently present in PHP CI, so it proves runtime behavior but does not remove the CI merge blocker.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dc19ce479b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-29T01:01:47Z

+        return endpoint;
+    }
+
+    return zend_string_init(ZEND_STRL("http://localhost:4318/v1/metrics"), 0);


Honor the configured Datadog agent for FFE metrics

When neither OTEL_EXPORTER_OTLP_METRICS_ENDPOINT nor OTEL_EXPORTER_OTLP_ENDPOINT is set, this fallback ignores DD_TRACE_AGENT_URL and DD_AGENT_HOST, so containerized apps using the usual Datadog agent host (for example DD_AGENT_HOST=datadog-agent) will flush feature_flag.evaluations to the app container's localhost instead of the agent. The existing OpenTelemetry resolver derives metrics endpoints from those Datadog settings before defaulting to localhost (src/DDTrace/OpenTelemetry/CompositeResolver.php lines 169-198), so FFE metrics diverge from normal OTel metrics. Fresh evidence: the current native resolver now contains the same hardcoded fallback after the earlier PHP transport was replaced.

Useful? React with 👍 / 👎.

## Motivation PHP FFE evaluation metrics need a native path for aggregation, OTLP encoding, and delivery without building PHP OTLP writer/transport machinery. The shared design doc is the cross-PR reference: https://docs.google.com/document/d/1NvMfTpZWLBlFmEFNjdnlMyeVpy5l7KD8qujGFco6w2w/edit?tab=t.0 This PR is metric-only. Exposures remain in #2026 so reviewers can evaluate OTLP metric delivery independently from exposure cache semantics. ## Changes This adds caller-driven FFE evaluation metric sidecar actions and OTLP export for `feature_flag.evaluations`. The reusable FFE-domain pieces now live in `datadog-ffe` behind the `evaluation-metrics` feature: evaluation metric input types, metric attribute normalization, aggregation by matching attribute sets, and OTLP/protobuf payload encoding. `datadog-sidecar` keeps only sidecar-specific work: parsing the configured endpoint URL, building the HTTP request, applying the timeout, logging delivery failures, and integrating with sidecar lifecycle/actions. The PHP companion PR uses this from native/C code for raw `DDTrace\ffe_evaluate` calls and from a thin PHP OpenFeature adapter for final OpenFeature-aware results. PHP does not aggregate, encode, or transport OTLP payloads. Current PHP MVP path: ```mermaid flowchart LR Eval["PHP evaluation raw API or OpenFeature adapter"] Record["PHP tracer native call record typed evaluation metric"] Action["sidecar action record FFE evaluation metrics"] Domain["datadog-ffe feature: evaluation-metrics attributes + aggregation + OTLP encoder"] Sidecar["shared sidecar metric flush lifecycle"] Collector["OTLP endpoint Agent or local collector"] Intake["feature_flag.evaluations"] Eval --> Record Record --> Action Action --> Domain Domain --> Sidecar Sidecar --> Collector Collector --> Intake ``` Future Python/Ruby connection: ```mermaid flowchart LR PyToday["dd-trace-py today OpenFeature hook + host metric writer"] RbToday["dd-trace-rb today OpenFeature hook + host metric writer"] PyFuture["dd-trace-py future explicit native opt-in"] RbFuture["dd-trace-rb future explicit native opt-in"] Native["libdatadog caller-driven FFE metric action"] Shared["shared sidecar aggregation + OTLP delivery"] Otlp["OTLP endpoint"] PyToday -. "current host metric path" .-> Otlp RbToday -. "current host metric path" .-> Otlp PyFuture -. "after ownership switch" .-> Native RbFuture -. "after ownership switch" .-> Native Native --> Shared Shared --> Otlp ``` The future Python/Ruby arrows are intentionally not active behavior in this PR. They show the reusable target for a later migration while preserving today's host-language metric writers. Why Python/Ruby do not double count today: - Python and Ruby use libdatadog for evaluation only; the evaluator returns assignment metadata and does not record `feature_flag.evaluations` as a side effect. - This PR adds a separate caller-driven sidecar action. Metric emission happens only when an SDK explicitly records a typed evaluation metric into that action. PHP wires this in its companion PR; Python and Ruby do not. - Python and Ruby therefore keep exactly their current host-language OpenFeature metric writers. They are not also sending evaluation metrics through this native sidecar path. - Evaluation metrics intentionally count every evaluation and do not have exposure-cache deduplication semantics. Future Python/Ruby migration must switch ownership to native logging and disable/bypass the host metric writer for the same evaluations. Reference implementation check: dd-trace-java's canonical metric path is OpenFeature hook based. Java's `Provider` creates `FlagEvalMetrics` and returns a `FlagEvalHook`; the hook runs in `finallyAfter`, reads the final OpenFeature `FlagEvaluationDetails` including flag key, variant, reason, error code, and allocation metadata, and records one `feature_flag.evaluations` counter. Application code only calls OpenFeature; it does not call a metric API. PHP mirrors that canonical OpenFeature shape. The PHP OpenFeature provider disables raw native metric recording while it asks the native evaluator for an assignment, then records exactly one final OpenFeature-aware metric through the Datadog-owned recorder. The raw Datadog PHP client has no direct Java equivalent, but it keeps the same SDK-owned ergonomics: normal evaluation APIs record one native metric per evaluation internally. For future Python/Ruby migration, the same rule applies: either keep the existing host-language OpenFeature metric hook, or switch ownership to the native recorder and disable/bypass the host metric writer for those evaluations. ## Decisions No telemetry is emitted automatically from shared libdatadog evaluator calls. SDKs must explicitly enqueue FFE telemetry actions. This avoids double counting for Python/Ruby, which currently log feature-flag telemetry in host-language code. Evaluation metrics intentionally count evaluations and do not use exposure-cache deduplication semantics. Future Python/Ruby migration must be an ownership switch, not an additional writer. If those SDKs opt into this native metric path, their host-language OpenFeature metric writers must stop recording the same evaluations. ## Validation Current head (`96d9a7bae`) local validation: ```sh cd /Users/leo.romanovsky/go/src/github.com/DataDog/libdatadog-ffe-sidecar-metrics cargo fmt --check cargo test -p datadog-ffe --features evaluation-metrics telemetry::evaluation_metrics cargo test -p datadog-sidecar ffe_metric cargo check -p datadog-ffe cargo check -p datadog-sidecar-ffi ``` Results: datadog-ffe metric tests passed (2 passed), sidecar metric tests passed (6 passed), default datadog-ffe check passed, sidecar FFI check passed, fmt check passed with only the repo stable-rustfmt warnings. Prior downstream PHP behavior validation before the reusable-crate refactor, from DataDog/dd-trace-php#3911 using this PR at `1f1fca439`: ```text ffe-dogfooding subject=php-3911-split-1779981881 php7_metrics=3 php8_metrics=3 php7_exposures=0 php8_exposures=0 ``` System-tests downstream validation: ```sh TEST_LIBRARY=php ./run.sh FEATURE_FLAGGING_AND_EXPERIMENTATION tests/ffe/test_flag_eval_metrics.py -vv ``` Result: 17 passed in 81.26 seconds. Related PRs: DataDog/dd-trace-php#3906, DataDog/dd-trace-php#3911, #2026, DataDog/system-tests#7033. Co-authored-by: leo.romanovsky <leo.romanovsky@datadoghq.com>

…fe-evaluation-metrics # Conflicts: # libdatadog

Add FFE evaluation completion hook

16d7817

leoromanovsky mentioned this pull request May 23, 2026

Add FFE exposure emission #3910

Open

leoromanovsky marked this pull request as ready for review May 23, 2026 16:01

leoromanovsky requested review from a team as code owners May 23, 2026 16:01

leoromanovsky requested review from dd-oleksii and typotter and removed request for a team May 23, 2026 16:01

chatgpt-codex-connector Bot reviewed May 23, 2026

View reviewed changes

Comment thread src/api/FeatureFlags/Internal/Metric/OtlpHttpMetricTransport.php Outdated

leoromanovsky marked this pull request as draft May 23, 2026 16:10

leoromanovsky mentioned this pull request May 23, 2026

feat(sidecar): forward FFE exposures to EVP proxy DataDog/libdatadog#2026

Merged

leoromanovsky mentioned this pull request May 23, 2026

Add FFE evaluation completion hook #3909

Draft

leoromanovsky force-pushed the leo.romanovsky/m3-ffe-evaluation-metrics branch from f021234 to e74b050 Compare May 23, 2026 17:44

leoromanovsky changed the base branch from leo.romanovsky/m2-ffe-exposures to leo.romanovsky/m2-m3-evaluation-completed-base May 23, 2026 17:44

leoromanovsky added 3 commits May 23, 2026 13:53

Merge milestone 1 runtime evaluation base

23f52d0

leoromanovsky requested review from a team as code owners May 27, 2026 23:00

leoromanovsky requested review from LobeTia and removed request for a team May 27, 2026 23:00

leoromanovsky changed the base branch from leo.romanovsky/m2-m3-evaluation-completed-base to leo.romanovsky/milestone-1-runtime-evaluation May 27, 2026 23:00

leoromanovsky added 6 commits May 27, 2026 19:05

chore(ffe): update libdatadog sidecar dependency

a009394

Avoid FFE metric stress argument explosion

c323c7d

chore(ffe): update libdatadog sidecar dependency

0bf960e

chore(ffe): update libdatadog sidecar dependency

2109612

chore(ffe): update libdatadog sidecar dependency

64afbd6

chore(ffe): update libdatadog sidecar dependency

7fc555d

leoromanovsky mentioned this pull request May 28, 2026

Enable PHP FFE evaluation system tests DataDog/system-tests#7003

Merged

leoromanovsky marked this pull request as draft May 28, 2026 12:09

Fix FFE metric sidecar activation

bb85e36

This was referenced May 28, 2026

Enable PHP FFE evaluation metric system tests DataDog/system-tests#7033

Draft

feat(sidecar): forward FFE evaluation metrics to OTLP intake DataDog/libdatadog#2052

Merged

Base automatically changed from leo.romanovsky/milestone-1-runtime-evaluation to master May 28, 2026 15:04

leoromanovsky added 2 commits May 28, 2026 11:08

Update FFE metrics dependency and result ABI

287184e

Bump libdatadog metric runtime

dc19ce4

leoromanovsky mentioned this pull request May 28, 2026

Enable PHP FFE exposure system tests DataDog/system-tests#7031

Draft

leoromanovsky marked this pull request as ready for review May 29, 2026 00:57

leoromanovsky assigned sameerank and dd-oleksii May 29, 2026

chatgpt-codex-connector Bot reviewed May 29, 2026

View reviewed changes

leoromanovsky added 2 commits June 2, 2026 01:44

Merge remote-tracking branch 'origin/master' into leo.romanovsky/m3-f…

9d597eb

…fe-evaluation-metrics # Conflicts: # libdatadog

Honor Datadog agent config for FFE metrics

5c5729c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add FFE evaluation metrics#3911

Add FFE evaluation metrics#3911
leoromanovsky wants to merge 34 commits into
masterfrom
leo.romanovsky/m3-ffe-evaluation-metrics

leoromanovsky commented May 22, 2026 •

edited

Loading

Uh oh!

datadog-official Bot commented May 22, 2026 •

edited by datadog-prod-us1-6 Bot

Loading

Uh oh!

pr-commenter Bot commented May 22, 2026 •

edited

Loading

Explanation

More details about the CI and significant changes

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

leoromanovsky commented May 28, 2026

Uh oh!

leoromanovsky commented May 28, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

leoromanovsky commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Changes

Decisions

Validation

Uh oh!

datadog-official Bot commented May 22, 2026 • edited by datadog-prod-us1-6 Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ Warnings

ℹ️ Info

Uh oh!

pr-commenter Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks [ tracer ]

Explanation

More details about the CI and significant changes

scenario:MessagePackSerializationBench/benchMessagePackSerialization-opcache

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

leoromanovsky commented May 28, 2026

Uh oh!

leoromanovsky commented May 28, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

leoromanovsky commented May 22, 2026 •

edited

Loading

datadog-official Bot commented May 22, 2026 •

edited by datadog-prod-us1-6 Bot

Loading

pr-commenter Bot commented May 22, 2026 •

edited

Loading