Skip to content

Add FFE evaluation metrics#3911

Open
leoromanovsky wants to merge 34 commits into
masterfrom
leo.romanovsky/m3-ffe-evaluation-metrics
Open

Add FFE evaluation metrics#3911
leoromanovsky wants to merge 34 commits into
masterfrom
leo.romanovsky/m3-ffe-evaluation-metrics

Conversation

@leoromanovsky
Copy link
Copy Markdown

@leoromanovsky leoromanovsky commented May 22, 2026

Motivation

PHP FFE evaluation metrics need to follow the native runtime architecture from the shared design doc: https://docs.google.com/document/d/1NvMfTpZWLBlFmEFNjdnlMyeVpy5l7KD8qujGFco6w2w/edit?tab=t.0

#3906 gives PHP a real Remote Config-backed evaluator. This PR adds metric delivery without PHP OTLP writer, encoder, or transport code. PHP records one typed final evaluation result and native/libdatadog sidecar code owns buffering, aggregation, encoding, and OTLP submission.

Changes

This PR is stacked on #3906 and now uses current libdatadog main (a84923e5ec), which includes the merged metric runtime from DataDog/libdatadog#2052. Native DDTrace\ffe_evaluate records evaluation metrics by default; the PHP 8 OpenFeature path disables automatic native recording for the raw evaluation call and then records the final OpenFeature-aware outcome once through the typed EvaluationMetricRecorder adapter.

The result FFI shape was simplified per review feedback: Rust exposes nullable raw zend_string* pointers for FfeResult strings. C can move those strings directly into PHP result properties and metric recording without the extra result accessor/cleanup machinery.

Codex review follow-up: when no explicit OTEL_EXPORTER_OTLP_METRICS_ENDPOINT or OTEL_EXPORTER_OTLP_ENDPOINT is configured, the native FFE metrics resolver now derives the OTLP HTTP metrics endpoint from DD_TRACE_AGENT_URL / DD_AGENT_HOST, matching the PHP OpenTelemetry resolver before defaulting to localhost.

Decisions

Metric delivery remains separate from exposure logging. This PR intentionally expects OTLP evaluation metrics and no EVP exposure stream; #3910 owns exposure delivery and DataDog/libdatadog#2026 owns the exposure-only sidecar work.

The PHP layer is a typed adapter only. It does not aggregate metrics, encode OTLP, or submit HTTP payloads.

Validation

Focused PHP tests on this refreshed branch after merging latest master and bumping libdatadog to a84923e5ec:

cd /Users/leo.romanovsky/go/src/github.com/DataDog/dd-trace-php-ffe-metrics-restack
MAX_TEST_PARALLELISM=1 make test_c TESTS=tests/ext/ffe/evaluation_metrics_native.phpt
MAX_TEST_PARALLELISM=1 make test_c TESTS=tests/ext/ffe/native_bridge_evaluate.phpt

Result: both tests passed locally on PHP 8.5.6. The first command rebuilt the extension against the bumped libdatadog and Rust 1.87 toolchain.

Earlier ffe-dogfooding validation for this PR shape showed PHP OTLP metrics working end to end with no EVP exposure stream expected for this metrics PR:

OTLP metrics OK: php7=5 php8-openfeature=5 flag=ffe-dogfooding-string-flag
EVP exposures not observed: php7=0 php8-openfeature=0 flag=ffe-dogfooding-string-flag subject=php-metric-validator-1780007640

Local system-tests validation through DataDog/system-tests#7033:

./tooling/bin/build-debug-artifact gnu-aarch64-8.2-nts \
  /Users/leo.romanovsky/go/src/github.com/DataDog/system-tests-pr7033/binaries

DOCKER_CONFIG=/tmp/system-tests-docker-config-nocredsstore \
SYSTEM_TEST_BUILD_TIMEOUT=1800 \
./build.sh --library php --weblog-variant php-fpm-8.2

DOCKER_CONFIG=/tmp/system-tests-docker-config-nocredsstore \
TEST_LIBRARY=php \
WEBLOG_VARIANT=php-fpm-8.2 \
./run.sh +l php FEATURE_FLAGGING_AND_EXPERIMENTATION tests/ffe/test_flag_eval_metrics.py

Result: 17 passed in 81.28s.

Related PRs: #3906, DataDog/libdatadog#2052, DataDog/system-tests#7033.

@datadog-official
Copy link
Copy Markdown

datadog-official Bot commented May 22, 2026

Pipelines  Tests

Fix all issues with BitsAI

⚠️ Warnings

🚦 8 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-php | min install tests   View in Datadog   GitLab

🔧 Fix in code (Fix with Cursor). 3 tests failed: get_loaded_remote_configs, get_agent_info, get_agent_sampling_config return incorrect content in shm_data_internal_fns.phpt.

DataDog/apm-reliability/dd-trace-php | test_extension_ci: [7.4]   View in Datadog   GitLab

🔧 Fix in code (Fix with Cursor). Panic during test execution in crashtracker_collect_all_threads.phpt: Job terminated with exit code 2.

DataDog/apm-reliability/dd-trace-php | test_extension_ci: [8.0]   View in Datadog   GitLab

🔧 Fix in code (Fix with Cursor). Test failure in crashtracker_collect_all_threads.phpt: Crashtracker collects all threads when collect_all_threads is enabled.

View all 8 failed jobs.

ℹ️ Info

No other issues found (see more)

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 54.12% (-0.04%)

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 5c5729c | Docs | Datadog PR Page | Give us feedback!

@pr-commenter
Copy link
Copy Markdown

pr-commenter Bot commented May 22, 2026

Benchmarks [ tracer ]

Benchmark execution time: 2026-06-02 07:12:38

Comparing candidate commit 5c5729c in PR branch leo.romanovsky/m3-ffe-evaluation-metrics with baseline commit cd4e9ba in branch master.

Found 0 performance improvements and 1 performance regressions! Performance is the same for 193 metrics, 0 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

scenario:MessagePackSerializationBench/benchMessagePackSerialization-opcache

  • 🟥 execution_time [+2.807µs; +5.673µs] or [+2.665%; +5.386%]

The canonical fixture PHPT explicitly enumerates the FFE classes it
requires before instantiating the Datadog FeatureFlags client. The
shared evaluation-completed envelope/hook added on this branch made
Client::createWithDependencies() reference NoopEvaluationCompletedHook,
EvaluationCompletedHook, and EvaluationCompleted, but the test helper
had not been updated, so the packaged/extension PHPT failed with
"Class DDTrace\FeatureFlags\Internal\NoopEvaluationCompletedHook not
found" before any fixture case ran.

Add the three new files to require_feature_flag_api so the PHPT
matches the runtime class graph used by Client::evaluate().
@leoromanovsky leoromanovsky marked this pull request as ready for review May 23, 2026 16:01
@leoromanovsky leoromanovsky requested review from a team as code owners May 23, 2026 16:01
@leoromanovsky leoromanovsky requested review from dd-oleksii and typotter and removed request for a team May 23, 2026 16:01
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f021234c39

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/api/FeatureFlags/Internal/Metric/OtlpHttpMetricTransport.php Outdated
@leoromanovsky leoromanovsky marked this pull request as draft May 23, 2026 16:10
leoromanovsky added a commit to DataDog/libdatadog that referenced this pull request May 23, 2026
Adds a parallel pathway for PHP feature-flag evaluation metrics
mirroring the FfeExposures forwarder. dd-trace-php encodes
`feature_flag.evaluations` counters as OTLP/protobuf in PHP
(via its existing PHP 7-safe `OtlpMetricEncoder`) and ships the
encoded bytes to the sidecar, which POSTs them to the user-configured
OTLP HTTP metrics intake.

Why a sibling action instead of reusing FfeExposures:

- The OTLP collector is not the Datadog Agent. It's user-configurable
  via OTEL_EXPORTER_OTLP_METRICS_ENDPOINT (default
  http://localhost:4318/v1/metrics), so the endpoint travels with the
  payload rather than being derived from the sidecar session's agent
  base URL.
- Content type differs (application/x-protobuf vs application/json).
- No EVP subdomain header.
- The payload is binary protobuf, not a JSON string.

dd-trace-php side (PR DataDog/dd-trace-php#3911) will refactor its
existing `OtlpHttpMetricTransport` (which currently does PHP-side
HTTP I/O, violating the architectural rule "no I/O outside the
sidecar") to call this new FFI.

Validation:

- `cargo test -p datadog-sidecar ffe` passes 7 tests
  (3 exposures + 4 metrics).
- `cargo check -p datadog-sidecar-ffi` clean.
Adds Mermaid sources and rendered PNGs for the hook (this) PR plus a
README documenting the regeneration workflow.

- `docs/php-ffe-stack/stack-pr3909.mmd` + `.png` — 4-PR stack with this
  PR highlighted (M1 done; EVP and metrics as siblings to come).
- `docs/php-ffe-stack/system-pr3909.mmd` + `.png` — target system
  architecture; this PR contributes the EvaluationCompletedHook +
  OpenFeature provider hook surface. All downstream nodes (writers,
  sidecar FFI, sidecar process, backends) marked future.
- `docs/php-ffe-stack/README.md` — npx invocation for regenerating
  PNGs locally; PR-by-PR diagram table; architectural rule note.

The architectural rule encoded in the system diagram (all I/O via the
libdatadog sidecar) is the same rule Bob applied to PR #3910. See
DataDog/libdatadog#2026 for the sidecar-side support.
leoromanovsky added a commit that referenced this pull request May 23, 2026
Per Bob's PR review (2026-05-22), the tracer extension must perform no
I/O outside the sidecar. Replaces the raw-socket `AgentExposureTransport`
with `SidecarExposureTransport`, which forwards exposure batches to the
libdatadog sidecar via a new native PHP function `\DDTrace\send_ffe_exposures`
that calls the `ddog_sidecar_send_ffe_exposures` FFI added in
DataDog/libdatadog#2026.

PHP side:

- Delete `Internal/Exposure/AgentExposureTransport.php` (raw socket
  POST to the Agent EVP proxy).
- Add `Internal/Exposure/SidecarExposureTransport.php` that JSON-encodes
  the batch and calls `\DDTrace\send_ffe_exposures()`. Fire-and-forget;
  the sidecar handles retries.
- Update `ExposureWriter::createDefault()` to instantiate the sidecar
  transport.
- Drop the obsolete `testAgentTransportBuildsAgentEvpRequest` PHPUnit
  test (HTTP construction now lives in libdatadog, covered by
  `cargo test -p datadog-sidecar ffe_flusher`).
- Add `Internal/DefaultEvaluationCompletedHook` and
  `Internal/CompositeEvaluationCompletedHook` so production callers go
  through a composite hook factory. In this PR the composite contains
  only `ExposureHook`; the metrics PR (#3911) contributes
  `EvaluationMetricHook` and the file conflict at merge resolves by
  combining both. Update `Client::create()` to call
  `DefaultEvaluationCompletedHook::create()`.

C/Rust bridge:

- Declare `ddog_ByteSlice` (and underlying `ddog_Slice_U8`) in
  `components-rs/common.h` for the metrics path; declare both
  `ddog_sidecar_send_ffe_exposures` and `ddog_sidecar_send_ffe_metrics`
  in `components-rs/sidecar.h`.
- Add C wrappers `ddtrace_sidecar_send_ffe_exposures(zend_string *)`
  and `ddtrace_sidecar_send_ffe_metrics(zend_string *endpoint,
  zend_string *payload_bytes)` in `ext/sidecar.{h,c}` that call the FFI
  with the current sidecar transport + instance id + queue id.
- Declare native PHP functions `\DDTrace\send_ffe_exposures(string): bool`
  and `\DDTrace\send_ffe_metrics(string, string): bool` in
  `ext/ddtrace.stub.php`; add corresponding arginfo entries and
  `ZEND_FUNCTION` registrations in `ext/ddtrace_arginfo.h`; implement
  `PHP_FUNCTION(DDTrace_send_ffe_exposures)` and
  `PHP_FUNCTION(DDTrace_send_ffe_metrics)` in `ext/ddtrace.c`.
- Bump `libdatadog` submodule to FFE branch tip `29762335c` (which
  provides both FFIs). The submodule will be bumped to the libdatadog
  main commit once #2026 merges.

Docs:

- Add `docs/php-ffe-stack/{stack,system}-pr3910.{mmd,png}` for this PR.

Validation:

- `php vendor/bin/phpunit --config phpunit.xml tests/api/Unit/FeatureFlags`
  → 41 tests, 174 assertions, OK.
- libdatadog sidecar tests (`cargo test -p datadog-sidecar ffe_flusher`)
  → 3 passed, on the pinned submodule commit.
- Mermaid PNGs regenerate via `npx @mermaid-js/mermaid-cli`.

`make test_featureflags` and `make test_c TESTS=tests/ext/ffe/...` will
run in CI; running them locally requires rebuilding the extension which
is gated behind libdatadog #2026 merging.
Adds the M3 evaluation-metrics layer on top of the hook PR (#3909) as a
sibling of the EVP exposures PR (#3910). Records `feature_flag.evaluations`
for both PHP 7 (DD Client hook) and PHP 8 (OpenFeature SDK hook); both
paths share `EvaluationMetricHook::sharedWriter()` for unified
aggregation. OTLP/protobuf payloads are encoded in PHP via the existing
`OtlpMetricEncoder` and delivered to the user-configured OTLP HTTP
metrics intake through the libdatadog sidecar (`ddog_sidecar_send_ffe_metrics`
FFI added in DataDog/libdatadog#2026).

This branch is force-pushed (user-authorized one-time exception to the
no-force-push rule, 2026-05-23) to restructure history away from being
linearly stacked on the M2 exposures PR (#3910). The PR now stacks
directly on the hook PR (#3909) as a sibling of the EVP PR.

PHP side:

- Add `Internal/Metric/EvaluationMetricWriter` with bounded series
  aggregation, drop accounting, and shutdown flush.
- Add `Internal/Metric/EvaluationMetricHook` (DD Client hook) and
  `OtlpMetricEncoder` (PHP 7-safe protobuf encoding).
- Add `Internal/Metric/SidecarOtlpMetricsTransport` that calls
  `\DDTrace\send_ffe_metrics()` (FFI declared in #3910). Endpoint
  resolution: `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT`, falling back to
  `OTEL_EXPORTER_OTLP_ENDPOINT + /v1/metrics`, default
  `http://localhost:4318/v1/metrics`.
- Add `DDTrace\OpenFeature\EvalMetricsHook` implementing
  `OpenFeature\interfaces\hooks\Hook` (after + error stages), registered
  on `DataDogProvider` via `setHooks()`.
- `DataDogProvider` constructs its internal DD `Client` with
  `DefaultEvaluationCompletedHook::createWithoutMetric()` so the
  OpenFeature path records the metric via the OpenFeature hook (PR 3911
  scope) and NOT via the DD Client hook — preventing double-counting.
  PHP 7 path keeps recording via the DD Client hook.
- Add `Internal/CompositeEvaluationCompletedHook` and
  `Internal/DefaultEvaluationCompletedHook` (metric-only composite).
  This is the merge-conflict point with PR #3910's `[ExposureHook]`
  composite — second merge resolves by combining both hooks.
- Update `Client::create()` to call `DefaultEvaluationCompletedHook::create()`.
- Drop the obsolete `testOtlpTransportBuildsHttpProtobufRequest` PHPUnit
  test (HTTP construction now lives in libdatadog, covered by
  `cargo test -p datadog-sidecar ffe_metrics_flusher`).
- Add `_files_openfeature.php` entry for `EvalMetricsHook.php`.

C/Rust bridge: the `\DDTrace\send_ffe_metrics()` native function, its C
wrapper `ddtrace_sidecar_send_ffe_metrics()`, and the
`ddog_sidecar_send_ffe_metrics` FFI declaration in `components-rs/sidecar.h`
were already added in #3910. This PR's branch picks up those changes
once #3910 merges (or via the same libdatadog submodule pin during
review). For development locally the libdatadog submodule is pinned to
the FFE branch tip (`29762335c`).

Docs:

- Add `docs/php-ffe-stack/{stack,system}-pr3911.{mmd,png}` per the
  4-PR documentation convention.

Validation:

- `php vendor/bin/phpunit --config phpunit.xml tests/api/Unit/FeatureFlags`
  → 40 tests, 160 assertions, OK.
- Mermaid PNGs regenerate via `npx @mermaid-js/mermaid-cli`.

`make test_featureflags`, OpenFeature PHPUnit, and ffe-dogfooding
end-to-end validation will run in CI / are validated separately by
FOLLOW-05 Steps 4–5.
@leoromanovsky leoromanovsky force-pushed the leo.romanovsky/m3-ffe-evaluation-metrics branch from f021234 to e74b050 Compare May 23, 2026 17:44
@leoromanovsky leoromanovsky changed the base branch from leo.romanovsky/m2-ffe-exposures to leo.romanovsky/m2-m3-evaluation-completed-base May 23, 2026 17:44
The SidecarOtlpMetricsTransport::resolveEndpoint() default ("http://localhost:4318/v1/metrics")
doesn't match the system-tests parametric setup, where the PHP test
client receives DD_AGENT_HOST=<test_agent_container> but no
OTEL_EXPORTER_OTLP_METRICS_ENDPOINT. The previous OtlpHttpMetricTransport
(replaced by this transport) derived the OTLP endpoint from DD_AGENT_HOST
+ port 4318. Restore that fallback so system-tests
test_php_ffe_evaluation_metric finds the test-agent OTLP intake.

Resolution order (now matches the old transport):
1. OTEL_EXPORTER_OTLP_METRICS_ENDPOINT (explicit)
2. OTEL_EXPORTER_OTLP_ENDPOINT + /v1/metrics
3. DD_AGENT_HOST + :4318/v1/metrics
4. localhost:4318/v1/metrics
The C/Rust bridge for the new native PHP functions
\DDTrace\send_ffe_exposures() and \DDTrace\send_ffe_metrics() lives on
the M2 EVP exposures PR (#3910) because that PR introduced the bridge
when refactoring the exposure transport. PR #3911 (this PR) needs the
same bridge for its OTLP metrics transport — without it the
SidecarOtlpMetricsTransport silently drops batches because
function_exists('\\DDTrace\\send_ffe_metrics') is false.

Adds the same bridge files here so the M3 branch is independently
compilable. At merge time the two PRs will conflict at the file level on
these bridge files; resolution is deduplication (the bridge is identical
in both PRs by design).

Files added/modified:
- components-rs/sidecar.h: declares ddog_sidecar_send_ffe_exposures and
  ddog_sidecar_send_ffe_metrics FFIs.
- components-rs/common.h: declares ddog_ByteSlice typedef for the
  metrics payload.
- ext/sidecar.h, ext/sidecar.c: C wrappers
  ddtrace_sidecar_send_ffe_exposures() and ddtrace_sidecar_send_ffe_metrics().
- ext/ddtrace.stub.php, ext/ddtrace_arginfo.h, ext/ddtrace.c: declares
  the native PHP functions and the PHP_FUNCTION implementations.
Pulls in libdatadog commit `875ec8f0e` ("fix(sidecar): dispatch FFE
actions before application-entry check"). Without this fix, the
`SidecarOtlpMetricsTransport::send()` call from PHP would silently
no-op for short-lived processes: the sidecar received the
`FfeMetrics` action but dropped it because the `Entry::Occupied`
gate on the application metadata had not yet fired.

This unblocks the parametric system-test
`Test_Feature_Flag_Parametric_Evaluation_Metrics::test_php_ffe_evaluation_metric`
which exercises the full PHP -> sidecar -> OTLP-HTTP-intake path
end-to-end. Local result: 26/27 FFE-scoped parametric tests pass
(remaining failure is the EVP exposure test, which lives on the
M2 PR #3910 branch).
leoromanovsky added a commit that referenced this pull request May 23, 2026
Pulls in libdatadog commit `875ec8f0e` ("fix(sidecar): dispatch FFE
actions before application-entry check") so the EVP exposure batch sent
via `ddog_sidecar_send_ffe_exposures` is no longer silently dropped
when the PHP runtime hasn't yet registered the application against the
sidecar's `QueueId`. Same fix is on the sibling PR #3911 (`2a48c4987")
for the OTLP metric path.

Without this submodule bump, `Test_Feature_Flag_Parametric_Exposures::test_php_ffe_exposure_event`
sees zero EVP POSTs at the test-agent because the sidecar's
`enqueue_actions` handler discards the `FfeExposures` action under
the `Entry::Occupied` gate.
leoromanovsky added a commit that referenced this pull request May 23, 2026
…macOS+colima

`build-debug-artifact` produces a `/output` bind mount via `-v
${TMP_OUT}:/output`. On macOS+colima only paths under $HOME are
mounted into the Linux VM; the macOS default `/var/folders/...` temp
dir is not, so writes from inside the container to `/output` land in
the VM and never propagate back to the host. The script exits without
error and the user's binaries dir is left with whatever stale artifact
was there before.

Pin TMP_OUT/TMP_PKG under $OUTPUT_DIR (which the user just passed as
an absolute, usable destination) so the bind mount is always on a
host-visible path. Inherits cleanup via the existing EXIT trap.

The same fix is independently on the M3 branch (#3911) as part of
e74b050; bringing it to the M2 branch (#3910) so both branches are
buildable on macOS without a `TMPDIR=$HOME/...` override.
…gh rez

Three fixes to the per-PR FFE diagrams:

1. **Titles were truncated to "PHP FFE 4-PR stack — current =" with the
   PR number missing.** Mermaid uses the YAML frontmatter for the title
   block, and YAML treats unquoted `#` as the start of a comment, so
   `title: PHP FFE 4-PR stack — current = #3911 (OTLP metrics)` got
   parsed as just `PHP FFE 4-PR stack — current =`. Quoting the title
   keeps the `#PR-number` portion intact.

2. **System diagrams switched from `flowchart LR` to `flowchart TD`.**
   LR forced the PHP-process / host / backend lanes into a single
   very-wide row that rendered as an unreadable horizontal strip on PR
   pages. TD stacks them vertically and keeps the per-lane subgraphs
   readable.

3. **Re-rendered at 2400×2400 with `--scale 3`** (~1800×2000 stack,
   ~3000×4500 system) instead of the default ~600px width. PR-page
   thumbnails render legibly and zoomed-in detail stays sharp.

README's regeneration recipe updated with all three knobs and a
note on why the `title:` quoting matters.
leoromanovsky added a commit that referenced this pull request May 23, 2026
…gh rez

Same three fixes as the parallel commit on the M3 (PR #3911) branch:

1. Quote the YAML `title:` so the `#PR-number` is not parsed as a
   comment (previous render had the title truncated at "current =").
2. `flowchart LR` → `flowchart TD` on system diagrams so vertical
   PHP-process → host → backend lanes stack vertically instead of
   getting squeezed into one wide row.
3. Render with `-w 2400 -H 2400 --scale 3 -b white` (~1900×2000 stack,
   ~3000×4600 system) instead of ~600px default.
…gh rez

Same three fixes as on the M2 (#3910) and M3 (#3911) sibling branches:

1. Quote the YAML `title:` so the `#PR-number` survives parsing
   (otherwise YAML treats the `#` as a comment and the title renders
   as "PHP FFE 4-PR stack — current =" with the rest missing).
2. `flowchart LR` → `flowchart TD` on the system diagram so the
   PHP-process / host-sidecar / backend lanes stack vertically.
3. Render at 2400×2400 `--scale 3` instead of ~600px default.
leoromanovsky added a commit that referenced this pull request May 23, 2026
Brings the PHP FFE diagram convention to the M1 PR. Each subsequent PR
in the stack (#3909, #3910, #3911) already carried its own stack +
system diagram; #3906 was missing them.

Mirrors the format used by the rest of the stack:
- `stack-pr3906.mmd` — the 4-PR stack with #3906 badged as current
  and the downstream layers shown as "future".
- `system-pr3906.mmd` — the target end-to-end architecture with
  M1's scope (UserCode, OpenFeature Client, DataDogProvider,
  DDTrace FeatureFlags Client, NativeEvaluator, Remote Config client)
  highlighted, and everything from the Hook layer onward dashed.

All conventions match the other branches: quoted YAML titles (to keep
`#PR-number` out of the YAML comment parser), `flowchart TD`
orientation, rendered with `-w 2400 -H 2400 --scale 3 -b white`.
@leoromanovsky leoromanovsky requested review from a team as code owners May 27, 2026 23:00
@leoromanovsky leoromanovsky requested review from LobeTia and removed request for a team May 27, 2026 23:00
@leoromanovsky leoromanovsky changed the base branch from leo.romanovsky/m2-m3-evaluation-completed-base to leo.romanovsky/milestone-1-runtime-evaluation May 27, 2026 23:00
Base automatically changed from leo.romanovsky/milestone-1-runtime-evaluation to master May 28, 2026 15:04
@leoromanovsky
Copy link
Copy Markdown
Author

Known CI note: this PHP SDK PR currently has an expected CI failure from a Rust toolchain version mismatch between the newer libdatadog dependency and the dd-trace-php build environment. Please continue review on the code/design; this PR will not be merged until the trace team resolves the Rust version mismatch in the PHP build images/tooling.

@leoromanovsky
Copy link
Copy Markdown
Author

Local validation note for the Rust blocker: ffe-dogfooding was validated through its local PHP runtime mode, not the dd-trace-php CI image. The local PHP dogfooding Dockerfiles install Rust with rustup, and scripts/build-local-php.sh builds the mounted dd-trace-php checkout with RUSTC_BOOTSTRAP=1 make -j$DD_TRACE_PHP_BUILD_JOBS. That local path gets around the stale Rust toolchain currently present in PHP CI, so it proves runtime behavior but does not remove the CI merge blocker.

@leoromanovsky leoromanovsky marked this pull request as ready for review May 29, 2026 00:57
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dc19ce479b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread ext/sidecar.c Outdated
return endpoint;
}

return zend_string_init(ZEND_STRL("http://localhost:4318/v1/metrics"), 0);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Honor the configured Datadog agent for FFE metrics

When neither OTEL_EXPORTER_OTLP_METRICS_ENDPOINT nor OTEL_EXPORTER_OTLP_ENDPOINT is set, this fallback ignores DD_TRACE_AGENT_URL and DD_AGENT_HOST, so containerized apps using the usual Datadog agent host (for example DD_AGENT_HOST=datadog-agent) will flush feature_flag.evaluations to the app container's localhost instead of the agent. The existing OpenTelemetry resolver derives metrics endpoints from those Datadog settings before defaulting to localhost (src/DDTrace/OpenTelemetry/CompositeResolver.php lines 169-198), so FFE metrics diverge from normal OTel metrics. Fresh evidence: the current native resolver now contains the same hardcoded fallback after the earlier PHP transport was replaced.

Useful? React with 👍 / 👎.

gh-worker-dd-mergequeue-cf854d Bot pushed a commit to DataDog/libdatadog that referenced this pull request Jun 1, 2026
## Motivation

PHP FFE evaluation metrics need a native path for aggregation, OTLP encoding, and delivery without building PHP OTLP writer/transport machinery. The shared design doc is the cross-PR reference: https://docs.google.com/document/d/1NvMfTpZWLBlFmEFNjdnlMyeVpy5l7KD8qujGFco6w2w/edit?tab=t.0

This PR is metric-only. Exposures remain in #2026 so reviewers can evaluate OTLP metric delivery independently from exposure cache semantics.

## Changes

This adds caller-driven FFE evaluation metric sidecar actions and OTLP export for `feature_flag.evaluations`.

The reusable FFE-domain pieces now live in `datadog-ffe` behind the `evaluation-metrics` feature: evaluation metric input types, metric attribute normalization, aggregation by matching attribute sets, and OTLP/protobuf payload encoding. `datadog-sidecar` keeps only sidecar-specific work: parsing the configured endpoint URL, building the HTTP request, applying the timeout, logging delivery failures, and integrating with sidecar lifecycle/actions.

The PHP companion PR uses this from native/C code for raw `DDTrace\ffe_evaluate` calls and from a thin PHP OpenFeature adapter for final OpenFeature-aware results. PHP does not aggregate, encode, or transport OTLP payloads.

Current PHP MVP path:

```mermaid
flowchart LR
    Eval["PHP evaluation<br/>raw API or OpenFeature adapter"]
    Record["PHP tracer native call<br/>record typed evaluation metric"]
    Action["sidecar action<br/>record FFE evaluation metrics"]
    Domain["datadog-ffe<br/>feature: evaluation-metrics<br/>attributes + aggregation + OTLP encoder"]
    Sidecar["shared sidecar<br/>metric flush lifecycle"]
    Collector["OTLP endpoint<br/>Agent or local collector"]
    Intake["feature_flag.evaluations"]

    Eval --> Record
    Record --> Action
    Action --> Domain
    Domain --> Sidecar
    Sidecar --> Collector
    Collector --> Intake
```

Future Python/Ruby connection:

```mermaid
flowchart LR
    PyToday["dd-trace-py today<br/>OpenFeature hook + host metric writer"]
    RbToday["dd-trace-rb today<br/>OpenFeature hook + host metric writer"]
    PyFuture["dd-trace-py future<br/>explicit native opt-in"]
    RbFuture["dd-trace-rb future<br/>explicit native opt-in"]
    Native["libdatadog caller-driven<br/>FFE metric action"]
    Shared["shared sidecar<br/>aggregation + OTLP delivery"]
    Otlp["OTLP endpoint"]

    PyToday -. "current host metric path" .-> Otlp
    RbToday -. "current host metric path" .-> Otlp
    PyFuture -. "after ownership switch" .-> Native
    RbFuture -. "after ownership switch" .-> Native
    Native --> Shared
    Shared --> Otlp
```

The future Python/Ruby arrows are intentionally not active behavior in this PR. They show the reusable target for a later migration while preserving today's host-language metric writers.

Why Python/Ruby do not double count today:

- Python and Ruby use libdatadog for evaluation only; the evaluator returns assignment metadata and does not record `feature_flag.evaluations` as a side effect.
- This PR adds a separate caller-driven sidecar action. Metric emission happens only when an SDK explicitly records a typed evaluation metric into that action. PHP wires this in its companion PR; Python and Ruby do not.
- Python and Ruby therefore keep exactly their current host-language OpenFeature metric writers. They are not also sending evaluation metrics through this native sidecar path.
- Evaluation metrics intentionally count every evaluation and do not have exposure-cache deduplication semantics. Future Python/Ruby migration must switch ownership to native logging and disable/bypass the host metric writer for the same evaluations.

Reference implementation check: dd-trace-java's canonical metric path is OpenFeature hook based. Java's `Provider` creates `FlagEvalMetrics` and returns a `FlagEvalHook`; the hook runs in `finallyAfter`, reads the final OpenFeature `FlagEvaluationDetails` including flag key, variant, reason, error code, and allocation metadata, and records one `feature_flag.evaluations` counter. Application code only calls OpenFeature; it does not call a metric API.

PHP mirrors that canonical OpenFeature shape. The PHP OpenFeature provider disables raw native metric recording while it asks the native evaluator for an assignment, then records exactly one final OpenFeature-aware metric through the Datadog-owned recorder. The raw Datadog PHP client has no direct Java equivalent, but it keeps the same SDK-owned ergonomics: normal evaluation APIs record one native metric per evaluation internally. For future Python/Ruby migration, the same rule applies: either keep the existing host-language OpenFeature metric hook, or switch ownership to the native recorder and disable/bypass the host metric writer for those evaluations.


## Decisions

No telemetry is emitted automatically from shared libdatadog evaluator calls. SDKs must explicitly enqueue FFE telemetry actions. This avoids double counting for Python/Ruby, which currently log feature-flag telemetry in host-language code.

Evaluation metrics intentionally count evaluations and do not use exposure-cache deduplication semantics.

Future Python/Ruby migration must be an ownership switch, not an additional writer. If those SDKs opt into this native metric path, their host-language OpenFeature metric writers must stop recording the same evaluations.

## Validation

Current head (`96d9a7bae`) local validation:

```sh
cd /Users/leo.romanovsky/go/src/github.com/DataDog/libdatadog-ffe-sidecar-metrics
cargo fmt --check
cargo test -p datadog-ffe --features evaluation-metrics telemetry::evaluation_metrics
cargo test -p datadog-sidecar ffe_metric
cargo check -p datadog-ffe
cargo check -p datadog-sidecar-ffi
```

Results: datadog-ffe metric tests passed (2 passed), sidecar metric tests passed (6 passed), default datadog-ffe check passed, sidecar FFI check passed, fmt check passed with only the repo stable-rustfmt warnings.

Prior downstream PHP behavior validation before the reusable-crate refactor, from DataDog/dd-trace-php#3911 using this PR at `1f1fca439`:

```text
ffe-dogfooding subject=php-3911-split-1779981881
php7_metrics=3 php8_metrics=3
php7_exposures=0 php8_exposures=0
```

System-tests downstream validation:

```sh
TEST_LIBRARY=php ./run.sh FEATURE_FLAGGING_AND_EXPERIMENTATION tests/ffe/test_flag_eval_metrics.py -vv
```

Result: 17 passed in 81.26 seconds.

Related PRs: DataDog/dd-trace-php#3906, DataDog/dd-trace-php#3911, #2026, DataDog/system-tests#7033.



Co-authored-by: leo.romanovsky <leo.romanovsky@datadoghq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants