feat(ffe): add runtime-backed PHP feature flag evaluation by leoromanovsky · Pull Request #3906 · DataDog/dd-trace-php

leoromanovsky · 2026-05-22T08:29:50Z

Motivation

This PR adds real server-side PHP feature flag evaluation backed by Remote Config and the libdatadog native evaluator. The Datadog API works on PHP 7 and PHP 8, PHP 8 can additionally use the optional OpenFeature bridge, and both paths use the same live runtime evaluator.

This is intentionally the evaluation layer only. Exposure delivery, exposure caching, and evaluation metrics land in the later hook, exposure, and metrics PRs.

Shared planning doc: https://docs.google.com/document/d/1NvMfTpZWLBlFmEFNjdnlMyeVpy5l7KD8qujGFco6w2w/edit?tab=t.0

Decisions

Production clients/providers read only from the live Remote Config-backed native runtime.
The root package remains PHP 7 installable; the OpenFeature SDK is used only in PHP 8 test/app environments and is not a root dependency.
productionRuntime=false remains a temporary guardrail while the hook, exposure delivery, and evaluation metrics layers are incomplete. The final production-ready PHP FFE stack should not ship with that warning enabled.
The PHP/OpenFeature bridge passes a targeting key plus flat primitive attributes (bool, int, float, string) into the native evaluator. Nested arrays, objects, and null attribute values are dropped at the boundary.
DDTrace\Testing\ffe_load_config exists only for local/canonical fixture tests.
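
The attribute rule in the bridge decision above (flat primitives pass, nested arrays, objects, and nulls are dropped) can be illustrated with a minimal sketch. It is written in Python purely for illustration; the PR's real boundary lives in PHP and native code, and `flatten_attributes` is a hypothetical name, not an API from this PR.

```python
from typing import Any, Dict, Union

Primitive = Union[bool, int, float, str]


def flatten_attributes(attributes: Dict[str, Any]) -> Dict[str, Primitive]:
    """Keep only flat primitive attribute values (bool, int, float, str).

    Lists, dicts, arbitrary objects, and None are dropped rather than
    rejected, so one bad attribute does not fail the whole evaluation.
    Hypothetical helper for illustration only.
    """
    return {
        key: value
        for key, value in attributes.items()
        if isinstance(value, (bool, int, float, str))
    }


context = {
    "plan": "pro",
    "seats": 5,
    "beta": True,
    "tags": ["a", "b"],   # nested list: dropped
    "org": {"id": 1},     # nested mapping: dropped
    "region": None,       # null: dropped
}
print(flatten_attributes(context))
```

Dropping silently, as sketched here, keeps the evaluation path total; whether the real bridge also logs dropped attributes is not stated in the PR description.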

Where this PR fits in the stack

This is the bottom layer of the 4-PR stack. #3909 adds the shared hook, while #3910 (EVP exposures) and #3911 (OTLP metrics) build on top of that hook.

[Diagram: pr3906-runtime-evaluation-stack-position]

Where this PR fits in the target system

This PR contributes the in-PHP evaluation surface: UserCode -> OpenFeature Client (PHP 8) / DDTrace FeatureFlags Client (PHP 7 + 8), the DDTrace OpenFeature DataDogProvider, the NativeEvaluator FFI bridge into libdatadog, and the Remote Config client for the FFE_FLAGS product.

Changes

Imports the libdatadog FFE evaluator path and keeps PHP user-facing APIs as thin adapters over the native runtime.
Wires PHP 7-safe DDTrace\FeatureFlags evaluation through live Remote Config.
Adds a PHP 8-only DDTrace\OpenFeature provider bridge.
Adds the canonical DataDog/ffe-system-test-data submodule and a native-runtime PHPT loop over ufc-config.json and evaluation-cases/*.json.
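
The fixture loop mentioned in the last item (one shared `ufc-config.json`, many `evaluation-cases/*.json` files) might look roughly like the sketch below. It is Python rather than PHPT, the evaluator is injected as a callable, and the case field names (`flag`, `targetingKey`, `attributes`, `defaultValue`, `expected`) are assumptions made for this sketch, not the actual schema of DataDog/ffe-system-test-data.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, List

# Hypothetical evaluator signature:
# (config, flag_key, targeting_key, attributes, default_value) -> value
Evaluator = Callable[[Dict[str, Any], str, str, Dict[str, Any], Any], Any]


def run_fixture_cases(fixture_dir: Path, evaluate: Evaluator) -> List[str]:
    """Evaluate every case file against the shared config.

    Returns a list of human-readable failure descriptions; an empty
    list means every case matched its expected value.
    """
    config = json.loads((fixture_dir / "ufc-config.json").read_text())
    failures: List[str] = []
    for case_file in sorted((fixture_dir / "evaluation-cases").glob("*.json")):
        cases = json.loads(case_file.read_text())
        for index, case in enumerate(cases):
            # Field names are assumed for this sketch, not the real schema.
            actual = evaluate(
                config,
                case["flag"],
                case["targetingKey"],
                case.get("attributes", {}),
                case["defaultValue"],
            )
            if actual != case["expected"]:
                failures.append(
                    f"{case_file.name}[{index}]: "
                    f"expected {case['expected']!r}, got {actual!r}"
                )
    return failures
```

Collecting failures instead of stopping at the first mismatch gives one report per run, which is convenient when a single fixture update touches many cases.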

Not Included

Exposure writer, exposure cache, or EVP forwarding.
Exposure flush/dedup tests.
Flag-evaluation metric emission.

Validation

php vendor/bin/phpunit --config phpunit.xml tests/api/Unit/FeatureFlags tests/OpenFeature/DataDogProviderTest.php: 29 tests / 87 assertions.
make test_featureflags: 8 tests / 29 assertions.
MAX_TEST_PARALLELISM=1 TESTS=tests/ext/ffe/native_bridge_evaluate.phpt make test_c: 1/1 passed.
MAX_TEST_PARALLELISM=1 TESTS=tests/ext/ffe/system_test_data_evaluate.phpt make test_c: 1/1 passed.
make test_internal_api_randomized: passed.

Dogfooding Validation

See DataDog/ffe-dogfooding#68.

Ran the PHP 7 and PHP 8 servers locally; observed correct evaluations.

datadog-datadog-prod-us1-2 · 2026-05-22T08:43:08Z

Tests

🎉 All green!

🧪 All tests passed

🎯 Code Coverage (details)
• Patch Coverage: 100.00%
• Overall Coverage: 60.70% (-0.03%)

🔗 Commit SHA: 782f622

datadog-datadog-prod-us1-2 · 2026-05-22T08:43:53Z

Tests

⚠️ Warnings

🚦 1 Pipeline job failed

DataDog/apm-reliability/dd-trace-php | check-big-regressions

Regression Check failed due to metrics exceeding thresholds: +39.15% and +27.42%.

ℹ️ Info

No other issues found (see more)

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
• Patch Coverage: 100.00%
• Overall Coverage: 60.74% (+0.00%)

🔗 Commit SHA: 4cf261d

pr-commenter · 2026-05-22T12:50:41Z

Benchmarks [ tracer ]

Benchmark execution time: 2026-05-28 02:35:37

Comparing candidate commit 4cf261d in PR branch leo.romanovsky/milestone-1-runtime-evaluation with baseline commit 6b5659e in branch master.

Found 1 performance improvement and 4 performance regressions! Performance is the same for 188 metrics; 1 metric is unstable.

scenario:BM_TeaSapiSpindown

🟥 execution_time [+12.479µs; +30.485µs] or [+2.404%; +5.873%]

scenario:ContextPropagationBench/benchInject128Bit-opcache

🟥 execution_time [+676.492ns; +735.508ns] or [+39.841%; +43.316%]

scenario:ContextPropagationBench/benchInject64Bit-opcache

🟥 execution_time [+684.991ns; +767.009ns] or [+29.938%; +33.523%]

scenario:LogsInjectionBench/benchLogsInfoBaseline-opcache

🟥 execution_time [+99.822ns; +176.178ns] or [+5.937%; +10.478%]

scenario:LogsInjectionBench/benchLogsNullBaseline-opcache

🟩 execution_time [-1.271µs; -1.151µs] or [-5.806%; -5.254%]

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e55ebb976b

morrisonlevi

Looks good to me.

## Motivation

PHP FFE evaluation metrics need a native path for aggregation, OTLP encoding, and delivery without building PHP OTLP writer/transport machinery.

The shared design doc is the cross-PR reference: https://docs.google.com/document/d/1NvMfTpZWLBlFmEFNjdnlMyeVpy5l7KD8qujGFco6w2w/edit?tab=t.0

This PR is metric-only. Exposures remain in #2026 so reviewers can evaluate OTLP metric delivery independently from exposure cache semantics.

## Changes

This adds caller-driven FFE evaluation metric sidecar actions and OTLP export for `feature_flag.evaluations`.

The reusable FFE-domain pieces now live in `datadog-ffe` behind the `evaluation-metrics` feature: evaluation metric input types, metric attribute normalization, aggregation by matching attribute sets, and OTLP/protobuf payload encoding. `datadog-sidecar` keeps only sidecar-specific work: parsing the configured endpoint URL, building the HTTP request, applying the timeout, logging delivery failures, and integrating with sidecar lifecycle/actions.

The PHP companion PR uses this from native/C code for raw `DDTrace\ffe_evaluate` calls and from a thin PHP OpenFeature adapter for final OpenFeature-aware results. PHP does not aggregate, encode, or transport OTLP payloads.

Current PHP MVP path:

```mermaid
flowchart LR
    Eval["PHP evaluation raw API or OpenFeature adapter"]
    Record["PHP tracer native call record typed evaluation metric"]
    Action["sidecar action record FFE evaluation metrics"]
    Domain["datadog-ffe feature: evaluation-metrics attributes + aggregation + OTLP encoder"]
    Sidecar["shared sidecar metric flush lifecycle"]
    Collector["OTLP endpoint Agent or local collector"]
    Intake["feature_flag.evaluations"]
    Eval --> Record
    Record --> Action
    Action --> Domain
    Domain --> Sidecar
    Sidecar --> Collector
    Collector --> Intake
```

Future Python/Ruby connection:

```mermaid
flowchart LR
    PyToday["dd-trace-py today OpenFeature hook + host metric writer"]
    RbToday["dd-trace-rb today OpenFeature hook + host metric writer"]
    PyFuture["dd-trace-py future explicit native opt-in"]
    RbFuture["dd-trace-rb future explicit native opt-in"]
    Native["libdatadog caller-driven FFE metric action"]
    Shared["shared sidecar aggregation + OTLP delivery"]
    Otlp["OTLP endpoint"]
    PyToday -. "current host metric path" .-> Otlp
    RbToday -. "current host metric path" .-> Otlp
    PyFuture -. "after ownership switch" .-> Native
    RbFuture -. "after ownership switch" .-> Native
    Native --> Shared
    Shared --> Otlp
```

The future Python/Ruby arrows are intentionally not active behavior in this PR. They show the reusable target for a later migration while preserving today's host-language metric writers.

Why Python/Ruby do not double count today:

- Python and Ruby use libdatadog for evaluation only; the evaluator returns assignment metadata and does not record `feature_flag.evaluations` as a side effect.
- This PR adds a separate caller-driven sidecar action. Metric emission happens only when an SDK explicitly records a typed evaluation metric into that action. PHP wires this in its companion PR; Python and Ruby do not.
- Python and Ruby therefore keep exactly their current host-language OpenFeature metric writers. They are not also sending evaluation metrics through this native sidecar path.
- Evaluation metrics intentionally count every evaluation and do not have exposure-cache deduplication semantics. Future Python/Ruby migration must switch ownership to native logging and disable/bypass the host metric writer for the same evaluations.

Reference implementation check: dd-trace-java's canonical metric path is OpenFeature hook based. Java's `Provider` creates `FlagEvalMetrics` and returns a `FlagEvalHook`; the hook runs in `finallyAfter`, reads the final OpenFeature `FlagEvaluationDetails` including flag key, variant, reason, error code, and allocation metadata, and records one `feature_flag.evaluations` counter. Application code only calls OpenFeature; it does not call a metric API.

PHP mirrors that canonical OpenFeature shape. The PHP OpenFeature provider disables raw native metric recording while it asks the native evaluator for an assignment, then records exactly one final OpenFeature-aware metric through the Datadog-owned recorder. The raw Datadog PHP client has no direct Java equivalent, but it keeps the same SDK-owned ergonomics: normal evaluation APIs record one native metric per evaluation internally.

For future Python/Ruby migration, the same rule applies: either keep the existing host-language OpenFeature metric hook, or switch ownership to the native recorder and disable/bypass the host metric writer for those evaluations.

## Decisions

No telemetry is emitted automatically from shared libdatadog evaluator calls. SDKs must explicitly enqueue FFE telemetry actions. This avoids double counting for Python/Ruby, which currently log feature-flag telemetry in host-language code.

Evaluation metrics intentionally count evaluations and do not use exposure-cache deduplication semantics.

Future Python/Ruby migration must be an ownership switch, not an additional writer. If those SDKs opt into this native metric path, their host-language OpenFeature metric writers must stop recording the same evaluations.

## Validation

Current head (`96d9a7bae`) local validation:

```sh
cd /Users/leo.romanovsky/go/src/github.com/DataDog/libdatadog-ffe-sidecar-metrics
cargo fmt --check
cargo test -p datadog-ffe --features evaluation-metrics telemetry::evaluation_metrics
cargo test -p datadog-sidecar ffe_metric
cargo check -p datadog-ffe
cargo check -p datadog-sidecar-ffi
```

Results: datadog-ffe metric tests passed (2 passed), sidecar metric tests passed (6 passed), default datadog-ffe check passed, sidecar FFI check passed, fmt check passed with only the repo stable-rustfmt warnings.

Prior downstream PHP behavior validation before the reusable-crate refactor, from DataDog/dd-trace-php#3911 using this PR at `1f1fca439`:

```text
ffe-dogfooding subject=php-3911-split-1779981881 php7_metrics=3 php8_metrics=3 php7_exposures=0 php8_exposures=0
```

System-tests downstream validation:

```sh
TEST_LIBRARY=php ./run.sh FEATURE_FLAGGING_AND_EXPERIMENTATION tests/ffe/test_flag_eval_metrics.py -vv
```

Result: 17 passed in 81.26 seconds.

Related PRs: DataDog/dd-trace-php#3906, DataDog/dd-trace-php#3911, #2026, DataDog/system-tests#7033.

Co-authored-by: leo.romanovsky <leo.romanovsky@datadoghq.com>
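
The "aggregation by matching attribute sets" step named in the commit message above can be sketched as follows. This is an illustrative Python model, not the `datadog-ffe` Rust implementation; the attribute names (`flag`, `variant`, `reason`) and the function name are assumptions chosen for the example.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Mapping, Tuple

# An order-independent, hashable view of one evaluation's attributes.
AttributeSet = FrozenSet[Tuple[str, str]]


def aggregate_evaluations(
    evaluations: Iterable[Mapping[str, str]],
) -> Dict[AttributeSet, int]:
    """Count evaluations per distinct attribute set.

    Every evaluation increments a counter; nothing is deduplicated,
    matching the "count every evaluation" rule described above.
    """
    counts: Counter = Counter()
    for attributes in evaluations:
        counts[frozenset(attributes.items())] += 1
    return dict(counts)


evaluations = [
    {"flag": "checkout", "variant": "on", "reason": "targeting_match"},
    {"reason": "targeting_match", "variant": "on", "flag": "checkout"},
    {"flag": "checkout", "variant": "off", "reason": "default"},
]
for attribute_set, count in aggregate_evaluations(evaluations).items():
    print(sorted(attribute_set), count)
```

The first two evaluations carry the same attributes in a different order, so they land in one data point with a count of 2. Each aggregated entry would then correspond to one data point of the `feature_flag.evaluations` counter at flush time.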

## Motivation

PHP FFE exposure delivery needs a native path with a cache that persists beyond a single PHP request/thread.

The shared design doc is the cross-PR reference: https://docs.google.com/document/d/1NvMfTpZWLBlFmEFNjdnlMyeVpy5l7KD8qujGFco6w2w/edit?tab=t.0

This PR is exposure-only. Metrics were split into #2052 so reviewers can evaluate exposure cache and delivery separately from OTLP evaluation metrics.

## Changes

This adds caller-driven FFE exposure sidecar actions, exposure payload forwarding through the Agent EVP proxy, and a shared exposure cache that deduplicates repeated `(service, env, version, flag, subject)` assignments across PHP requests and sidecar connections.

The reusable FFE-domain pieces now live in `datadog-ffe` behind the `exposure-events` feature: exposure input types, the LRU deduplication cache, and JSON payload encoding. `datadog-sidecar` keeps only sidecar-specific work: deriving the agent EVP endpoint, building the HTTP request, applying the timeout, logging delivery failures, and integrating with sidecar lifecycle/actions.

Current PHP MVP path:

```mermaid
flowchart LR
    Eval["PHP native evaluation ddog_ffe_evaluate"]
    Batch["PHP tracer native memory request/thread-local exposure batch"]
    Shutdown["PHP RSHUTDOWN flush exposure batch"]
    Action["sidecar action record FFE exposures"]
    Domain["datadog-ffe feature: exposure-events types + cache + JSON encoder"]
    Sidecar["shared sidecar cross-request and cross-thread exposure cache"]
    Agent["Datadog Agent EVP proxy"]
    Intake["FFE exposure intake"]
    Eval -->|"doLog=true assignment"| Batch
    Batch --> Shutdown
    Shutdown --> Action
    Action --> Domain
    Domain --> Sidecar
    Sidecar --> Agent
    Agent --> Intake
```

Future Python/Ruby connection:

```mermaid
flowchart LR
    PyToday["dd-trace-py today host-language exposure writer"]
    RbToday["dd-trace-rb today host-language exposure writer"]
    PyFuture["dd-trace-py future explicit native opt-in"]
    RbFuture["dd-trace-rb future explicit native opt-in"]
    Native["libdatadog caller-driven FFE exposure action"]
    Shared["shared sidecar dedupe + EVP delivery"]
    Agent["Datadog Agent EVP proxy"]
    PyToday -. "current direct EVP path" .-> Agent
    RbToday -. "current direct EVP path" .-> Agent
    PyFuture -. "after ownership switch" .-> Native
    RbFuture -. "after ownership switch" .-> Native
    Native --> Shared
    Shared --> Agent
```

The future Python/Ruby arrows are intentionally not active behavior in this PR. They show why the reusable code lives in `datadog-ffe` rather than directly in sidecar internals, while preserving today's host-language ownership.

Why Python/Ruby do not double count today:

- Python and Ruby use libdatadog for evaluation only; the evaluator returns assignment metadata and does not enqueue exposure telemetry as a side effect.
- This PR adds a separate caller-driven sidecar action. Exposure emission happens only when an SDK explicitly records exposure candidates into that action. PHP wires this in its companion PR; Python and Ruby do not.
- Python and Ruby therefore keep exactly their current host-language EVP exposure writers. They are not also sending exposure candidates through this native sidecar path.
- The sidecar cache only deduplicates exposure candidates that enter the native sidecar path. It cannot protect direct host-language EVP writers, so future Python/Ruby migration must switch ownership to native logging and disable/bypass the host exposure writer for the same evaluations.

Reference implementation check: dd-trace-java follows the same exposure semantics and user ergonomics. Java's `DDEvaluator` is SDK-owned evaluation code; after resolving an assignment, it checks allocation `doLog`, builds an exposure event with flag, variant, allocation, targeting key, and context, and dispatches it through `FeatureFlaggingGateway`. `ExposureWriterImpl` subscribes to those exposure events, queues them, deduplicates with an LRU exposure cache, serializes service/env/version context, and posts to the Agent EVP proxy. Application code only calls the OpenFeature provider; it does not call an exposure API.

PHP mirrors that canonical shape, with PHP-specific lifecycle mechanics: the dd-trace-php evaluation bridge records `doLog=true` exposure candidates internally, request shutdown flushes the batch, and this PR's sidecar path owns cross-request deduplication and EVP delivery.

For future Python/Ruby migration, the same rule applies: wire native exposure recording inside the SDK-owned evaluation path, and turn off the existing host-language exposure writer for those evaluations.

## Decisions

No telemetry is emitted automatically from shared libdatadog evaluator calls. SDKs must explicitly enqueue FFE telemetry actions. This remains required for Python/Ruby coexistence because those SDKs currently log exposures and metrics in host-language code.

The sidecar cache deduplicates only exposure candidates sent through this native sidecar path; it cannot deduplicate direct host-language EVP writers.

Future Python/Ruby migration must be an ownership switch, not an additional writer. When those SDKs opt into this native exposure path, their host-language exposure writers must be disabled or bypassed for the same evaluations to avoid double counting.

## Validation

Current head (`8be471fbc`) local validation:

```sh
cd /Users/leo.romanovsky/go/src/github.com/DataDog/libdatadog-ffe-sidecar-exposures
cargo fmt --check
cargo test -p datadog-ffe --features exposure-events telemetry::exposures
cargo test -p datadog-sidecar ffe_exposure
cargo check -p datadog-ffe
cargo check -p datadog-sidecar-ffi
```

Results: datadog-ffe exposure tests passed (4 passed), sidecar exposure tests passed (6 passed), default datadog-ffe check passed, sidecar FFI check passed, fmt check passed with only the repo stable-rustfmt warnings.

Prior downstream PHP behavior validation before the reusable-crate refactor, from DataDog/dd-trace-php#3910 using this PR at `6d23848a`:

```text
ffe-dogfooding subject=php-3910-split-1779981442 php7_exposures=1 php8_exposures=1 php7_metrics=0 php8_metrics=0
```

System-tests downstream validation:

```sh
TEST_LIBRARY=php ./run.sh FEATURE_FLAGGING_AND_EXPERIMENTATION tests/ffe/test_exposures.py -vv
```

Result: 11 passed in 77.53 seconds.

Related PRs: DataDog/dd-trace-php#3906, DataDog/dd-trace-php#3910, #2052, DataDog/system-tests#7031.

Co-authored-by: leo.romanovsky <leo.romanovsky@datadoghq.com>
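
The shared exposure cache described in the commit message above (LRU deduplication keyed by `(service, env, version, flag, subject)`) can be modeled with a short sketch. It is Python for illustration only, not the `datadog-ffe` Rust code; the class and method names are hypothetical, and treating a changed variant for the same key as a new exposure is an assumption of this sketch rather than something the commit message states.

```python
from collections import OrderedDict
from typing import NamedTuple


class ExposureKey(NamedTuple):
    service: str
    env: str
    version: str
    flag: str
    subject: str


class ExposureCache:
    """Bounded LRU cache that suppresses repeated exposures."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._entries: "OrderedDict[ExposureKey, str]" = OrderedDict()

    def should_send(self, key: ExposureKey, variant: str) -> bool:
        """Return True if this exposure is new and should be forwarded."""
        if self._entries.get(key) == variant:
            # Duplicate: refresh recency, do not forward again.
            self._entries.move_to_end(key)
            return False
        # New key, or same key with a different variant (sketch assumption).
        self._entries[key] = variant
        self._entries.move_to_end(key)
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return True


cache = ExposureCache(capacity=1024)
key = ExposureKey("shop", "prod", "1.0", "checkout", "user-1")
print(cache.should_send(key, "on"))  # first sighting
print(cache.should_send(key, "on"))  # repeat within the cache lifetime
```

Because the cache is bounded, an evicted key is reported again the next time it appears. Deduplication is therefore best effort, which fits a cache that must live across requests without growing unbounded.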

leoromanovsky added 2 commits May 21, 2026 21:58

feat(ffe): wire runtime-backed feature flags

2b3f9e5

feat(ffe): package runtime evaluation milestone

782f622

Fix FFE fixture validation in package CI

9163839

leoromanovsky mentioned this pull request May 22, 2026

Enable PHP FFE evaluation system tests DataDog/system-tests#7003

Merged

leoromanovsky added 2 commits May 22, 2026 06:30

Fix FFE fixture submodules in child pipelines

cc6058f

Skip FFE fixture sweep under valgrind

e55ebb9

Merge remote-tracking branch 'origin/master' into leo.romanovsky/milestone-1-runtime-evaluation

81f2206

leoromanovsky marked this pull request as ready for review May 22, 2026 18:02

leoromanovsky requested review from a team as code owners May 22, 2026 18:02

leoromanovsky requested review from tabgok and removed request for a team May 22, 2026 18:02

leoromanovsky assigned morrisonlevi, bwoebi and typotter May 22, 2026

chatgpt-codex-connector Bot reviewed May 22, 2026

Comment thread src/bridge/_files_openfeature.php Outdated

Comment thread components-rs/ffe.rs

leoromanovsky added 5 commits May 22, 2026 14:34

Make FFE runtime state thread-local

4a810a2

Preload FFE API classes for OpenFeature bridge

2f46d97

Return defaults for null FFE assignments

7abe806

Avoid duplicate OpenFeature bridge preloads

eaa44fc

Validate FFE runtime through PHP system tests

4496908

leoromanovsky marked this pull request as draft May 22, 2026 18:57

leoromanovsky marked this pull request as ready for review May 22, 2026 19:04

fix: preserve empty FFE targeting keys

555e9a0

leoromanovsky mentioned this pull request May 23, 2026

Add FFE evaluation completion hook #3909

Merged

leoromanovsky requested a review from morrisonlevi May 27, 2026 15:35