feat(webapp): add RUNTIME_API_ORIGIN to decouple runner traffic from external origin#3686
ThullyoCunha wants to merge 5 commits into
Devin Review found 1 new potential issue.
🐛 1 issue in files not directly in the diff
🐛 RUNTIME_API_ORIGIN not passed through in docker-compose.yml, making the feature inoperative for Docker self-hosting (hosting/docker/webapp/docker-compose.yml:45)
The `env.server.ts` comment at lines 136-137 explicitly references `${RUNTIME_API_ORIGIN:-}` passthroughs in `docker-compose.yml`, and the `.env.example` documents `RUNTIME_API_ORIGIN` as a user-configurable option (hosting/docker/.env.example:57). However, `hosting/docker/webapp/docker-compose.yml` does not include `RUNTIME_API_ORIGIN` in its `environment:` section (lines 42-85). Docker Compose only passes env vars to containers that are explicitly listed in the `environment:` block; values from `.env` are used for variable substitution in the compose file, not automatically forwarded as container env vars. As a result, Docker self-hosting users who set `RUNTIME_API_ORIGIN` in their `.env` file will find it has no effect: the webapp container never receives the value, and runners will continue using `API_ORIGIN`/`APP_ORIGIN`. The Helm chart (hosting/k8s/helm/templates/webapp.yaml:189-192) correctly handles this, but the Docker path is broken.
Force-pushed from ebca943 to eb32a7f
…external origin
The webapp publishes `API_ORIGIN` to runner pods as `TRIGGER_API_URL`, so
runner-to-webapp traffic flows back through whatever URL is configured for
external clients. Self-hosting behind a tracing-enabled gateway (Envoy,
Istio, kgateway, ...) breaks the parent->child run link in trigger.dev's
run-detail tree because the gateway's W3C `traceparent` rewrite on egress
overwrites the SDK's `triggerAndWait()` span id. The webapp then writes
that gateway-generated span id as the child run's `parentSpanId`, which
never reaches the trigger event store, so the child renders as an orphan
in the UI.
Operators can split the two concerns without sacrificing external auth/
callbacks/UI flows that rely on the public `API_ORIGIN`:
- Set `RUNTIME_API_ORIGIN=http://<service>.<namespace>:<port>` (k8s) or
`http://webapp:3000` (docker) to keep runner->webapp traffic on a
cluster-internal hop that bypasses the gateway.
- Leave `API_ORIGIN` on the public URL so the dashboard, magic-link
emails, waitpoint callbacks, and API `apiUrl` responses keep working
for external clients.
Scope is intentionally limited to MANAGED (deployed) runs. Dev CLI runs
keep the original `API_ORIGIN`/`APP_ORIGIN` chain so a developer running
`trigger.dev dev` from outside the cluster does not lose connectivity.
`STREAM_ORIGIN` is still honored as a dedicated stream endpoint when set;
`RUNTIME_API_ORIGIN` takes precedence over it for `TRIGGER_STREAM_URL`
so the bypass keeps streams on the same internal hop by default.
The new env is optional and falls back to `API_ORIGIN`/`APP_ORIGIN`, so
existing deployments are unaffected. An empty string is normalized to
`undefined` in the zod schema so blank `${RUNTIME_API_ORIGIN:-}`
passthroughs from caller environments do not short-circuit the fallback
chain. Helm chart and Docker Compose are wired to forward the value to
the webapp container.
Refs: triggerdotdev#2821
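The `traceparent` rewrite described in the commit message can be illustrated with a small sketch. It assumes the version-00 header layout from the W3C Trace Context specification (`version-traceid-parentid-flags`). `parseTraceparent` and `spanIdRewritten` are hypothetical helper names, not part of the trigger.dev SDK or webapp. The two span ids are the ones reported in the PR description's Testing section; the trace id is a placeholder.

```typescript
// Illustrative sketch only: these helpers are hypothetical and exist
// solely to show what a gateway-side span id rewrite looks like.

interface TraceParent {
  version: string;
  traceId: string;
  parentId: string; // span id of the caller that produced the header
  flags: string;
}

const HEX = /^[0-9a-f]+$/;

// Parses a version-00 style header: 2-32-16-2 lowercase hex fields.
function parseTraceparent(header: string): TraceParent | undefined {
  const parts = header.trim().split("-");
  if (parts.length !== 4) return undefined;
  const [version, traceId, parentId, flags] = parts as [string, string, string, string];
  const wellFormed =
    version.length === 2 &&
    traceId.length === 32 &&
    parentId.length === 16 &&
    flags.length === 2 &&
    parts.every((p) => HEX.test(p));
  return wellFormed ? { version, traceId, parentId, flags } : undefined;
}

// True when both headers belong to the same trace but carry different span ids,
// which is the symptom of a hop re-parenting the request.
function spanIdRewritten(sent: string, received: string): boolean {
  const a = parseTraceparent(sent);
  const b = parseTraceparent(received);
  if (!a || !b) throw new Error("malformed traceparent");
  return a.traceId === b.traceId && a.parentId !== b.parentId;
}

// Placeholder trace id (assumption); span ids come from the PR's Testing section.
const traceId = "4bf92f3577b34da6a3ce929d0e0e4736";
const sentBySdk = `00-${traceId}-070bcfdd63b42d2a-01`;
const seenByWebapp = `00-${traceId}-b8298ebb884ade7e-01`;

console.log(spanIdRewritten(sentBySdk, seenByWebapp)); // true
```

A check of this kind at the webapp route would only detect the rewrite, not repair it, which is why the PR moves runner traffic off the gateway instead.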
Force-pushed from eb32a7f to da151f6
yashs33244 left a comment
Nice fix - I've hit the gateway-rewriting-traceparent problem on a self-hosted setup too, and routing the runner API base via an in-cluster origin is the right lever (the SDK reads TRIGGER_API_URL as its client base in apiClientManager, so this lands exactly on the triggerAndWait request carrying the traceparent). I like that it's optional and backward-compatible, the (v) => v || undefined transform neutralizes the empty ${RUNTIME_API_ORIGIN:-} passthrough cleanly, and leaving dev/CLI runs on the existing chain is the correct call. The before/after span evidence in the description is great.
Two small things and one question:
- The `TRIGGER_STREAM_URL` precedence puts `RUNTIME_API_ORIGIN` ahead of `STREAM_ORIGIN`, so an operator who wants an in-cluster runner-API hop and a separate stream endpoint can't have both for managed runners. I see your `.env.example` note already documents the "keep it unset, use STREAM_ORIGIN" workaround, so this looks intentional - could you add a one-line note in the PR body / code on why stream follows the runner origin first? Worth being explicit since the obvious reorder (`STREAM_ORIGIN ?? RUNTIME_API_ORIGIN`) would re-expose the stream path to the same gateway, so the current order is defensible - just non-obvious.
- Might be worth a `RUNTIME_API_ORIGIN` row in `docs/self-hosting/env/webapp.mdx` (the Domains & ports table) - that's where self-hosters will look for it, and it's the one surface the PR doesn't touch yet.
Optional: a small unit test on resolveBuiltInProdVariables asserting the new precedence + the empty-string-falls-through behavior. Totally understand if you skip it given there's no existing harness for that path and you've verified end-to-end on a real cluster.
Either way this is a clean, well-scoped change - happy to see it.
    {
      key: "TRIGGER_STREAM_URL",
    - value: env.STREAM_ORIGIN ?? env.API_ORIGIN ?? env.APP_ORIGIN,
    + value: env.RUNTIME_API_ORIGIN ?? env.STREAM_ORIGIN ?? env.API_ORIGIN ?? env.APP_ORIGIN,
Here RUNTIME_API_ORIGIN wins over an explicitly-set STREAM_ORIGIN for managed runners, making them mutually exclusive. Looks deliberate (and the reorder would re-expose the stream path to the gateway this bypasses), but a one-line comment on why stream follows the runner origin first would make the precedence easy to accept at a glance.
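The precedence discussed in this comment can be sketched as a standalone function. This is a minimal illustration under stated assumptions, not the webapp's actual code: `RunnerOriginEnv`, `normalizeOrigin`, and `resolveRunnerUrls` are hypothetical names, `APP_ORIGIN` is assumed to always be set, and the dev chain is assumed to match the pre-PR line in the hunk above.

```typescript
// Hypothetical sketch of the origin fallback chain; names are illustrative.

interface RunnerOriginEnv {
  APP_ORIGIN: string; // assumed always present
  API_ORIGIN?: string;
  STREAM_ORIGIN?: string;
  RUNTIME_API_ORIGIN?: string;
}

// Same idea as the `(v) => v || undefined` transform: a blank
// `${RUNTIME_API_ORIGIN:-}` passthrough is treated as "not set".
function normalizeOrigin(value: string | undefined): string | undefined {
  return value || undefined;
}

function resolveRunnerUrls(env: RunnerOriginEnv, kind: "managed" | "dev") {
  // Only managed (deployed) runs use the internal origin; dev CLI runs keep
  // the public chain so a developer outside the cluster stays reachable.
  const runtime = kind === "managed" ? normalizeOrigin(env.RUNTIME_API_ORIGIN) : undefined;
  const publicOrigin = env.API_ORIGIN ?? env.APP_ORIGIN;

  return {
    TRIGGER_API_URL: runtime ?? publicOrigin,
    // Streams follow the runner origin first: putting STREAM_ORIGIN ahead of it
    // would send stream traffic back through the gateway this setting bypasses.
    TRIGGER_STREAM_URL: runtime ?? env.STREAM_ORIGIN ?? publicOrigin,
  };
}

const env: RunnerOriginEnv = {
  APP_ORIGIN: "https://app.example.com",
  API_ORIGIN: "https://api.example.com",
  STREAM_ORIGIN: "https://stream.example.com",
  RUNTIME_API_ORIGIN: "http://webapp:3000",
};

console.log(resolveRunnerUrls(env, "managed"));
console.log(resolveRunnerUrls({ ...env, RUNTIME_API_ORIGIN: "" }, "managed"));
```

The second call shows the empty-string case: with a blank `RUNTIME_API_ORIGIN`, the chain falls through to `STREAM_ORIGIN` for streams and to `API_ORIGIN` for the API URL. The example hostnames are placeholders.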
Closes #2821
✅ Checklist
Summary
The webapp publishes `API_ORIGIN` to runner pods as `TRIGGER_API_URL`, so runner-to-webapp traffic flows back through whatever URL is configured for external clients. Self-hosting behind a tracing-enabled gateway (Envoy, Istio, kgateway, ...) breaks the parent->child run link in trigger.dev's run-detail tree because the gateway's W3C `traceparent` rewrite on egress overwrites the SDK's `triggerAndWait()` span id. The webapp then writes that gateway-generated span id as the child run's `parentSpanId`, which never reaches the trigger event store, so the child renders as an orphan in the UI.
We hit this on our self-hosted v4 cluster running kgateway with tracing enabled (`spawnUpstreamSpan: true`). Reproduced with three rounds of SDK debug instrumentation (capturing the active span at `propagation.inject`, the wire `undici:request:headers` payload, and the value the webapp receives at the route level) plus a direct curl bypassing each hop until we isolated kgateway as the rewriter. Details on the investigation are in #2821; several self-hosted users report the same symptom.
This PR splits the two concerns without sacrificing external auth/callbacks/UI flows that rely on the public `API_ORIGIN`:
- Set `RUNTIME_API_ORIGIN=http://<service>.<namespace>:<port>` (k8s) or `http://webapp:3000` (docker) to keep runner->webapp traffic on a cluster-internal hop that bypasses the gateway.
- Leave `API_ORIGIN` on the public URL so the dashboard, magic-link emails, waitpoint callbacks, and API `apiUrl` responses keep working for external clients.
The new env is optional and falls back to `API_ORIGIN`/`APP_ORIGIN`, so existing deployments are unaffected.
Changes
- `apps/webapp/app/env.server.ts`: new optional `RUNTIME_API_ORIGIN` env.
- `apps/webapp/app/v3/environmentVariables/environmentVariablesRepository.server.ts`: prefer `RUNTIME_API_ORIGIN` when resolving `TRIGGER_API_URL`/`TRIGGER_STREAM_URL` for both dev and prod runner pods, falling back to the existing chain.
- `hosting/k8s/helm/values.yaml` + `templates/webapp.yaml`: expose `webapp.runtimeApiOrigin` (defaults to empty -> existing behavior).
- `hosting/docker/webapp/docker-compose.yml` + `hosting/docker/.env.example`: same opt-in for docker self-host.
Testing
Validated on our self-hosted v4 staging cluster (kgateway + Envoy tracing).
Before (runners going through public URL, gateway rewrites `traceparent`):
- `triggerAndWait()` wrapper spanId: `070bcfdd63b42d2a` (confirmed via wire-level `undici:request:headers` debug)
- `traceparent` with spanId: `b8298ebb884ade7e` (gateway-rewritten)
- `TaskRun.parentSpanId = b8298ebb...` (orphan: never in event store)
After (with `apiOrigin` pointed at the in-cluster service):
- `66fa71fda94ccdb9`
- `66fa71fda94ccdb9` unchanged
- `TaskRun.parentSpanId = 66fa71fda94ccdb9` matches the parent's `triggerAndWait()` event in `TaskEvent` table
End-to-end test task at https://github.com/meistrari/trigger-self-tests/blob/main/src/tasks/link-test.ts (parent does `linkTestChild.triggerAndWait(...)`).
Changelog
- `webapp`: add optional `RUNTIME_API_ORIGIN` env to advertise a runner-only API origin separate from `API_ORIGIN`. Lets self-hosted operators route runner-to-webapp traffic cluster-internally, bypassing tracing-enabled gateways that rewrite the W3C `traceparent` header on egress and break parent-to-child run linkage in the trace tree. Optional and backward-compatible (falls back to existing `API_ORIGIN`/`APP_ORIGIN`).
- `hosting/k8s/helm` + `hosting/docker`: expose the new env via `webapp.runtimeApiOrigin` (Helm) and `RUNTIME_API_ORIGIN` (docker-compose).
Screenshots
N/A: no UI change. The visible effect is the run-detail tree correctly showing child task runs nested under their parent's `triggerAndWait()` span (matching trigger.dev SaaS behavior).
💯