Skip to content

test(sync-service): make StackSupervisor telemetry test synchronous#4554

Draft
erik-the-implementer wants to merge 1 commit into
mainfrom
alco/fix-telemetry-test-isolation
Draft

test(sync-service): make StackSupervisor telemetry test synchronous#4554
erik-the-implementer wants to merge 1 commit into
mainfrom
alco/fix-telemetry-test-isolation

Conversation

@erik-the-implementer

Copy link
Copy Markdown
Contributor

Summary

Fixes a flaky test: Electric.StackSupervisor.TelemetryTest "per-status HTTP
request counts are scrapable".

The test starts a Prometheus reporter and asserts exact per-status counters
after emitting a fixed set of [:electric, :plug, :serve_shape] events.
Telemetry handlers are global, so the reporter's handler also counts
serve_shape events emitted by any other test running concurrently — the real
serve_shape plug emits this event on every shape request. Under async: true
this produced non-deterministic counts (e.g. status="200" observed as 3
instead of the expected 2).

Marking the module async: false makes ExUnit never schedule it concurrently
with other tests, so the exact-count assertions become deterministic. The
metric aggregates only by :status, so there's no way to scope the reporter's
global handler per-test while keeping the module async.

Context

This flake surfaced while validating the Elixir/OTP upgrade in #3992, where
OTP 29's scheduling timing made the latent contamination trip more often. It's
an independent test-isolation bug on main, so it's split out here.

🤖 Generated with Claude Code

The "per-status HTTP request counts are scrapable" test starts a
Prometheus reporter and asserts exact per-status counters after emitting
a fixed set of [:electric, :plug, :serve_shape] events. Telemetry
handlers are global, so the reporter's handler also counts serve_shape
events emitted by any other test running concurrently (the real
serve_shape plug emits this event on every shape request). Under
`async: true` that produced non-deterministic counts, e.g. status="200"
observed as 3 instead of the expected 2.

Mark the module `async: false` so ExUnit never runs it concurrently with
other tests, making the exact-count assertions deterministic. The metric
aggregates only by `:status`, so there is no way to scope the reporter's
global handler per-test while keeping the module async.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented Jun 10, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.70%. Comparing base (5238055) to head (a30860f).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4554      +/-   ##
==========================================
+ Coverage   54.85%   56.70%   +1.84%     
==========================================
  Files         318      361      +43     
  Lines       36843    39560    +2717     
  Branches    10516    11129     +613     
==========================================
+ Hits        20212    22431    +2219     
- Misses      16598    17058     +460     
- Partials       33       71      +38     
Flag Coverage Δ
packages/agents 70.53% <ø> (ø)
packages/agents-mcp 77.54% <ø> (?)
packages/agents-mobile 71.42% <ø> (ø)
packages/agents-runtime 80.26% <ø> (+0.01%) ⬆️
packages/agents-server 73.95% <ø> (ø)
packages/agents-server-ui 5.66% <ø> (ø)
packages/electric-ax 46.42% <ø> (ø)
packages/experimental 87.73% <ø> (?)
packages/react-hooks 86.48% <ø> (?)
packages/start 82.83% <ø> (?)
packages/typescript-client 91.83% <ø> (?)
packages/y-electric 56.05% <ø> (?)
typescript 56.70% <ø> (+1.84%) ⬆️
unit-tests 56.70% <ø> (+1.84%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@claude

claude Bot commented Jun 10, 2026

Copy link
Copy Markdown

Claude Code Review

Summary

A one-line test-isolation fix that switches Electric.StackSupervisor.TelemetryTest from async: true to async: false to eliminate a flaky exact-count assertion. The change is correct, minimal, and well-documented.

What's Working Well

  • Accurate root-cause diagnosis. I verified the claim: :telemetry handlers are global to the VM, and the Prometheus reporter (TelemetryMetricsPrometheus.Core) attaches a handler for [:electric, :plug, :serve_shape] (packages/electric-telemetry/lib/electric/telemetry/stack_telemetry.ex:96-97). The metric aggregates only by :status. Meanwhile packages/sync-service/test/electric/plug/serve_shape_plug_test.exs runs async: true and emits real serve_shape events via the plug. So a concurrent run would bump this reporter's status="200" counter beyond the expected 2 — exactly the symptom described. The diagnosis holds up end-to-end.
  • The explanatory comment is excellent. Future readers (and reviewers) immediately understand why the module is sync rather than rediscovering the global-handler footgun the hard way. This is the right thing to leave behind for a non-obvious async: false.
  • Correctly justified trade-off. Because the metric keys only on :status (not :stack_id), there's genuinely no per-test scoping available while keeping the handler global — so async: false is the pragmatic fix rather than a workaround. The other reporters created in this module are uniquely named and supervised, but that doesn't help, since handler counting is VM-wide. ExUnit's async: false guarantees the module never runs concurrently with the async pool, which closes the contamination window.
  • Good scope hygiene — splitting this out of the OTP-29 upgrade (chore(sync-service): upgrade Elixir to 1.20.1 #3992) as an independent main bug keeps that PR clean.

Issues Found

Critical (Must Fix)

None.

Important (Should Fix)

None.

Suggestions (Nice to Have)

  • Latent risk on the sibling tests. The other three tests in this module (total_shapes, replication, admission_control) assert exact counts/values too. They were presumably stable because no concurrent test emits those exact events into a live reporter today. async: false now protects all four regardless, so this is moot for correctness — but worth being aware that the same global-handler contamination class applies to any exact-count Prometheus assertion. No action needed here; flagging only as a pattern to watch when adding similar tests elsewhere.

Issue Conformance

No linked issue, but the PR description clearly explains the flake, the mechanism, and the context (#3992). For a self-contained test-isolation fix this is sufficient. No changeset is required: the change is test-only and doesn't affect the published sync-service artifact.

Previous Review Status

N/A — first review.


Review iteration: 1 | 2026-06-10

@alco alco marked this pull request as draft June 10, 2026 15:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants