Skip to content

test: Enable test_snapshot_pruning_removes_outdated_records#1735

Open
Pijukatel wants to merge 2 commits intomasterfrom
fix-test-snapshot-pruning
Open

test: Enable test_snapshot_pruning_removes_outdated_records#1735
Pijukatel wants to merge 2 commits intomasterfrom
fix-test-snapshot-pruning

Conversation

@Pijukatel
Copy link
Collaborator

@Pijukatel Pijukatel commented Feb 11, 2026

Description:

  • Update test_snapshot_pruning_removes_outdated_records to remove the expected source of flaky behavior
  • The root cause of the flaky test behavior is a design flaw in snapshot handling:
    event_manager.on(event=Event.SYSTEM_INFO, listener=self._snapshot_cpu) adds self._snapshot_cpu listener, but event manager runs sync listeners through asyncio.to_thread. self._snapshot_cpu modifies in-place instance list (for example self._cpu_snapshots) - it does bisect.insort and del operations on the same list from several threads, which creates oportunity for a race condition.

Issues:

Closes: #1734

@github-actions github-actions bot added this to the 134th sprint - Tooling team milestone Feb 11, 2026
@github-actions github-actions bot added t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics. labels Feb 11, 2026
@codecov
Copy link

codecov bot commented Feb 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.42%. Comparing base (309fb38) to head (63e8468).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1735      +/-   ##
==========================================
- Coverage   92.50%   92.42%   -0.08%     
==========================================
  Files         156      156              
  Lines       10602    10602              
==========================================
- Hits         9807     9799       -8     
- Misses        795      803       +8     
Flag Coverage Δ
unit 92.42% <100.00%> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Pijukatel Pijukatel requested a review from vdusek February 11, 2026 10:42
@Pijukatel Pijukatel marked this pull request as ready for review February 11, 2026 10:43
Copy link
Collaborator

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright let's see :)

Copy link
Collaborator

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FAILED tests/unit/_autoscaling/test_snapshotter.py::test_snapshot_pruning_removes_outdated_records - assert 3 == 2

  • where 3 = len([CpuSnapshot(used_ratio=0.5, max_used_ratio=0.95, created_at=datetime.datetime(2026, 2, 11, 7, 39, 32, 789908, tzinfo=datetime.timezone.utc)), CpuSnapshot(used_ratio=0.5, max_used_ratio=0.95, created_at=datetime.datetime(2026, 2, 11, 9, 39, 32, 789908, tzinfo=datetime.timezone.utc)), CpuSnapshot(used_ratio=0.5, max_used_ratio=0.95, created_at=datetime.datetime(2026, 2, 11, 10, 39, 32, 789908, tzinfo=datetime.timezone.utc))])
    ============ 1 failed, 1615 passed, 8 skipped in 345.52s (0:05:45) =============
    Error: Process completed with exit code 1.

@Pijukatel Pijukatel force-pushed the fix-test-snapshot-pruning branch from d7783c5 to 63e8468 Compare February 11, 2026 15:20
@Pijukatel
Copy link
Collaborator Author

@Mantisus , @vdusek, please check the root cause in the issue description. The suggested fix should be ok, but I fear it is too implicit and relies on EventManager non-obvious behavior. Would you be in favor of a more explicit(probably more complex) solution that would involve explicit threading locks that would prevent such a race condition?

@Mantisus
Copy link
Collaborator

Mantisus commented Feb 11, 2026

@Mantisus, @vdusek, please check the root cause in the issue description.

The root cause hypothesis seems incorrect. If the cause was an error processing in the 2-hour snapshot, then there would be 1 snapshot in the failed tests, not 3.

It might be worth trying

for event_data in events_data:
    event_manager.emit(event=Event.SYSTEM_INFO, event_data=event_data)
    await event_manager.wait_for_all_listeners_to_complete()

Perhaps the reason is that event_manager generates events out of chronological order.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky test: test_snapshot_pruning_removes_outdated_records fails with assert 3 == 2

3 participants