pkg/cache/scheduler: add exclusion stats to TAS failure messages #8043
base: main
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: sohankunkerkar.
Needs approval from an approver in each of these files.
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
Pull request overview
This PR enhances TAS (Topology Aware Scheduling) failure messages by adding detailed node exclusion statistics. When scheduling fails, users will now see why nodes were excluded (e.g., taints, nodeSelector mismatches, affinity rules, insufficient resources) along with counts for each exclusion reason.
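To make the idea of "why a node was excluded" concrete, here is a minimal sketch that assigns each node a single exclusion reason. The nodeInfo type, its fields, the exclusionReason helper, and the order in which checks are applied are all hypothetical simplifications for illustration; they are not the types or logic used in pkg/cache/scheduler.

```go
package main

import "fmt"

// nodeInfo is a hypothetical, simplified view of a node. The real TAS
// snapshot uses different types; these fields only model the four
// exclusion reasons named in the PR description.
type nodeInfo struct {
	Name             string
	UntoleratedTaint bool  // node has a taint the pod does not tolerate
	SelectorMatch    bool  // node labels satisfy the pod's nodeSelector
	AffinityMatch    bool  // node satisfies required node affinity
	FreeCPU          int64 // free CPU in millicores
	FreeMemory       int64 // free memory in bytes
}

// exclusionReason returns why a node cannot host the pod, or "" if it can.
// Checking taints, then nodeSelector, then affinity, then resources is an
// assumed order chosen for illustration.
func exclusionReason(n nodeInfo, reqCPU, reqMemory int64) string {
	switch {
	case n.UntoleratedTaint:
		return "taint"
	case !n.SelectorMatch:
		return "nodeSelector"
	case !n.AffinityMatch:
		return "affinity"
	case n.FreeCPU < reqCPU || n.FreeMemory < reqMemory:
		return "resources"
	default:
		return ""
	}
}

func main() {
	nodes := []nodeInfo{
		{Name: "n1", UntoleratedTaint: true, SelectorMatch: true, AffinityMatch: true, FreeCPU: 4000, FreeMemory: 8 << 30},
		{Name: "n2", SelectorMatch: false, AffinityMatch: true, FreeCPU: 4000, FreeMemory: 8 << 30},
		{Name: "n3", SelectorMatch: true, AffinityMatch: true, FreeCPU: 500, FreeMemory: 8 << 30},
		{Name: "n4", SelectorMatch: true, AffinityMatch: true, FreeCPU: 4000, FreeMemory: 8 << 30},
	}
	// Each pod in this example requests 1 CPU and 1 GiB of memory.
	for _, n := range nodes {
		fmt.Printf("%s: %q\n", n.Name, exclusionReason(n, 1000, 1<<30))
	}
}
```

Recording only the first failing check keeps the per-reason counts summing to the number of excluded nodes; a real implementation could instead report every reason that applies.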
Key Changes:
- Introduced ExclusionStats to track and report node exclusion reasons during TAS scheduling
- Enhanced failure messages to include the total node count and a breakdown of exclusion reasons
- Updated 30+ test cases to validate the new detailed error messages
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pkg/cache/scheduler/tas_flavor_snapshot.go | Core implementation: Added ExclusionStats struct, limitingResource function for identifying bottleneck resources, and updated fillInCounts/notFitMessage to track and format exclusion statistics |
| pkg/scheduler/scheduler_tas_test.go | Updated 16 test cases to match the new, more detailed failure message format with exclusion statistics |
| pkg/cache/scheduler/tas_cache_test.go | Updated 14 test cases and added 2 new test cases to validate exclusion stats formatting and resource bottleneck detection |
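The table mentions a limitingResource function for identifying bottleneck resources. One plausible reading, sketched below under that assumption, is to compute how many pods each resource alone would admit and pick the smallest. The signature, the plain int64 quantities (instead of Kubernetes resource.Quantity values), and the alphabetical tie-break are hypothetical choices for this example, not the PR's implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// limitingResource returns the resource that admits the fewest pods on a
// node, together with that pod count. Quantities are plain int64 values in a
// common unit per resource (e.g. millicores, bytes). Returns ("", -1) when no
// resource has a positive per-pod request. Hypothetical signature.
func limitingResource(free, perPod map[string]int64) (string, int64) {
	names := make([]string, 0, len(perPod))
	for name := range perPod {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic tie-breaking by resource name
	best, bestCount := "", int64(-1)
	for _, name := range names {
		req := perPod[name]
		if req <= 0 {
			continue // a zero request never limits placement
		}
		count := free[name] / req // a resource absent from free admits 0 pods
		if bestCount < 0 || count < bestCount {
			best, bestCount = name, count
		}
	}
	return best, bestCount
}

func main() {
	free := map[string]int64{"cpu": 8000, "memory": 4 << 30, "nvidia.com/gpu": 2}
	perPod := map[string]int64{"cpu": 1000, "memory": 1 << 30, "nvidia.com/gpu": 1}
	name, n := limitingResource(free, perPod)
	fmt.Printf("bottleneck: %s (%d pods)\n", name, n) // bottleneck: nvidia.com/gpu (2 pods)
}
```

Naming the bottleneck resource lets a "resources" exclusion say which resource ran out, rather than only that the node was too small.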
Branch updated from a9c1d19 to f631ffb.
mimowo left a comment:
/remove-kind cleanup
Branch updated from e46a4e4 to 3c7ebd9.
Include detailed node exclusion reasons (taints, nodeSelector, affinity, resources) in TAS scheduling failure messages to improve debuggability.

Fixes: kubernetes-sigs#7854
Signed-off-by: Sohan Kunkerkar <[email protected]>
Branch updated from 3c7ebd9 to 54180f6.
Include detailed node exclusion reasons (taints, nodeSelector, affinity, resources) in TAS scheduling failure messages to improve debuggability.
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #7854
Special notes for your reviewer:
Does this PR introduce a user-facing change?