Skip to content

Conversation

@mykysha
Copy link
Contributor

@mykysha mykysha commented Dec 1, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

Display more detailed information on preemption in the flavor assignment attempt block in events
Follow-up for #7646

Which issue(s) this PR fixes:

Part of #7137

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Observability: Add more details (the preemptionMode) to the QuotaReserved condition message,
and the related event, about the skipped flavors which were considered for preemption. 
Before: "Quota reserved in ClusterQueue preempt-attempts-cq, wait time since queued was 9223372037s; Flavors considered: main: on-demand(Preempt;insufficient unused quota for cpu in flavor on-demand, 1 more needed)"
After: "Quota reserved in ClusterQueue preempt-attempts-cq, wait time since queued was 9223372037s; Flavors considered: main: on-demand(preemptionMode=Preempt;insufficient unused quota for cpu in flavor on-demand, 1 more needed)"

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Dec 1, 2025
@netlify
Copy link

netlify bot commented Dec 1, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 18af99b
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/6932e8394619af00080b6a62

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 1, 2025
@mykysha
Copy link
Contributor Author

mykysha commented Dec 1, 2025

/cc @mbobrovskyi
PTAL

@mykysha mykysha force-pushed the feat/preemtion-flavor-attempts branch from a8489ac to c9777aa Compare December 2, 2025 09:18
@mimowo
Copy link
Contributor

mimowo commented Dec 2, 2025

/assign @pajakd
PTAL

@mykysha mykysha force-pushed the feat/preemtion-flavor-attempts branch 2 times, most recently from 3232203 to 8120d29 Compare December 2, 2025 10:26
Copy link
Contributor

@mbobrovskyi mbobrovskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
Thank you!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 2, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 4d82f3da79961e33f3c41b42fdb963979e244ae9

@mykysha mykysha force-pushed the feat/preemtion-flavor-attempts branch from 8120d29 to 9bb800c Compare December 2, 2025 14:38
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 2, 2025
@pajakd
Copy link
Contributor

pajakd commented Dec 3, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 3, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 2d97397e46924187e0c4372262c880809617157e

@mykysha
Copy link
Contributor Author

mykysha commented Dec 5, 2025

cc: @mimowo

@mimowo
Copy link
Contributor

mimowo commented Dec 5, 2025

@mykysha is this correct? In particular the example "Before" ?
/release-note-edit

Observability: Add more details (the preemptionMode) to the QuotaReserved condition message,
and the related event, about the skipped flavors which were considered for preemption. 
Before: "Quota reserved in ClusterQueue preempt-attempts-cq, wait time since queued was 9223372037s; Flavors considered: main: on-demand(Preempt;insufficient unused quota for cpu in flavor on-demand, 1 more needed)"
After: "Quota reserved in ClusterQueue preempt-attempts-cq, wait time since queued was 9223372037s; Flavors considered: main: on-demand(preemptionMode=Preempt;insufficient unused quota for cpu in flavor on-demand, 1 more needed)"

case reclaim:
return ptr.To(preemptioncommon.Reclaim)
default:
return nil // preemptionMode has non-preemption values that do not address preemption.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the comment TBH. Is it to say "there are no other options"?

If so I would follow the pattern as above in this function (it is confusing to see two functions side-by-side one returning panic one nil): https://github.com/kubernetes-sigs/kueue/pull/8024/files#diff-0870ec2c0ea330a7d087a6ef599905bf9b7164a0dd889d1b00ba17c31a1bfd05R429

Unless I'm missing something which might be, but then please rephrase the comment to make it more clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing from preemptioncommon.PreemptionPossibility to preemptionMode is safe, because all the PreemptionPossibility fields have their counterparts, but changing back not so as we introduce non-preemption related fit and noFit to the preemptionMode. I can change return nil to a panic message, but I'm not sure that this is needed.
Perhaps something like

case noFit, fit:
	return nil
default:
	panic(fmt.Sprintf("illegal PreemptionPossibility: %d", preemptionPossibility))

would be better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so we have preemptionPossibility -> preemptionMode -> preemptionPossibility that is weird. Would it be possible to eliminate the preemptionMode entirely?

It is not a question about this PR, but I want to get your view on that since you researched the code recently.

For this PR I would go with, I assume you meant preemptionMode rather than possibilty here:

default:
	panic(fmt.Sprintf("unexpected preemptionMode: %d", mode))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, or alternatively stay with

case noFit, fit:
	return nil
default:
	panic(fmt.Sprintf("illegal preemptionMode: %d", preemptionMode))

but remove this guard chack https://github.com/kubernetes-sigs/kueue/pull/8024/files#diff-0ea412b6476f3b5faaa7e3ad1fb3ded2e8c568df3c84474c0c576df47223c56cR72-R74

Keeping the guards at both levels makes some code dead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think preemptionMode can be modified to contain both the FlavorAssignmentMode as well as the PreemptionPossibility and we would not need to have so many conversions. Another possibility is to add missing values to the FlavorAssignmentMode, but this change would be more intrusive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm, as mentioned I'm good to keep it for future depending how the code evolves

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the guard and went with the case nofit, fit: approach. Making changes to the preemptionMode would potentially remove the conversion altogether, but that could be addressed in a separate PR

@mykysha
Copy link
Contributor Author

mykysha commented Dec 5, 2025

Before is definitely correct, but we can also change after to better show that Reclaim and NoCandidates are added, perhaps like

After: "Quota reserved in ClusterQueue preempt-attempts-cq, wait time since queued was 9223372037s; Flavors considered: main: on-demand(preemptionMode=Preempt/Reclaim/NoCandidates;insufficient unused quota for cpu in flavor on-demand, 1 more needed)"

?

@mimowo
Copy link
Contributor

mimowo commented Dec 5, 2025

sgtm, updating:

Observability: Add more details (the preemptionMode) to the QuotaReserved condition message,
and the related event, about the skipped flavors which were considered for preemption. 
Before: "Quota reserved in ClusterQueue preempt-attempts-cq, wait time since queued was 9223372037s; Flavors considered: main: on-demand(Preempt;insufficient unused quota for cpu in flavor on-demand, 1 more needed)"
After: "Quota reserved in ClusterQueue preempt-attempts-cq, wait time since queued was 9223372037s; Flavors considered: main: on-demand(preemptionMode=Preempt/Reclaim/NoCandidates;insufficient unused quota for cpu in flavor on-demand, 1 more needed)"


existingPreemption := existing.PreemptionPossibility
if at.PreemptionPossibility != nil && (existingPreemption == nil || *at.PreemptionPossibility < *existingPreemption) {
existing.PreemptionPossibility = at.PreemptionPossibility
Copy link
Contributor

@mimowo mimowo Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is error prone and confusing to introduce the variable existingPreemption, and use it only for "read" but for "update" using the underlying existing variable.

Either don't introduce the new variable just update existing, or update the local copy. Generally mutating the local copy than the original pointer is safer, but this code already updates the underlying copy, just in a tricky fashion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed here, now only the copy gets updated

@mykysha mykysha force-pushed the feat/preemtion-flavor-attempts branch from 9bb800c to 18af99b Compare December 5, 2025 14:12
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 5, 2025
@mimowo
Copy link
Contributor

mimowo commented Dec 5, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 5, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: e227c17e8643c0e5c4c38583bafa564f1e66378f

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mbobrovskyi, mimowo, mykysha

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 5, 2025
@k8s-ci-robot k8s-ci-robot merged commit 0b3b6aa into kubernetes-sigs:main Dec 5, 2025
28 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.16 milestone Dec 5, 2025
@mykysha mykysha deleted the feat/preemtion-flavor-attempts branch December 5, 2025 18:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants