-
Notifications
You must be signed in to change notification settings - Fork 479
pkg/controller: Fix workload deletion during StatefulSet queue name change #8040
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/controller: Fix workload deletion during StatefulSet queue name change #8040
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: sohankunkerkar The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
|
|
Without this change, you can reproduce this issue by running this following command: $ make test-e2e GINKGO_ARGS='--focus=ReadyReplicas=0 --repeat=50'I saw this in the logs: (I'm going to parse out the details) # Pod created and workload created with INVALID queue
18:51:55.880 Workload create event queue="sts-lq-invalid"
18:51:55.880 localQueue doesn't exist or inactive
# Test updates queue label on StatefulSet (at 18:51:56.821)
# Pod gets suspended because workload not yet admitted
18:51:56.875 Running job is not admitted by a cluster queue, suspending
18:51:56.901 Not admitted by cluster queue reason="Stopped"
# Queue updated and workload admitted
18:51:56.953 Workload update event queue="sts-lq" prevQueue="sts-lq-invalid"
18:51:56.978 Workload update event queue="sts-lq" status="admitted" clusterQueue="sts-cq"
18:51:56.979 Job admitted, unsuspending
# Workload gets DELETED right after being admitted
18:51:57.000 Workload delete event queue="sts-lq" status="admitted"
18:51:57.000 job with no matching workload, suspending
18:51:57.042 Missing Workload; unable to restore pod templates reason="Stopped"
# New workload created, but pods already in bad state
18:51:57.068 Workload create event queue="sts-lq" status="pending"
18:51:57.111 Workload update event queue="sts-lq" status="admitted"
18:51:59.055 Job admitted, unsuspending
# Test times out waiting for ReadyReplicas=3 (stays at 0) |
|
It needs more testing |
|
After investigating, I suspect the flake is caused by a Workload being deleted and immediately recreated after admission. Tracing the deletion path points to I tested locally and found that removing the len(keptPods) == 0 branch preserves the Workload during the transition, eliminating the delete–recreate loop. I want to understand why this check was added and what the impact of removing it might be. |
363d637 to
861c464
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR addresses a race condition in StatefulSet reconciliation by ensuring the StatefulSet is added as an owner of its workload early in the reconciliation process. Workloads for StatefulSets are created by the Pod controller when pods are created, and initially only have pod owner references. Without the StatefulSet as an owner, workloads could be garbage collected during rolling updates when pods are replaced, particularly during queue label updates. The fix refactors the reconciler to fetch the StatefulSet once and call ensureWorkloadOwnership before finalizing pods.
Key changes:
- Added
ensureWorkloadOwnershipfunction to add StatefulSet as workload owner early in reconciliation - Refactored
fetchAndFinalizePodsto accept StatefulSet parameter instead of fetching it internally - Removed duplicate StatefulSet fetch logic from
fetchAndFinalizePods
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
861c464 to
f27a613
Compare
When changing the queue name on a StatefulSet, the webhook propagates the change to the pod template, triggering a rolling update. Previously, the workload was only owned by pods, so when all pods were deleted during the rolling update, Kubernetes GC would delete the workload. This fix adds the StatefulSet as an additional owner of the workload, preventing GC from deleting it when pods are replaced. This is similar to how LeaderWorkerSet sets itself as the owner of the workload. Signed-off-by: Sohan Kunkerkar <[email protected]>
f27a613 to
f1f48ba
Compare
|
/test pull-kueue-test-e2e-customconfigs-main |
|
@mbobrovskyi is this PR also fixing the issue? It looks simpler, and like the commentary and logging here helping me to understand. Part of the reason I haven't been able to quickly review the previous is complexity. So maybe we can all work together on identifying the similarities and differences and picking up the best of both PRs. I will try to find time to review today or tomorrow. |
|
@sohankunkerkar @mbobrovskyi I'm going to review the PRs today and decide what next. But it seems they both address #6883 so please mark this as "Fixed" in the PR descriptions |
|
/close |
|
@mimowo: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
When changing the queue name on a StatefulSet, the webhook propagates the change to the pod template, triggering a rolling update. Previously, the workload was only owned by pods, so when all pods were deleted during the rolling update, Kubernetes GC would delete the workload. This fix adds the StatefulSet as an additional owner of the workload,
preventing GC from deleting it when pods are replaced. This is similar to how LeaderWorkerSet sets itself as the owner of the workload.
What type of PR is this?
/kind flake
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #6883
Special notes for your reviewer:
I reproduced this locally while working on some other issue (not related to Statefulset at all)
Does this PR introduce a user-facing change?