pkg/controller: Fix workload deletion during StatefulSet queue name change #8040

sohankunkerkar · 2025-12-02T17:50:17Z

When changing the queue name on a StatefulSet, the webhook propagates the change to the pod template, triggering a rolling update. Previously, the workload was only owned by pods, so when all pods were deleted during the rolling update, Kubernetes GC would delete the workload. This fix adds the StatefulSet as an additional owner of the workload,
preventing GC from deleting it when pods are replaced. This is similar to how LeaderWorkerSet sets itself as the owner of the workload.

What type of PR is this?

/kind flake

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #6883

Special notes for your reviewer:

I reproduced this locally while working on some other issue (not related to Statefulset at all)

Does this PR introduce a user-facing change?

None

k8s-ci-robot · 2025-12-02T17:50:24Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sohankunkerkar
Once this PR has been reviewed and has the lgtm label, please assign gabesaba for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

netlify · 2025-12-02T17:50:28Z

✅ Deploy Preview for kubernetes-sigs-kueue canceled.

Name	Link
🔨 Latest commit	`f1f48ba`
🔍 Latest deploy log	https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/6930d910172c74000832e2f6

sohankunkerkar · 2025-12-02T19:17:39Z

Without this change, you can reproduce this issue by running this following command:

$ make test-e2e GINKGO_ARGS='--focus=ReadyReplicas=0 --repeat=50'

I saw this in the logs: (I'm going to parse out the details)

  # Pod created and workload created with INVALID queue
  18:51:55.880  Workload create event  queue="sts-lq-invalid"
  18:51:55.880  localQueue doesn't exist or inactive

  # Test updates queue label on StatefulSet (at 18:51:56.821)

  # Pod gets suspended because workload not yet admitted
  18:51:56.875  Running job is not admitted by a cluster queue, suspending
  18:51:56.901  Not admitted by cluster queue  reason="Stopped"

  # Queue updated and workload admitted
  18:51:56.953  Workload update event  queue="sts-lq"  prevQueue="sts-lq-invalid"
  18:51:56.978  Workload update event  queue="sts-lq"  status="admitted"  clusterQueue="sts-cq"
  18:51:56.979  Job admitted, unsuspending

  # Workload gets DELETED right after being admitted
  18:51:57.000  Workload delete event  queue="sts-lq"  status="admitted"
  18:51:57.000  job with no matching workload, suspending
  18:51:57.042  Missing Workload; unable to restore pod templates  reason="Stopped"

  # New workload created, but pods already in bad state
  18:51:57.068  Workload create event  queue="sts-lq"  status="pending"
  18:51:57.111  Workload update event  queue="sts-lq"  status="admitted"
  18:51:59.055  Job admitted, unsuspending

  # Test times out waiting for ReadyReplicas=3 (stays at 0)

sohankunkerkar · 2025-12-02T23:11:15Z

It needs more testing
/hold

sohankunkerkar · 2025-12-03T03:14:48Z

After investigating, I suspect the flake is caused by a Workload being deleted and immediately recreated after admission. Tracing the deletion path points to FindMatchingWorkloads in pkg/controller/jobs/pod/pod_controller.go, which maps Workloads to Pods.
It seems to be a race during StatefulSet rolling updates: when the queue name changes, the controller updates the Workload’s expected RoleHash before the new Pods appear in the informer cache. In that gap, FindMatchingWorkloads sees the updated Workload but only the old Pods. Because the code checks len(keptPods) == 0, it treats the Workload as invalid and deletes it.

I tested locally and found that removing the len(keptPods) == 0 branch preserves the Workload during the transition, eliminating the delete–recreate loop. I want to understand why this check was added and what the impact of removing it might be.

test/e2e/singlecluster/statefulset_test.go

Copilot

Pull request overview

This PR addresses a race condition in StatefulSet reconciliation by ensuring the StatefulSet is added as an owner of its workload early in the reconciliation process. Workloads for StatefulSets are created by the Pod controller when pods are created, and initially only have pod owner references. Without the StatefulSet as an owner, workloads could be garbage collected during rolling updates when pods are replaced, particularly during queue label updates. The fix refactors the reconciler to fetch the StatefulSet once and call ensureWorkloadOwnership before finalizing pods.

Key changes:

Added ensureWorkloadOwnership function to add StatefulSet as workload owner early in reconciliation
Refactored fetchAndFinalizePods to accept StatefulSet parameter instead of fetching it internally
Removed duplicate StatefulSet fetch logic from fetchAndFinalizePods

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

pkg/controller/jobs/statefulset/statefulset_reconciler.go

When changing the queue name on a StatefulSet, the webhook propagates the change to the pod template, triggering a rolling update. Previously, the workload was only owned by pods, so when all pods were deleted during the rolling update, Kubernetes GC would delete the workload. This fix adds the StatefulSet as an additional owner of the workload, preventing GC from deleting it when pods are replaced. This is similar to how LeaderWorkerSet sets itself as the owner of the workload. Signed-off-by: Sohan Kunkerkar <[email protected]>

sohankunkerkar · 2025-12-04T03:13:13Z

/test pull-kueue-test-e2e-customconfigs-main

mbobrovskyi · 2025-12-04T03:29:42Z

We already have this fix #4799. Still waiting for review from @mimowo.

mimowo · 2025-12-04T06:37:51Z

@mbobrovskyi is this PR also fixing the issue? It looks simpler, and like the commentary and logging here helping me to understand. Part of the reason I haven't been able to quickly review the previous is complexity.

So maybe we can all work together on identifying the similarities and differences and picking up the best of both PRs.

I will try to find time to review today or tomorrow.

mimowo · 2025-12-05T12:06:49Z

@sohankunkerkar @mbobrovskyi I'm going to review the PRs today and decide what next. But it seems they both address #6883 so please mark this as "Fixed" in the PR descriptions

mimowo · 2025-12-05T16:16:39Z

/close
I decided to go with #4799, as it is more mature and with tests. It also takes a bit more care about parallel operations. In any case, thank you for the effort. It makes me confident in that way of fixing it.

k8s-ci-robot · 2025-12-05T16:16:47Z

@mimowo: Closed this PR.

In response to this:

/close
I decided to go with #4799, as it is more mature and with tests. It also takes a bit more care about parallel operations. In any case, thank you for the effort. It makes me confident in that way of fixing it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/flake Categorizes issue or PR as related to a flaky test. labels Dec 2, 2025

k8s-ci-robot requested a review from PBundyra December 2, 2025 17:50

k8s-ci-robot requested a review from tenzen-y December 2, 2025 17:50

k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 2, 2025

sohankunkerkar changed the title ~~test/e2e: fix race in StatefulSet queue name update test~~ [WIP] test/e2e: fix race in StatefulSet queue name update test Dec 2, 2025

k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 2, 2025

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 2, 2025

mbobrovskyi reviewed Dec 3, 2025

View reviewed changes

test/e2e/singlecluster/statefulset_test.go Outdated Show resolved Hide resolved

sohankunkerkar force-pushed the fix-statefulset-flake branch from 363d637 to 861c464 Compare December 3, 2025 22:54

Copilot AI review requested due to automatic review settings December 3, 2025 22:54

k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Dec 3, 2025

Copilot started reviewing on behalf of sohankunkerkar December 3, 2025 22:54 View session

sohankunkerkar changed the title ~~[WIP] test/e2e: fix race in StatefulSet queue name update test~~ pkg/controller: Fix workload deletion during StatefulSet queue name change Dec 3, 2025

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 3, 2025

Copilot finished reviewing on behalf of sohankunkerkar December 3, 2025 22:57

Copilot AI reviewed Dec 3, 2025

View reviewed changes

pkg/controller/jobs/statefulset/statefulset_reconciler.go Show resolved Hide resolved

pkg/controller/jobs/statefulset/statefulset_reconciler.go Show resolved Hide resolved

sohankunkerkar force-pushed the fix-statefulset-flake branch from 861c464 to f27a613 Compare December 3, 2025 23:19

sohankunkerkar force-pushed the fix-statefulset-flake branch from f27a613 to f1f48ba Compare December 4, 2025 00:42

k8s-ci-robot closed this Dec 5, 2025

pkg/controller: Fix workload deletion during StatefulSet queue name change #8040

pkg/controller: Fix workload deletion during StatefulSet queue name change #8040

Conversation

sohankunkerkar commented Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Uh oh!

k8s-ci-robot commented Dec 2, 2025

Uh oh!

netlify bot commented Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for kubernetes-sigs-kueue canceled.

Uh oh!

sohankunkerkar commented Dec 2, 2025

Uh oh!

sohankunkerkar commented Dec 2, 2025

Uh oh!

sohankunkerkar commented Dec 3, 2025

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

sohankunkerkar commented Dec 4, 2025

Uh oh!

mbobrovskyi commented Dec 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mimowo commented Dec 4, 2025

Uh oh!

mimowo commented Dec 5, 2025

Uh oh!

mimowo commented Dec 5, 2025

Uh oh!

k8s-ci-robot commented Dec 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

sohankunkerkar commented Dec 2, 2025 •

edited

Loading

netlify bot commented Dec 2, 2025 •

edited

Loading

mbobrovskyi commented Dec 4, 2025 •

edited

Loading