
Nodeset rabbitmquser finalizer management and status tracking via configmap #1781

Open

lmiccini wants to merge 1 commit into openstack-k8s-operators:main from lmiccini:nodeset_rmqu_finalizer_configmap

Conversation

@lmiccini (Contributor) commented Jan 27, 2026

@openshift-ci bot commented Jan 27, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: lmiccini
Once this PR has been reviewed and has the lgtm label, please assign rabi for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/56ac80bd0e7547ad88350eb0206886b5

✔️ openstack-k8s-operators-content-provider SUCCESS in 3h 18m 47s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 23m 38s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 37m 31s
❌ adoption-standalone-to-crc-ceph-provider FAILURE in 3h 01m 55s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 51m 23s
❌ openstack-operator-docs-preview POST_FAILURE in 2m 32s

@stuggi requested a review from slagle on January 28, 2026 08:13
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/db62c9cd33b34a538c7eccf243769b6a

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 02m 26s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 20m 56s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 36m 03s
❌ adoption-standalone-to-crc-ceph-provider FAILURE in 1h 46m 57s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 34m 08s
❌ openstack-operator-docs-preview POST_FAILURE in 3m 15s

@lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch 2 times, most recently from 3885c4a to c1fe8f8 on February 7, 2026 18:56
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/b5d3972863e64857b2da5055f867ef55

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 20m 43s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 21m 41s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 36m 22s
❌ adoption-standalone-to-crc-ceph-provider FAILURE in 2h 05m 30s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 43m 01s
✔️ openstack-operator-docs-preview SUCCESS in 3m 14s

@lmiccini (Contributor, Author) commented Feb 8, 2026

/retest

@lmiccini (Contributor, Author) commented Feb 8, 2026

recheck

@lmiccini (Contributor, Author) commented Feb 8, 2026

/test openstack-operator-build-deploy-kuttl-4-18

@lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch 2 times, most recently from cbfbb7c to f52529a on February 8, 2026 15:01
@lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch from f52529a to 017d2ca on February 10, 2026 06:41
@lmiccini (Contributor, Author) commented:

/test functional

@lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch from 017d2ca to 97bb482 on February 12, 2026 09:50
Add dataplane-specific logic to track and manage RabbitMQ user finalizers
for OpenStackDataPlaneNodeSet services, enabling safe credential rotation
across multi-cluster deployments.

Key features:
- Per-nodeset finalizers on shared RabbitMQ users
- Incremental deployment support with proper finalizer timing
- Nova-operator rabbitmq_user_name field integration for simplified tracking
- Automatic cleanup of temporary cleanup-blocked finalizers
- Comprehensive test coverage for rotation and multi-cluster scenarios

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
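
For readers skimming the commit message, a minimal sketch of the per-nodeset finalizer idea (hypothetical helper names and finalizer naming scheme, built on controller-runtime's controllerutil; not the PR's actual code) could look like this:

```go
// Hypothetical sketch of the per-nodeset finalizer idea described in the
// commit message above; helper names and the finalizer naming scheme are
// assumptions, not code taken from this PR.
package dataplane

import (
	"context"
	"fmt"

	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// nodesetFinalizer builds a finalizer name unique to one NodeSet
// (naming scheme assumed for illustration).
func nodesetFinalizer(nodeSetName string) string {
	return fmt.Sprintf("openstackdataplanenodeset.openstack.org/%s", nodeSetName)
}

// EnsureUserFinalizer adds the per-nodeset finalizer to a shared RabbitMQ
// user object (passed generically as client.Object) and persists the change
// only if the finalizer was not already present.
func EnsureUserFinalizer(ctx context.Context, c client.Client, user client.Object, nodeSetName string) error {
	if controllerutil.AddFinalizer(user, nodesetFinalizer(nodeSetName)) {
		return c.Update(ctx, user)
	}
	return nil
}

// RemoveUserFinalizer drops the per-nodeset finalizer once the NodeSet no
// longer references the user, letting garbage collection of the credential
// proceed when no other NodeSet still holds a finalizer.
func RemoveUserFinalizer(ctx context.Context, c client.Client, user client.Object, nodeSetName string) error {
	if controllerutil.RemoveFinalizer(user, nodesetFinalizer(nodeSetName)) {
		return c.Update(ctx, user)
	}
	return nil
}
```

The key point is that each NodeSet contributes its own finalizer to the shared RabbitMQ user, so the user can only be deleted once every NodeSet has dropped its reference.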
@lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch from 97bb482 to b1d9350 on February 12, 2026 15:46
@lmiccini (Contributor, Author) commented:

/test openstack-operator-build-deploy-kuttl-4-18

1 similar comment
@lmiccini (Contributor, Author) commented:

/test openstack-operator-build-deploy-kuttl-4-18

@openshift-ci bot commented Feb 13, 2026

@lmiccini: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/openstack-operator-build-deploy-kuttl | 1698305 | link | true | /test openstack-operator-build-deploy-kuttl |
| ci/prow/openstack-operator-build-deploy-kuttl-4-18 | b1d9350 | link | true | /test openstack-operator-build-deploy-kuttl-4-18 |

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@slagle (Contributor) commented Feb 17, 2026

Is preventing the deletion of in use rabbitmq users the point of this PR? Why do we need these finalizers to enable "safe rotation"?

I'm concerned about the size and complexity of this PR. Personally, this is difficult to review. We might want to come up with a simpler design that we code without AI, and then let AI build on top of that. I'm having a hard time reasoning about all the different changes here.

This also adds some service specific code to the dataplane (nova, neutron, ironic). While we have some instances of that, we have really tried to avoid that in the past, and do things generically and let CRD fields drive the generic code.

I'm just brainstorming, but a simpler solution might be:

  • We know the Secret/ConfigMaps in use at service deployment time.
  • Services have a field whose value we use to inspect the Secret/ConfigMap and we save the value found (such as transportURL) on the NodeSet or Deployment Status when the Deployment succeeds
  • rabbitmq user deletion checks NodeSet or Deployment Status and, if it finds that user in use, blocks the deletion.

For example, the nova Service has in the spec:

serviceTrackingFields:
  - dataSource: # ConfigMapRef or SecretRef
    fieldPattern: "nova-transport-url-pattern"

Then during Service Deployment, there is logic similar to GetNovaCellRabbitMqUserFromSecret: we get the value of the user and save it on the NodeSet and/or Deployment Status. If we attempt to rotate or delete the user, and that user is still set on a Status, the operation is blocked.

I would also delay solving the problem of enforcing that all nodes in the nodeset have been updated by a Deployment. This is a wider problem that should be solved separately from the user rotation problem.
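
A rough sketch of the status-based gate proposed above (all types and the RabbitMqUsersInUse field are stand-ins for the real NodeSet/Deployment status, purely illustrative; real code would list OpenStackDataPlaneNodeSet objects and read their status):

```go
// Purely illustrative sketch of the status-based deletion gate; the types
// and the RabbitMqUsersInUse field are hypothetical stand-ins.
package sketch

import "fmt"

// nodeSetStatusView is a stand-in for the relevant part of a NodeSet status.
type nodeSetStatusView struct {
	Name               string
	RabbitMqUsersInUse []string // assumed field: users recorded at deploy time
}

// blockUserDeletionIfInUse returns an error when any NodeSet status still
// records the given RabbitMQ user name, so rotation/deletion is blocked.
func blockUserDeletionIfInUse(nodeSets []nodeSetStatusView, userName string) error {
	for _, ns := range nodeSets {
		for _, inUse := range ns.RabbitMqUsersInUse {
			if inUse == userName {
				return fmt.Errorf("rabbitmq user %q still in use by nodeset %q", userName, ns.Name)
			}
		}
	}
	return nil
}
```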

@slagle (Contributor) left a review comment:

See previous comment

@slagle (Contributor) commented Feb 17, 2026

Or even simpler: we already have the Secret and ConfigMap hashes saved in the Deployment statuses. If the rabbitmq user rotation sees that those hashes are out of date, the rotation, or at least the old-user deletion part of it, is blocked.
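
A compact sketch of this hash-freshness gate (illustrative only; names assumed, not from the PR):

```go
// Sketch of the hash-freshness gate: the old-user deletion step stays
// blocked while any Deployment status still carries a Secret/ConfigMap hash
// that differs from the currently computed one.
package sketch

// rotationBlockedByStaleHashes compares the hashes recorded in a Deployment
// status against freshly computed ones and reports whether any entry is
// missing or out of date.
func rotationBlockedByStaleHashes(deployedHashes, currentHashes map[string]string) bool {
	for name, current := range currentHashes {
		if deployed, ok := deployedHashes[name]; !ok || deployed != current {
			return true // this deployment has not picked up the new Secret/ConfigMap yet
		}
	}
	return false
}
```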

@lmiccini (Contributor, Author) commented Feb 18, 2026

> Is preventing the deletion of in use rabbitmq users the point of this PR? Why do we need these finalizers to enable "safe rotation"?
>
> [... full comment quoted above ...]

Thanks @slagle , appreciate you taking the time.
The logic is more or less what you are proposing here.
We add finalizers to the rabbitmq users so that each service can "signal" that they are in use, and we garbage-collect a user only when no finalizer is present, following the same pattern we use elsewhere, to avoid leftover credentials that could pose a security risk.

The additional stuff "on top" is required because we could have different rabbitmq users for the nova_compute, neutron and ironic agents running in the dataplane. So I try to track which node in a nodeset ran a deployment for the aforementioned services and store that in a configmap, which we keep updating until all nodes have reconciled to the hashes you mention in the last comment. Here is how it could look:

[zuul@localhost ~]$ oc get configmap openstack-edpm-ipam-service-tracking -o yaml
apiVersion: v1
data:
  neutron.secretHash: 6e657574726f6e2d646863702d6167656e742d6e657574726f6e2d636f6e6669673a313737303632353235383b6e657574726f6e2d7372696f762d6167656e742d6e657574726f6e2d636f6e6669673a313737303632353235383b
  neutron.updatedNodes: '[]'
  nova.secretHash: 6e6f76612d63656c6c312d636f6d707574652d636f6e6669673a313737303634333733313b
  nova.updatedNodes: '["edpm-compute-0","edpm-compute-1"]'
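
As an aside, the secretHash values in that ConfigMap are hex-encoded "secret-name:hash;" pairs; a trivial decode (illustrative, not part of the PR) shows, for example, that the nova entry above decodes to nova-cell1-compute-config:1770643731;

```go
// Decode one of the hex-encoded secretHash values shown above.
package main

import (
	"encoding/hex"
	"fmt"
)

func main() {
	raw := "6e6f76612d63656c6c312d636f6d707574652d636f6e6669673a313737303634333733313b"
	decoded, err := hex.DecodeString(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(decoded)) // prints: nova-cell1-compute-config:1770643731;
}
```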

If I understand correctly, you would like to flip this around and have infra-operator track each nodeset's rabbitmq usage instead? I'm not sure having infra-operator introspect dataplane objects is my preferred approach, especially because we have no way of knowing whether an additional service that could use rabbitmq will be added tomorrow, so we would have to play catch-up with the dataplane. That said, I can try to prototype something and see how ugly it gets.
Thanks again.

Labels: none yet
Projects: none yet
4 participants