-
Notifications
You must be signed in to change notification settings - Fork 34
Add resyncPeriod option for drift detection #658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That great, it looks relatively straightforward to implement. Some points to consider:
- we'll need docs, and highly visible warnings about the side effects of enabling drift detection (higher loads on the cloud).
- we'll want tests as well
- we probably want the controller to accept a CLI flag, to set a global resync period
- for configuring per Kind resync period (say I want my a force resync of my networks every hour, while more frequent resync of floating IPs), we'll have to think about a solution. Perhaps this solution is KRO.
- we should have a strategy for when the managed resource was updated out-of-band, but ORC does not have the ability to revert it to the expected state (the field is immutable in ORC for example)
- what happens when a managed resource is deleted in OpenStack?
- many more questions
Ideally, we'll discuss all these points in a design document.
1e62aa9 to
1db80df
Compare
This change introduces a new `resyncPeriod` field in `ManagedOptions` that
enables periodic reconciliation of resources to detect and correct drift
from the desired state in OpenStack.
Motivation:
Resources managed by ORC can drift from their desired state due to
external modifications in OpenStack. Without drift detection, these
changes go unnoticed until the next spec change triggers reconciliation.
The resyncPeriod option allows users to configure periodic checks to
detect and remediate such drift automatically.
Implementation:
- Add `resyncPeriod` field (metav1.Duration) to ManagedOptions API
- Add GetResyncPeriod() helper method for safe access with nil handling
- Modify shouldReconcile() in the generic controller to check if enough
time has passed since the last successful reconciliation
- Schedule next resync when periodic resync is enabled and no other
reschedule is pending
- Update all CRDs, OpenAPI schema, and documentation
Usage:
```yaml
spec:
managedOptions:
resyncPeriod: 1h # Reconcile every hour to detect drift
```
If not specified or set to 0, periodic resync is disabled (default
behavior unchanged).
Closes: k-orc#655
Co-Authored-By: Claude Opus 4.5 <[email protected]>
1db80df to
d38ac01
Compare
|
I'll try to reply some right now:
Added a warning in docs
We need to think how we want to test it but I defiantly agree we should figure out a way to gate it. For now as a draft tested locally on my computer.
I think in this case it just won't fix the drift but it won't trigger another update I believe as when we are updating we compare the fields that we can change
This case would behave similarly to reconciling before the resource exist and ORC will try to create it. |
dlaw4608
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice, I played around with it for a while locally, works great!!
|
|
||
| if osResource == nil { | ||
| // Programming error: if we don't have a resource we should either have an error or be waiting on something | ||
| return reconcileStatus.WithError(fmt.Errorf("oResource is not set, but no wait events or error")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noticed a small typo not from this PR, should be "osResource is not set ...." up to you @eshulman2 if you want to correct it.
Recreate resource in case it was deleted by something external to ORC. this solves the issue when deleting with external but does raise concerns I am afraid that in case of split brain or other edge case the resource will be created over and over causing a catastrophic failure
|
@mandre I added a second commit to re-create the resource on external delete. I must say I'm a bit uncomfortable with the possible risk profile it might add in edge cases, but I'll leave it here for now for reference and discussion. |
This change introduces a new
resyncPeriodfield inManagedOptionsthat enables periodic reconciliation of resources to detect and correct drift from the desired state in OpenStack.Motivation: Resources managed by ORC can drift from their desired state due to external modifications in OpenStack. Without drift detection, these changes go unnoticed until the next spec change triggers reconciliation. The resyncPeriod option allows users to configure periodic checks to detect and remediate such drift automatically.
Implementation:
resyncPeriodfield to ManagedOptions APItime has passed since the last successful reconciliation
reschedule is pending
Usage:
If not specified default sync is 10H. if set to 0, periodic resync is disabled.
Closes: #655
Co-Authored-By: Claude Opus 4.5 [email protected]