-
Notifications
You must be signed in to change notification settings - Fork 61
Freeze scaling when unhealthy nodes found and remove them #271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Freeze scaling when unhealthy nodes found and remove them #271
Conversation
mwhittington21
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic looks good, just a lot of minor changes I'd like to see. Nice work
|
|
||
| This is an optional feature and by default is disabled. | ||
|
|
||
| ### `unhealthy_node_grace_period` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: list the default values for all of these for quick reference. As I believe these default to being turned off, list some good starting points.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
health_check_newest_nodes_percent is the only one that is technically required if unhealthy_node_grace_period is set. I don't see the point to setting a default value for unhealthy_node_grace_period and health_check_newest_nodes_percent because those are required to use the feature. Setting a default for max_unhealthy_nodes_percentage makes sense because that one can still be used when not set
pkg/controller/node_group_test.go
Outdated
| TaintEffect: "invalid", | ||
| MaxNodeAge: "bla", | ||
| UnhealthyNodeGracePeriod: "bla", | ||
| MaxUnhealthyNodesPercent: 101, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: max being 101 would imply that the value can be set to 101%. Is that true? If you name the variable max I'd prefer a <= comparison rather than < 101. It makes more logical sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is how it is set in node_group.go. This is the test here to make sure that 101 is higher. I don't follow the concern here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, it was a test value rather than the real comparison value. You wanted it to be invalid - I missed that.
|
|
||
| The maximum percentage of unhealthy nodes in the test set from `health_check_newest_nodes_percent`. Beyond this threshold all scaling activity is paused and unhealthy nodes are flushed out. | ||
|
|
||
| This is an optional field. If not set, it will default to `0%`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify, if set to 0% this means any unhealthy node will pause scaling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
pkg/controller/scale_up.go
Outdated
| if err != nil { | ||
| log.Errorf("Failed to add nodes because of an error. Skipping cloud provider node group scaleup: %v", err) | ||
| return 0, err | ||
| if opts.nodesDelta > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this if statement is redundant, it is the same as the outer if statement.
pkg/controller/scale_up.go
Outdated
| } | ||
| if opts.nodesDelta > 0 { | ||
| // check that untainting the nodes doesn't do bring us over max nodes | ||
| if opts.nodesDelta <= 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: this can never be hit, is this the correct logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted it back to the original implementation, but yeah that can never be hit. I'll refactor to an implementation which is equivalent
mwhittington21
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, just the minor nit around unreachable if logic that should be looked at
mwhittington21
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
mwhittington21
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Uh oh!
There was an error while loading. Please reload this page.