Kubernetes (k8s) deployments already have a max surge concept, and there's no reason this surge should apply only to new rollouts and not to node maintenance or other situations where PodDisruptionBudget (PDB)-protected pods need to be evicted. This project uses node cordons to signal eviction-autoscaler custom resources that correspond to a PodDisruptionBudget and target a deployment. An eviction-autoscaler controller then attempts to scale up the targeted deployment (or StatefulSet, if you're feeling brave) when the PDB's allowed disruptions is zero, and scales back down once evictions have stopped.
Overprovisioning isn't free. Sometimes it makes sense to run as cost-effectively as possible, but you still don't want to experience downtime due to a cluster upgrade or even a VM maintenance event.
Your app might also experience issues for unrelated reasons, and a maintenance event shouldn't result in downtime if adding extra replicas can save you.
- Node Controller: Signals the eviction-autoscaler custom resource (which shares its PDB's name and namespace) for all pods on cordoned nodes selected by the corresponding PDB.
- Eviction-Autoscaler Controller: Watches eviction-autoscaler resources. If there are recent eviction signals and the PDB's allowed disruptions is zero, it triggers a surge in the corresponding deployment. Once evictions have stopped for a cooldown period and allowed disruptions has risen above zero, it scales back down.
- PDB Controller (Optional): Automatically creates eviction-autoscaler custom resources for existing PDBs.
- Deployment Controller (Optional): Creates PDBs for deployments that don't already have them and keeps minAvailable matching the deployment's replicas (not counting any replicas surged in by the eviction autoscaler).
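To make the controller relationships above concrete, here is a rough sketch of what an eviction-autoscaler custom resource could look like. The field names below are illustrative assumptions, not the actual CRD schema; consult the project's CRD definitions for the real fields.

```yaml
# Illustrative sketch only: field names are assumptions, not the real CRD schema.
apiVersion: eviction-autoscaler.azure.com/v1
kind: EvictionAutoScaler
metadata:
  name: my-app        # shares its name and namespace with the PDB it tracks
  namespace: default
spec:
  targetName: my-app  # the deployment (or StatefulSet) to surge
status:
  lastEviction: "2024-01-01T00:00:00Z"  # recent eviction signals drive surge decisions
```

The node controller writes eviction signals into the spec, and the eviction-autoscaler controller records its progress in the status, matching the diagram below.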
```mermaid
graph TD;
  Cordon[Cordon]
  NodeController[Cordoned Node Controller]
  CRD[Eviction Autoscaler Custom Resource]
  Controller[Eviction-Autoscaler Controller]
  Deployment[Deployment or StatefulSet]
  PDB[Pod Disruption Budget]
  PDBController[Optional PDB creator]
  Cordon -->|Triggers| NodeController
  NodeController -->|writes spec| CRD
  CRD -->|spec watched by| Controller
  Controller -->|surges and shrinks| Deployment
  Controller -->|writes status| CRD
  Controller -->|reads allowed disruptions| PDB
  PDBController -->|watches| Deployment
  PDBController -->|creates if not exist| PDB
```
- Docker
- kind for e2e tests.
- A sense of adventure
You can install Eviction-Autoscaler using the Azure Kubernetes Extension Resource Provider (RP) or via Helm.
- Add the eviction-autoscaler Helm repository:

  ```shell
  helm repo add eviction-autoscaler https://azure.github.io/eviction-autoscaler/charts
  helm repo update
  ```

- Install the chart into your cluster:

  ```shell
  helm install eviction-autoscaler eviction-autoscaler/eviction-autoscaler \
    --namespace eviction-autoscaler --create-namespace \
    --set pdb.create=true
  ```
Note: Setting `pdb.create=true` will automatically create a PodDisruptionBudget (PDB) for deployments that do not already have one, ensuring your workloads are protected and enabling eviction-autoscaler to manage disruptions effectively. If a deployment already has a PDB whose label selector matches the deployment's pod template labels, eviction-autoscaler will not create a new PDB, even if `pdb.create=true`. This avoids duplicate PDBs and ensures existing disruption budgets are respected.

For example, if you deploy an app without a PDB:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: nginx
```

With `pdb.create=true`, eviction-autoscaler will automatically create a matching PDB:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app
  namespace: default
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
```

If a matching PDB already exists, eviction-autoscaler will not create another. If you later disable `pdb.create`, eviction-autoscaler will not delete any existing PDBs; it will simply stop creating new ones.
- (Optional) Customize values by passing `--values my-values.yaml` or using `--set key=value`.
Refer to the Helm Values for configuration options.
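As a minimal sketch, a `my-values.yaml` that enables automatic PDB creation might look like the following. The nesting is inferred from the `--set pdb.create=true` flag used above; check the chart's values file for the authoritative structure.

```yaml
# my-values.yaml -- minimal example; key structure inferred from --set pdb.create=true
pdb:
  create: true
```

You would then pass it with `helm install eviction-autoscaler eviction-autoscaler/eviction-autoscaler --values my-values.yaml`.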
Follow the steps below to register the required features and deploy the extension to your AKS cluster.
Register the Extensions feature:

```shell
az feature register --namespace Microsoft.KubernetesConfiguration --name Extensions
```

Wait until the feature state is Registered:

```shell
az feature show --namespace Microsoft.KubernetesConfiguration --name Extensions
```

Register the provider:

```shell
az provider register -n Microsoft.KubernetesConfiguration
```

Create an AKS cluster (if you don't already have one):

```shell
az aks create \
  --resource-group <your-resource-group> \
  --name <your-aks-cluster-name> \
  --node-count 2 \
  --generate-ssh-keys
```

Install the extension:

```shell
az k8s-extension create \
  --cluster-name <your-cluster-name> \
  --cluster-type managedClusters \
  --extension-type microsoft.evictionautoscaler \
  --name <your-extension-name> \
  --resource-group <your-resource-group-name> \
  --release-train dev \
  --config AgentTimeoutInMinutes=30 \
  --subscription <your-subscription-id> \
  --version 0.1.2 \
  --auto-upgrade-minor-version false
```

Note: The `--configuration-settings pdb.create=true` option enables automatic creation of PodDisruptionBudgets (PDBs) for deployments that do not already have one, ensuring your workloads are protected and enabling eviction-autoscaler to manage disruptions effectively. Eviction-autoscaler determines whether a deployment already has a corresponding PDB by comparing the PDB's label selector with the deployment's pod template labels. This ensures that each deployment is protected from disruptions and avoids duplicate PDBs. If you later disable `pdb.create`, eviction-autoscaler will not delete any existing PDBs; it will simply stop creating new ones.

Note: The `--auto-upgrade-minor-version false` option is only required if you want to disable automatic minor version upgrades.

Note: The `--release-train dev` option specifies that the extension will use the "dev" release train, which typically includes the latest development builds and experimental features. Other available release-train options include `stable` (recommended for production workloads) and `preview` (for pre-release features). Use `dev` for testing or development environments, `preview` for evaluating upcoming features, and `stable` for production deployments.
Refer to the extension documentation for configuration options.
Configuration options will be documented here in future updates. If you have suggestions, please open an issue or PR.
If you want to exclude a specific deployment from automatic PodDisruptionBudget (PDB) creation, add the following annotation to its manifest:
```yaml
metadata:
  annotations:
    eviction-autoscaler.azure.com/pdb-create: "false"
```

This annotation instructs eviction-autoscaler not to create a PDB for that deployment, regardless of whether you installed via Helm or the Azure Kubernetes Extension Resource Provider.
Eviction-autoscaler automatically skips PDB creation for deployments that have a maxUnavailable value other than 0 in their rolling update strategy. This is because such deployments already tolerate some level of downtime during updates or maintenance.
For example, the following deployment will not get an automatic PDB:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%      # This doesn't affect PDB creation
      maxUnavailable: 1  # Allows 1 pod to be unavailable - skips PDB creation
  # ... rest of spec
```

In this case, since `maxUnavailable: 1`, the deployment is explicitly designed to tolerate one pod being down. Creating a PDB would conflict with this configuration. Note that `maxSurge` does not affect PDB creation; only `maxUnavailable` matters.
If you want a PDB for such a deployment, you can either:
- Set `maxUnavailable: 0` in the deployment strategy, or
- Manually create and manage the PDB yourself
This behavior applies to both integer values (`maxUnavailable: 1`) and percentage values (`maxUnavailable: 25%`). Only deployments with `maxUnavailable: 0` or `maxUnavailable: 0%` will automatically get PDBs created.
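Conversely, a deployment shaped like the following sketch, with `maxUnavailable: 0`, would qualify for automatic PDB creation (the manifest is a minimal illustration, not taken from the project):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # rollouts use surge capacity instead of downtime
      maxUnavailable: 0  # no pod may be unavailable, so a PDB is created
  # ... rest of spec
```

Because this deployment declares that it cannot tolerate any unavailable pods, an automatically created PDB is consistent with its rollout strategy rather than in conflict with it.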
When eviction-autoscaler creates a PodDisruptionBudget (PDB) for a deployment, it manages the PDB's lifecycle using both Kubernetes owner references and annotations:
- Owner Reference: Links the PDB to its deployment, ensuring the PDB is deleted when the deployment is deleted
- Annotation: `ownedBy: EvictionAutoScaler` marks the PDB as managed by eviction-autoscaler
If you want to take manual control of a PDB that was created by eviction-autoscaler, remove the `ownedBy` annotation:

```shell
kubectl annotate pdb <pdb-name> -n <namespace> ownedBy-
```

When the annotation is removed, eviction-autoscaler will:
- Detect the annotation removal (which triggers reconciliation)
- Remove the owner reference from the PDB
- Stop managing the PDB
After this, the PDB becomes user-managed and will not be deleted when the deployment is deleted. You take full responsibility for managing and cleaning up the PDB.
Example workflow:
```shell
# Check the current PDB annotations
kubectl get pdb my-app -n default -o jsonpath='{.metadata.annotations}'

# Remove the ownedBy annotation to take control
kubectl annotate pdb my-app -n default ownedBy-

# The PDB is now yours to manage
# Deleting the deployment will no longer delete the PDB
kubectl delete deployment my-app -n default

# You must manually delete the PDB when you're done with it
kubectl delete pdb my-app -n default
```

Re-establishing controller ownership:

If you want eviction-autoscaler to take back control of a PDB, simply add the annotation back:

```shell
# Add the annotation back to return control to eviction-autoscaler
kubectl annotate pdb my-app -n default ownedBy=EvictionAutoScaler

# The controller will re-establish the owner reference on the next reconciliation
# The PDB will now be deleted when the deployment is deleted
```

To try it out:

```shell
kubectl create ns laboratory
kubectl create deployment -n laboratory piggie --image nginx
# unless disabled, there will now be a pdb and an evictionautoscaler that map to the deployment

# show the starting state
kubectl get pods -n laboratory
kubectl get poddisruptionbudget piggie -n laboratory -o yaml # allowed disruptions should be 0
kubectl get evictionautoscalers piggie -n laboratory -o yaml

# cordon
NODE=$(kubectl get pods -n laboratory -l app=piggie -o=jsonpath='{.items[*].spec.nodeName}')
kubectl cordon $NODE

# show we've scaled up
kubectl get pods -n laboratory
kubectl get poddisruptionbudget piggie -n laboratory -o yaml # allowed disruptions should be 1
kubectl get evictionautoscalers piggie -n laboratory -o yaml

# actually kick the pods off the node now that the pdb isn't at zero
kubectl drain $NODE --delete-emptydir-data --ignore-daemonsets
```
Here's a drain of a node on a two-node cluster that is running the AKS store demo (4 deployments and two stateful sets). You can see the drains being rejected and then going through on the left, and new pods being surged in on the right.
This project originated as an intern project and is still available at github.com/Javier090/k8s-pdb-autoscaler.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.
