HPA Scaling even though Current CPU is below Target CPU
I am playing around with the Horizontal Pod Autoscaler in Kubernetes. I've set the HPA to start up new instances once the average CPU utilization passes 35%. However, this does not seem to work as expected.
The HPA triggers a rescale even though the CPU utilization is far below the defined target utilization. As seen below, the "current" utilization is 10%, which is far from 35%. Still, it rescaled the number of pods from 5 to 6.
I've also checked the metrics in my Google Cloud Platform dashboard (where we host the application). These also show that the CPU utilization hasn't surpassed the 35% threshold, yet several rescales occurred.
The content of my HPA:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: django
spec:
  {{ if eq .Values.env "prod" }}
  minReplicas: 5
  maxReplicas: 35
  {{ else if eq .Values.env "staging" }}
  minReplicas: 1
  maxReplicas: 3
  {{ end }}
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: django-app
  targetCPUUtilizationPercentage: 35
Does anyone know what the cause of this might be?
Solution 1:[1]
This is tricky and could be a bug, but I don't think so; most of the time people configure values that are too low, as I'll explain.
How targetCPUUtilizationPercentage relates to the Pod's CPU limits
The targetCPUUtilizationPercentage configures a percentage based on all the CPU a pod can use. On Kubernetes we can't create an HPA without specifying some limits on CPU usage.
Let's assume these are our limits:
apiVersion: v1
kind: Pod
metadata:
  name: apache
spec:
  containers:
  - name: apache
    image: httpd:alpine
    resources:
      limits:
        cpu: 1000m
And in our targetCPUUtilizationPercentage inside the HPA we specify 75%.
That is easy to reason about: we ask for 100% of a single core (1000m = 1 CPU core), so when this core is at about 75% usage, the HPA will start to act.
But if we define our limits like this:
spec:
  containers:
  - name: apache
    image: httpd:alpine
    resources:
      limits:
        cpu: 500m
Now, 100% of the CPU our pod can utilize is only 50% of a single core, so 100% of CPU usage from this pod means, on the hardware, 50% usage of a single core.
This makes no difference to targetCPUUtilizationPercentage: if we keep our value of 75%, the HPA will start to act when the single core is at about 37.5% usage, because that is 75% of all the CPU this pod can consume.
From the perspective of the pod and the HPA, they never know that they are limited on CPU or memory.
Understanding the scenario in the question above
With some programs, like the one used in the question above, CPU spikes do occur, but only over short timeframes (for example, 10-second spikes). Because these spikes are so short, the metrics shown in the dashboards, which are aggregated over 1-minute windows, average them out, while the HPA, which evaluates the metrics more frequently (every 15 seconds by default), does pick them up. This explains why the spike cannot be seen in the metrics dashboards but is picked up by the HPA.
Thus, for services with low CPU limits, a larger scale-up stabilization window (the scaleUp settings in the HPA's behavior field) can be ideal; see the sketch below.
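A minimal sketch of what that could look like, assuming the cluster offers the autoscaling/v2 API; the names and targets mirror the question's HPA, and the 120-second window is only an illustrative value:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: django
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: django-app
  minReplicas: 5
  maxReplicas: 35
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 35
  behavior:
    scaleUp:
      # require the scale-up recommendation to hold for 120s before acting,
      # so a 10-second CPU spike alone does not add pods
      stabilizationWindowSeconds: 120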
Solution 2:[2]
Scaling is based on a percentage of requests, not limits. I think this answer should be changed, as the examples in the accepted answer show:
limits:
  cpu: 1000m
But the targetCPUUtilizationPercentage is based on requests, like:
requests:
  cpu: 1000m
For per-pod resource metrics (like CPU), the controller fetches the metrics from the resource metrics API for each Pod targeted by the HorizontalPodAutoscaler. Then, if a target utilization value is set, the controller calculates the utilization value as a percentage of the equivalent resource request on the containers in each Pod. If a target raw value is set, the raw metric values are used directly. The controller then takes the mean of the utilization or the raw value (depending on the type of target specified) across all targeted Pods, and produces a ratio used to scale the number of desired replicas.
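For reference, the formula from the Kubernetes documentation that this ratio feeds into is:
desiredReplicas = ceil[ currentReplicas * ( currentMetricValue / desiredMetricValue ) ]
Applied to the question's numbers (5 replicas at roughly 10% average utilization against a 35% target), this gives ceil(5 * 10 / 35) = 2, which is below minReplicas: 5, so a sustained 10% reading alone should not produce a sixth pod; only a short-lived reading above 35%, as described in Solution 1, would.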
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source
---|---
Solution 1 | Shabirmean
Solution 2 | Drew