HPA Scaling even though Current CPU is below Target CPU
I am playing around with the Horizontal Pod Autoscaler in Kubernetes. I've set the HPA to start up new instances once the average CPU utilization passes 35%. However, this does not seem to work as expected.
The HPA triggers a rescale even though the CPU utilization is far below the defined target utilization. As seen below, the "current" utilization is 10%, which is far from 35%. Still, it rescaled the number of pods from 5 to 6.
I've also checked the metrics in my Google Cloud Platform dashboard (where we host the application). These also show that the CPU utilization hasn't surpassed the 35% threshold, yet several rescales occurred.
The content of my HPA:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: django
spec:
  {{ if eq .Values.env "prod" }}
  minReplicas: 5
  maxReplicas: 35
  {{ else if eq .Values.env "staging" }}
  minReplicas: 1
  maxReplicas: 3
  {{ end }}
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: django-app
  targetCPUUtilizationPercentage: 35
Does anyone know what the cause of this might be?
Solution 1:[1]
This is tricky and could be a bug, but I don't think so; most of the time people configure values that are too low, as I'll explain.
How targetCPUUtilizationPercentage relates to the Pod's CPU limits
The targetCPUUtilizationPercentage configures a percentage based on all the CPU a pod can use. On Kubernetes we can't create an HPA without specifying some limits on CPU usage.
Let's assume these are our limits:
apiVersion: v1
kind: Pod
metadata:
  name: apache
spec:
  containers:
  - name: apache
    image: httpd:alpine
    resources:
      limits:
        cpu: 1000m
And in our targetCPUUtilizationPercentage inside the HPA we specify 75%.
That is easy to reason about: we ask for 100% of a single core (1000m = 1 CPU core), so when this core is at about 75% usage, the HPA will start to act.
But if we define our limits like this:
spec:
  containers:
  - name: apache
    image: httpd:alpine
    resources:
      limits:
        cpu: 500m
Now, 100% of the CPU our pod can utilize is only 50% of a single core, so 100% of CPU usage from this pod means, on the hardware, 50% usage of a single core.
This makes no difference to targetCPUUtilizationPercentage: if we keep our value of 75%, the HPA will start to act when the single core is at about 37.5% usage, because that is 75% of all the CPU this pod can consume.
From the perspective of the pod and the HPA, they never know that they are limited on CPU or memory.
Understanding the scenario in the question above
With some programs, like the one used in the question above, CPU spikes do occur, but only over short timeframes (for example, 10-second spikes). Because these spikes are so short, the metrics shown in the dashboards, which are aggregated over 1-minute windows, average them out, while the HPA, which evaluates the metrics more frequently (every 15 seconds by default), does pick them up. This explains why the spike cannot be seen in the metrics dashboards but is picked up by the HPA.
Thus, for services with low CPU limits, a larger scale-up stabilization window (the scaleUp settings in the HPA's behavior field) can be ideal; see the sketch below.
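A minimal sketch of what that could look like, assuming the cluster offers the autoscaling/v2 API; the names and targets mirror the question's HPA, and the 120-second window is only an illustrative value:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: django
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: django-app
  minReplicas: 5
  maxReplicas: 35
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 35
  behavior:
    scaleUp:
      # require the scale-up recommendation to hold for 120s before acting,
      # so a 10-second CPU spike alone does not add pods
      stabilizationWindowSeconds: 120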
Solution 2:[2]
Scaling is based on a percentage of requests, not limits. I think this answer should be changed, as the examples in the accepted answer show:
limits:
  cpu: 1000m
But the targetCPUUtilizationPercentage is based on requests, like:
requests:
  cpu: 1000m
For per-pod resource metrics (like CPU), the controller fetches the metrics from the resource metrics API for each Pod targeted by the HorizontalPodAutoscaler. Then, if a target utilization value is set, the controller calculates the utilization value as a percentage of the equivalent resource request on the containers in each Pod. If a target raw value is set, the raw metric values are used directly. The controller then takes the mean of the utilization or the raw value (depending on the type of target specified) across all targeted Pods, and produces a ratio used to scale the number of desired replicas.
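For reference, the formula from the Kubernetes documentation that this ratio feeds into is:
desiredReplicas = ceil[ currentReplicas * ( currentMetricValue / desiredMetricValue ) ]
Applied to the question's numbers (5 replicas at roughly 10% average utilization against a 35% target), this gives ceil(5 * 10 / 35) = 2, which is below minReplicas: 5, so a sustained 10% reading alone should not produce a sixth pod; only a short-lived reading above 35%, as described in Solution 1, would.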
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source
---|---
Solution 1 | Shabirmean
Solution 2 | Drew