How to calculate containers' CPU usage in Kubernetes with Prometheus as monitoring?
I want to calculate the CPU usage of all pods in a Kubernetes cluster. I found two Prometheus metrics that may be useful:
container_cpu_usage_seconds_total: Cumulative cpu time consumed per cpu in seconds.
process_cpu_seconds_total: Total user and system CPU time spent in seconds.
CPU usage of all pods = (per-second increase of sum(container_cpu_usage_seconds_total{id="/"})) / (per-second increase of sum(process_cpu_seconds_total))
However, I found that the per-second increase of container_cpu_usage_seconds_total{id="/"} is larger than the per-second increase of sum(process_cpu_seconds_total), so the computed usage may be larger than 1...
Solution 1:[1]
I use this query to get CPU usage at the cluster level, as a percentage of all cores:
sum (rate (container_cpu_usage_seconds_total{id="/"}[1m])) / sum (machine_cpu_cores) * 100
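The arithmetic behind the cluster-level query above can be sketched in Python. This is only an illustration under stated assumptions: the node names and sample values are made up, and simple_rate is a hypothetical simplification that uses the first and last sample of the window, ignoring the counter-reset handling and extrapolation that Prometheus' rate() performs.

```python
def simple_rate(samples):
    """Per-second increase of a counter given (timestamp_seconds, value) samples.

    Simplified stand-in for rate(): first and last sample only,
    no counter-reset handling, no window extrapolation.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def cluster_cpu_percent(root_samples_by_node, cores_by_node):
    """Mirror of: sum(rate(counter{id="/"}[1m])) / sum(machine_cpu_cores) * 100."""
    used_cores = sum(simple_rate(s) for s in root_samples_by_node.values())
    total_cores = sum(cores_by_node.values())
    return used_cores / total_cores * 100


# Hypothetical one-minute windows of the root cgroup counter on two nodes.
root_samples = {
    "node-a": [(0, 1000.0), (60, 1120.0)],  # 120 CPU-seconds in 60 s = 2 cores busy
    "node-b": [(0, 500.0), (60, 560.0)],    # 60 CPU-seconds in 60 s = 1 core busy
}
cores = {"node-a": 4, "node-b": 4}          # hypothetical machine_cpu_cores values

cluster_percent = cluster_cpu_percent(root_samples, cores)
print(cluster_percent)  # 3 busy cores out of 8 -> 37.5
```

Because the denominator is the number of cores rather than another CPU-time counter, the result stays between 0 and 100 as long as the root cgroup counter covers each whole node.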
I also track the CPU usage for each pod.
sum (rate (container_cpu_usage_seconds_total{image!=""}[1m])) by (pod_name)
I have a complete kubernetes-prometheus solution on GitHub that may help you with more metrics: https://github.com/camilb/prometheus-kubernetes
Solution 2:[2]
I created my own Prometheus exporter (https://github.com/google-cloud-tools/kube-eagle), primarily to get a better overview of my resource utilization on a per-node basis. It also offers a more intuitive way to monitor your CPU and RAM resources. The query to get the cluster-wide CPU usage looks like this:
sum(eagle_pod_container_resource_usage_cpu_cores)
But you can also easily get the CPU usage by namespace, node or nodepool.
Solution 3:[3]
The following query returns the average number of CPUs used by each container during the last 5 minutes:
rate(container_cpu_usage_seconds_total{container!~"POD|"}[5m])
The lookbehind window in square brackets (5m in the case above) can be changed to the needed value. See the Prometheus documentation on time durations for the possible values.
The container!~"POD|" filter removes metrics related to the cgroups hierarchy (see this answer for more details) and metrics for e.g. pause containers (see these docs).
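To see which series the container!~"POD|" matcher drops, here is a small Python sketch. It relies on two PromQL behaviors: regex matchers are fully anchored, and a missing label is treated as an empty string. The label sets below are made up for illustration, and keep_series is a hypothetical helper, not part of any Prometheus client.

```python
import re


def keep_series(labels, pattern="POD|"):
    """Emulate container!~"POD|": keep a series only if its container label
    does NOT fully match the pattern. A missing label counts as ""."""
    value = labels.get("container", "")
    return re.fullmatch(pattern, value) is None


# Hypothetical label sets as cAdvisor might expose them.
series = [
    {"id": "/kubepods", "pod": ""},          # cgroup hierarchy entry, no container label
    {"container": "POD", "pod": "web-1"},    # pause container
    {"container": "nginx", "pod": "web-1"},  # real application container
]

kept = [s for s in series if keep_series(s)]
print(kept)  # only the nginx series remains
```

The empty alternative after the `|` is what matches the empty label value, so both the cgroup-level series and the pause container are excluded by a single matcher.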
Since each pod can contain multiple containers, the following query returns the average number of CPUs used by each pod during the last 5 minutes:
sum(
rate(container_cpu_usage_seconds_total{container!~"POD|"}[5m])
) by (namespace,pod)
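The grouping done by sum(...) by (namespace,pod) can be pictured as follows. This is a rough Python sketch with invented per-container rates; sum_by is a hypothetical helper that only mimics the aggregation, not how Prometheus implements it.

```python
from collections import defaultdict


def sum_by(rates, keys):
    """Add up per-series values that share the same values for the given labels."""
    totals = defaultdict(float)
    for labels, value in rates:
        group = tuple(labels.get(k, "") for k in keys)
        totals[group] += value
    return dict(totals)


# Hypothetical per-container rates (CPUs used), as the inner rate() would return.
container_rates = [
    ({"namespace": "default", "pod": "web-1", "container": "nginx"}, 0.25),
    ({"namespace": "default", "pod": "web-1", "container": "sidecar"}, 0.5),
    ({"namespace": "default", "pod": "web-2", "container": "nginx"}, 0.125),
]

per_pod = sum_by(container_rates, ("namespace", "pod"))
print(per_pod[("default", "web-1")])  # 0.25 + 0.5 -> 0.75
```

Grouping by namespace as well as pod keeps pods with the same name in different namespaces apart.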
Solution 4:[4]
You can also use the query below:
avg (rate (container_cpu_usage_seconds_total{id="/"}[1m]))
Solution 5:[5]
I prefer to use this query, per the docs. It divides each container's CPU usage by its CPU limit (quota / period):
sum(rate(container_cpu_usage_seconds_total{name!~".*prometheus.*", image!="", container_name!="POD"}[5m])) by (pod_name, container_name) /
sum(container_spec_cpu_quota{name!~".*prometheus.*", image!="", container_name!="POD"}/container_spec_cpu_period{name!~".*prometheus.*", image!="", container_name!="POD"}) by (pod_name, container_name)
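As a worked example of the ratio above: quota divided by period gives the container's CPU limit in cores, and dividing the usage rate by that limit gives the fraction of the limit in use. The sketch below uses made-up numbers and a hypothetical helper name, and assumes quota and period are expressed in the same time unit.

```python
def usage_fraction_of_limit(cpu_rate, quota, period):
    """cpu_rate: CPUs used (result of rate()); quota/period: CPU limit in cores."""
    limit_cores = quota / period
    return cpu_rate / limit_cores


# Hypothetical container: limited to half a core, currently using a quarter core.
fraction = usage_fraction_of_limit(cpu_rate=0.25, quota=50000, period=100000)
print(fraction)  # 0.25 / 0.5 -> 0.5, i.e. 50% of the limit
```

Containers without a CPU limit have no quota series, so they simply drop out of the division result.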
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | kentor |
Solution 3 | |
Solution 4 | slm |
Solution 5 | zangw |