'count k8s cluster cpu/memory usage with prometheus

I want to count k8s cluster cpu/memory usage (not k8s pod usage) with prometheus, so that i can show in grafana.

I use sum (container_memory_usage_bytes{id="/"}) to get k8s cluster used memory, and topk(1, sum(kube_node_status_capacity_memory_bytes) by (instance)) to get whole k8s cluster memory, but they can not divide since topk function does not return value but vector.

How can i do this?



Solution 1:[1]

My main qustion is that topk(1, sum(kube_node_status_capacity_memory_bytes) by (instance)) can not return a value, but now i find that use sum() to covert it can work, whole query as following:

sum(sum (container_memory_usage_bytes{id="/"})by (instance))/sum(topk(1, sum(kube_node_status_capacity_memory_bytes) by (instance)))*100

Solution 2:[2]

I have installed Prometheus on google Cloud through the gcloud default applications. The dashboards automatically got deployed with the installation. The following queries are what was used for memory and CPU usage of the cluster:

CPU usage by namespace:

sum(irate(container_cpu_usage_seconds_total[1m])) by (namespace)

Memory usage (no cache) by namespace:

sum(container_memory_rss) by (namespace)

CPU request commitment:

sum(kube_pod_container_resource_requests_cpu_cores) / sum(node:node_num_cpu:sum)

Memory request commitment:

sum(kube_pod_container_resource_requests_memory_bytes) / sum(node_memory_MemTotal)

Solution 3:[3]

The following query returns global memory usage for all the running pods in K8S:

sum(container_memory_usage_bytes{container!=""})

This query uses sum() aggregate function for summing memory usage across all the containers, which run in K8S.

The container!="" filter is needed for filtering out redundant metrics related to cgroups hierarchy. See this answer for details.

The following query returns global memory usage for k8s cluster in percentage:

100 * (
  sum(container_memory_usage_bytes{container!=""})
    /
  sum(kube_node_status_capacity{resource="memory"})
)

Note that some nodes in K8S can have much higher memory usage in percentage than the other nodes because of scheduling policies. The following query allows determining top 3 nodes with the maximum memory usage in percentage:

topk(3,
  100 * (
    sum(container_memory_usage_bytes{container!=""}) by (node)
      / on(node)
    kube_node_status_capacity{resource="memory"}
  )
)

This query uses topk function for limiting the number of returned time series to 3. Note that the query may return more than 3 time series on a graph in Grafana, since topk returns up to k unique time series per each point on the graph. If you need a graph with no more than k time series with the maximum values, then take a look at topk_* functions at MetricsQL such as topk_max, topk_avg or topk_last.

The query also uses on() modifier for / operation. This modifier limits the set of labels, which is used for finding time series pairs on the left and the right side of / with identical label values. Then Prometheus applies the / operation individually per each such pair. See these docs for details.

The following query returns the number of CPU cores used by all the pods in Kubernetes:

sum(rate(container_cpu_usage_seconds_total{container!=""}[5m]))

The following query returns global CPU usage for k8s cluster in percentage:

100 * (
  sum(rate(container_cpu_usage_seconds_total{container!=""}[5m]))
    /
  sum(kube_node_status_capacity{resource="cpu"})
)

Some nodes may be loaded much more than the rest of nodes in Kubernetes cluster. The following query returns top3 nodes with the highest CPU load:

topk(3,
  100 * (
    sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (node)
      / on(node)
    kube_node_status_capacity{resource="cpu"})
)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 daixiang0
Solution 2 cookiedough
Solution 3 valyala