Calculate the duration in which a Prometheus metric had a certain value?
Is it possible with Prometheus to calculate a duration (for example in seconds) in which a metric had a certain value?
A simple example would be an up metric, which can have two values, 1 or 0, to indicate whether a system is running. Imagine that over the last week the system has gone up and down several times.
I'd like to be able to calculate the total number of seconds the system was down during that period of time.
Solution 1:[1]
Here's the solution. To find the downtime (in seconds) over the last day:
(1 - avg_over_time(up[1d])) * 60 * 60 * 24
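Why this works: for a 0/1 metric, avg_over_time is the share of samples that were 1, so 1 minus it is the share that were 0, and multiplying by the window length turns that share into seconds. Below is a minimal plain-Python sketch of that arithmetic. It assumes evenly spaced scrapes with no gaps; the function names and the sample data are hypothetical and only mimic the PromQL expression.

```python
# Minimal sketch: mimics (1 - avg_over_time(up[1d])) * 60 * 60 * 24 on a plain
# list of 0/1 "up" samples. Assumes evenly spaced samples with no gaps.


def avg_over_time(samples: list[float]) -> float:
    """Arithmetic mean of all samples in the window (hypothetical stand-in)."""
    return sum(samples) / len(samples)


def downtime_seconds(up_samples: list[int], window_seconds: float) -> float:
    """Estimate the seconds spent at up == 0 within the window."""
    # Share of 0-samples = 1 - average, scaled to the window length.
    return (1 - avg_over_time(up_samples)) * window_seconds


if __name__ == "__main__":
    # Hypothetical day scraped every 60 s -> 1440 samples, 90 of them down.
    samples = [1] * 1350 + [0] * 90
    print(downtime_seconds(samples, 60 * 60 * 24))
```

With 90 down samples at a 60 s interval, this prints 5400.0, which is exactly 90 * 60 seconds. If scrapes are missed, the result becomes an approximation weighted by the samples that actually exist.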
And here's how to use that query in Grafana to calculate the downtime depending on a selected time range:
(1 - avg_over_time(up[$__range])) * $__range_s
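In Grafana, $__range is the dashboard's selected range (to minus from) and $__range_s is the same range in seconds, so the query scales with whatever range is picked. A rough Python sketch of the same idea for a single series follows; the Sample type, the downtime_for_range name and the (from, to] window boundaries are assumptions for illustration, not Grafana or Prometheus internals.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float  # Unix time in seconds
    value: float      # 1 = up, 0 = down


def downtime_for_range(samples: list[Sample], range_from: float, range_to: float) -> float:
    """Mimic (1 - avg_over_time(up[$__range])) * $__range_s for one series."""
    range_s = range_to - range_from  # plays the role of Grafana's $__range_s
    # Keep the samples inside the selected range, treated here as (from, to].
    window = [s.value for s in samples if range_from < s.timestamp <= range_to]
    if not window:
        raise ValueError("no samples in the selected range")
    avg = sum(window) / len(window)
    return (1 - avg) * range_s


if __name__ == "__main__":
    # Hypothetical 1-hour range, one sample per minute, last 15 samples down.
    data = [Sample(t, 1.0 if t <= 2700 else 0.0) for t in range(60, 3601, 60)]
    print(downtime_for_range(data, 0, 3600))
```

Narrowing the range to the last 15 minutes returns 900.0 as well, because every sample in that window is down; widening or shrinking the range changes both the average and the multiplier together.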
Solution 2:[2]
The solution provided in the answer above works only for up-like metrics, which can have only 0 or 1 values. If the metric can have other values, then the solution doesn't work :( In this case it is possible to use subqueries. For example, the following query returns the approximate duration in seconds during which the metric temperature had values greater than 20 over the last day:
avg_over_time((temperature >bool 20)[1d:1m]) * 24 * 3600
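To see where the approximation comes from: the [1d:1m] subquery evaluates temperature >bool 20 once per minute, each step picks the latest raw sample within the lookback delta (5 minutes by default in Prometheus), and avg_over_time then averages the resulting 0/1 points. The Python sketch below imitates that on a short window. The function names, the hypothetical data and the simplified step placement (steps at start + step, ..., end) are assumptions; real Prometheus aligns subquery steps to absolute time.

```python
import bisect
from typing import Optional

LOOKBACK_S = 300  # Prometheus' default lookback delta (5 minutes)


def value_at(timestamps: list[float], values: list[float], t: float) -> Optional[float]:
    """Latest raw sample at or before t, if it is within the lookback delta."""
    i = bisect.bisect_right(timestamps, t) - 1
    if i < 0 or t - timestamps[i] > LOOKBACK_S:
        return None
    return values[i]


def duration_above(samples: list[tuple[float, float]], start: float, end: float,
                   step: float, threshold: float) -> Optional[float]:
    """Approximate avg_over_time((metric >bool threshold)[window:step]) * window."""
    timestamps = [ts for ts, _ in samples]
    values = [v for _, v in samples]
    points = []
    t = start + step
    while t <= end:
        v = value_at(timestamps, values, t)
        if v is not None:
            # ">bool" yields 1 when the comparison holds and 0 otherwise.
            points.append(1.0 if v > threshold else 0.0)
        t += step
    if not points:
        return None  # no data in the window, like an empty PromQL result
    return sum(points) / len(points) * (end - start)


if __name__ == "__main__":
    # Hypothetical raw data: a sample every 30 s for 1 hour,
    # 25 degrees for the first 15 minutes and 15 degrees afterwards.
    temps = [(ts, 25.0 if ts < 900 else 15.0) for ts in range(0, 3601, 30)]
    print(duration_above(temps, 0, 3600, 60, 20))
```

Here the true time above 20 is 900 s, but only 14 of the 60 one-minute steps see a value above 20, so the result is about 840 s. A finer subquery step reduces this error at the cost of more evaluations.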
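This solution uses the bool modifier for the > operation; see the Prometheus documentation on comparison binary operators for details. With bool, the comparison returns 1 where it holds and 0 where it does not, instead of filtering those samples out, so avg_over_time yields the share of the day during which the condition held.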
P.S. VictoriaMetrics provides the share_gt_over_time function, which simplifies the query above to the following MetricsQL query:
share_gt_over_time(temperature[1d], 20) * 1d
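As far as I understand the MetricsQL docs, share_gt_over_time returns the share (0..1) of raw samples in the lookbehind window that are bigger than the given value, so no subquery step is involved, and 1d is used as the number 86400. The Python function below is only a hypothetical stand-in for that behaviour, not the VictoriaMetrics implementation, and the data is made up.

```python
def share_gt_over_time(values: list[float], gt: float) -> float:
    """Share (0..1) of raw samples in the window strictly greater than gt."""
    if not values:
        raise ValueError("empty window")
    return sum(1 for v in values if v > gt) / len(values)


if __name__ == "__main__":
    ONE_DAY_S = 24 * 3600  # numeric value of 1d in the query
    # Hypothetical raw samples for one day, one every 30 minutes (48 samples),
    # 12 of which are above 20.
    temps = [25.0] * 12 + [15.0] * 36
    print(share_gt_over_time(temps, 20) * ONE_DAY_S)
```

This prints 21600.0, i.e. 6 hours. Like the avg_over_time approach, it assumes evenly spaced samples; irregular scrape intervals skew the share.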
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | grdl |
Solution 2 | valyala |