Prometheus alerting rule not detecting first time metric increase

I have a counter metric, error_in_execution. Whenever the error occurs, counter.inc() is called. I have the following alert expression, which should trigger when the counter increases.

expr: increase(error_in_execution[5m]) > 0
for: 5m

The issue is that when the metric does not exist yet and an error occurs for the first time, the counter value goes to 1. This is not detected by the alert expression, so the alert does not trigger. When the counter then increases to 2, the alert triggers.

The following example illustrates the problem.

Time 0:
Prometheus: error_in_execution --> Metric does not exist.
Alert: increase(error_in_execution[5m]) > 0 --> Not triggered

Time 1: Error occurs [error_in_execution.inc()]
Prometheus: error_in_execution --> 1
Alert: increase(error_in_execution[5m]) > 0 --> Still not triggered <<<<<< It should be triggered. (Please help here)

Time 2: Error occurs [error_in_execution.inc()]
Prometheus: error_in_execution --> 2
Alert: increase(error_in_execution[5m]) > 0 --> Alert triggered.


Solution 1:^[1]

This is "normal" behaviour. If the metric did not exist before and first appears with the value 1, that initial value is not counted by functions like increase() or rate().

To catch the very first error, make sure the metric exists from the moment your application starts, with an initial value of 0. The first increment will then trigger your alert.
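The behaviour described in this answer can be sketched with a small model. This is only an illustration, not how Prometheus is implemented: simple_increase and alert_fires are hypothetical helpers, None stands for a scrape at which the series did not exist, and extrapolation and counter resets are ignored.

```python
from typing import Optional, Sequence


def simple_increase(window: Sequence[Optional[float]]) -> Optional[float]:
    """Simplified stand-in for increase(): last sample minus first sample.

    None marks a scrape at which the series did not exist. Like the real
    function, it yields no result unless the window holds at least two
    samples. Extrapolation and counter resets are deliberately ignored.
    """
    samples = [v for v in window if v is not None]
    if len(samples) < 2:
        return None  # a single sample gives nothing to compare against
    return samples[-1] - samples[0]


def alert_fires(window: Sequence[Optional[float]]) -> bool:
    """Mirror of the rule expression increase(error_in_execution[5m]) > 0."""
    result = simple_increase(window)
    return result is not None and result > 0


# Series appears for the first time with value 1: the first error is missed.
missing_then_one = [None, None, 1]
# Series exported as 0 from application start: the first error is caught.
zero_then_one = [0, 0, 1]
# Second error on a series that first appeared at 1 ("Time 2" in the question).
one_then_two = [None, 1, 2]

print(alert_fires(missing_then_one))  # False
print(alert_fires(zero_then_one))     # True
print(alert_fires(one_then_two))      # True
```

The only difference between the first two windows is whether a 0 sample was scraped before the first error, which is why exporting the counter at startup is enough.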

Solution 2:^[2]

I think I found a workaround for this.

For counters that existed before t, increase(_metric_[t]) is roughly equivalent to _metric_ - _metric_ offset t (not exactly, since increase() extrapolates and handles counter resets, but that is a different issue).
For counters that did not exist before t, the increase is simply the metric's value: _metric_ - 0 = _metric_.

We can find out whether a metric existed at point t by querying _metric_ offset t, and we can use that as a WHERE NOT EXISTS filter via the unless operator.

Putting it together, we get the following query:

( _metric_ unless _metric_ offset 1d ) or ( _metric_ - _metric_ offset 1d )
^-----------new counters-------------^    ^--------existing counters------^

Example

One event happens in each timeframe, and we want to measure the increase over 2 timeframes.
Expected:
- none for each query frame before the first occurrence
- 1 for the query frame on first occurrence
- 2 for each query frame beyond the first occurrence

                               t0  t1  t2  t3  t4  t5
_metric_                       -   -   1   2   3   4
_metric_ offset 2t             -   -   -   -   1   2
__ unless __ offset 2t         -   -   1   2   -   -
__ <minus> __ offset 2t        -   -   -   -   2   2
=====================================================
() or ()                       -   -   1   2   2   2
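The table above can be reproduced with a small simulation. This is a sketch for a single series only: the helpers offset, unless, minus and or_ are hypothetical stand-ins for the PromQL operators, None stands for "no sample", and label matching is not modelled.

```python
from typing import List, Optional

Series = List[Optional[int]]  # None means "no sample at this step"


def offset(series: Series, steps: int) -> Series:
    """Model `metric offset <steps>`: the value seen `steps` steps earlier."""
    return [series[i - steps] if i >= steps else None for i in range(len(series))]


def unless(left: Series, right: Series) -> Series:
    """Model `left unless right`: keep left only where right has no sample."""
    return [a if b is None else None for a, b in zip(left, right)]


def minus(left: Series, right: Series) -> Series:
    """Model `left - right`: defined only where both sides have a sample."""
    return [a - b if a is not None and b is not None else None
            for a, b in zip(left, right)]


def or_(left: Series, right: Series) -> Series:
    """Model `left or right`: take left, fall back to right where left is absent."""
    return [a if a is not None else b for a, b in zip(left, right)]


metric: Series = [None, None, 1, 2, 3, 4]  # t0..t5 from the table
shifted = offset(metric, 2)                # _metric_ offset 2t
new_counters = unless(metric, shifted)
existing_counters = minus(metric, shifted)
result = or_(new_counters, existing_counters)

print(shifted)            # [None, None, None, None, 1, 2]
print(new_counters)       # [None, None, 1, 2, None, None]
print(existing_counters)  # [None, None, None, None, 2, 2]
print(result)             # [None, None, 1, 2, 2, 2]
```

Each printed list corresponds to one row of the table, with None in place of "-".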

[Figure: Grafana example graph]
In the graph, total is the raw counter value and increase is the result of the query. The result is still split into two series because the metric name is dropped by the - operation but not by unless. Summing them up again works well, and is something you will probably do anyway.
[Figure: Grafana graph with sum]

It's really a shame Prometheus makes this so hard for everyone who is not using it to display CPU temperature. This is one of the instances where my pride in having found a solution is only surpassed by my exasperation that it was necessary in the first place.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Jens Baitinger
Solution 2	lazySaur
