Calculate Max in value with Prometheus

Since I am a Prometheus newbie, I do not know how to express this question as a query: "What is the maximum number of messages that have been processed per second during the last day?" The metric is named messages_in_total.

I tried

  • max_over_time(messages_in_total{}[1d]) - but this returns the maximum of the counter value
  • increase(messages_in_total{}[1d]) - but this returns how much the counter increased

What I really need would be something like (pseudocode)

1.) Transform the range vector, which contains absolute messages_in_total values, into a range vector that has a per-second value for each interval.

2.) Get the max out of it

Example:

  • initial range vector values = (3000,4000, 7000, 8009)
  • adjusted range vector values with rate for each second (values are guessed) = (40, 70, 40)
  • max_value => 70 messages processed per second

Any ideas?



Solution 1:[1]

It is possible.

Example query:

max_over_time(
   irate( messages_in_total[2m] )[1d:1m]
)

This will:

  1. Take the last 1 day
  2. For every 1 minute in that 1 day range, execute irate( messages_in_total[2m] )
  3. Combine the results into a range vector
  4. Call max_over_time on that range vector

See subquery documentation for more information!
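
To make the two bracketed parts of the subquery easier to tell apart, here is an annotated variant. It is only a sketch: it swaps irate for rate, which is my substitution and not what the answer uses. rate averages over the whole 2m window instead of using the last two samples, so it typically reports a smoother, lower peak.

```promql
max_over_time(
  # inner query: per-second rate averaged over each 2m window
  # [1d:1m]: evaluate the inner query every 1m across the last 1d
  rate(messages_in_total[2m])[1d:1m]
)
```

If messages_in_total has several label combinations, both forms return one maximum per time series; wrapping the whole expression in max(...) would collapse them into a single value.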

Solution 2:[2]

While the answer above (Solution 1) returns the maximum per-second rate over the last 24 hours for the messages_in_total metric, it has the following potential issues:

  • It may skip some of the raw samples if the interval between them (aka scrape_interval) is smaller than one minute. This can be fixed by reducing the step value after the colon in square brackets, so it doesn't exceed the scrape_interval.
  • It may return an empty or incomplete result if the scrape interval is too large for the 2m lookbehind window, since irate needs at least two samples inside the window. This can be fixed by increasing the lookbehind window in the inner square brackets from 2m to a value exceeding 2x scrape_interval.
  • It may become very slow and resource hungry because of subquery overhead.
  • Subqueries are easy to misuse, and they can silently return unexpected results.
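
As an illustration of the first two fixes, here is the query from Solution 1 adjusted for an assumed scrape_interval of 15s. The question does not state the interval, so both numbers below are hypothetical and should be replaced with values matching the real setup.

```promql
max_over_time(
  # step 15s: assumed scrape_interval, so no raw sample is skipped
  # window 1m: more than 2x the assumed scrape_interval, so irate sees at least two samples
  irate(messages_in_total[1m])[1d:15s]
)
```

With a 15s step, the inner query is evaluated 5760 times per time series over one day instead of 1440 times, which is the subquery overhead mentioned in the third point.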

While Prometheus doesn't provide a reliable and easy-to-use solution for these issues, other Prometheus-like systems may have one. For example, the following MetricsQL query returns the maximum, the minimum and the average per-second increase rates for the messages_in_total time series over the last 24 hours:

rollup_rate(messages_in_total[1d])

It uses the rollup_rate function. If you need only the maximum per-second rate, then the query can be wrapped in the label_match function, which leaves only time series with the rollup="max" label:

label_match(
  rollup_rate(messages_in_total[1d]),
  "rollup", "max"
)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 valyala