Scrape interval and evaluation interval in Prometheus
My scrape interval and evaluation interval are far apart, as shown below (15s vs. 4m). When I feed metrics to the endpoint, I find that the rules are evaluated every 4 minutes, which is expected. However, what I don't understand is why the rules are not evaluated against all the metrics fed in during the last 4 minutes. I am having a hard time understanding how the two clocks (scrape and evaluation) work together, and the documentation on this is very sparse. Any pointers would be a great help. I have no hesitation in changing the scrape and evaluation intervals to, say, 15 seconds each, but I need to understand the ramifications of setting the two clocks apart.
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 4m # Evaluate rules every 4 minutes. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - testmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/etc/prometheus/xyz_rule.yml"
  - "/etc/prometheus/pqr_rule.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    metrics_path: /v1/metrics/xyz
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['test:7070']
Solution 1:[1]
The two processes are independent: neither PromQL nor recording rules have any knowledge of what your scrape interval is. So whatever rule you specify will evaluate in the same way, with the same result, when evaluated at a given time, no matter what the evaluation interval is.
For simplicity and sanity it's best to have the two intervals the same, so I'd suggest having both as 15s here.
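To connect this to the question: as far as I understand it, each rule is evaluated as an instant query at the evaluation timestamp, so a bare selector only sees the most recent sample of each series (within the staleness lookback, 5 minutes by default), not every sample scraped since the previous evaluation. If a rule should consider everything scraped in the last 4 minutes, the expression itself has to ask for that with a range selector. A minimal sketch of a rule file follows; the group name, alert names, the metric `xyz_queue_depth`, and the threshold are all made up for illustration.

```yaml
groups:
  - name: xyz_example                   # hypothetical group name
    rules:
      # Instant selector: compares only the latest sample of each series
      # at evaluation time; earlier 15s scrapes are not considered.
      - alert: QueueDepthHighNow        # hypothetical alert name
        expr: xyz_queue_depth > 100     # hypothetical metric and threshold

      # Range selector: considers every sample scraped in the last 4 minutes,
      # so a spike between two evaluations is still seen.
      - alert: QueueDepthHighRecently   # hypothetical alert name
        expr: max_over_time(xyz_queue_depth[4m]) > 100
```

With a 4m evaluation interval, the first rule can miss a short spike that starts and ends between two evaluations, while the second one should still catch it because its window spans the whole gap.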
Solution 2:[2]
Scrape interval: Defines how often Prometheus scrapes a monitored target. It is set globally but can also be overridden at the job level. Defaults to 1 minute.
Evaluation interval: Defines how often Prometheus evaluates rules. In each evaluation cycle, Prometheus runs the expression defined in each alerting rule and sets the state of the alert.
Recommendation: Set both intervals to the same value.
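As a rough sketch of what this could look like in `prometheus.yml`, the fragment below sets both global intervals to 15s and overrides the scrape interval for one job. The first job is taken from the question; the `slow-exporter` job and its target are hypothetical and only there to show the job-level override.

```yaml
global:
  scrape_interval: 15s       # default for every job
  evaluation_interval: 15s   # same value as the scrape interval

scrape_configs:
  - job_name: 'prometheus'
    metrics_path: /v1/metrics/xyz
    static_configs:
      - targets: ['test:7070']

  - job_name: 'slow-exporter'             # hypothetical job
    scrape_interval: 1m                   # job-level override of the global value
    static_configs:
      - targets: ['slow-exporter:9100']   # hypothetical target
```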
The time taken to fire an alarm varies between:
minimum time = [time set in the 'FOR' clause]
maximum time = [scrapeInterval + evaluationInterval + 'FOR' clause time]
Assumption: evaluationInterval is a multiple of scrapeInterval.
For example, with
scrapeInterval = 30 sec,
evaluationInterval = 1 min, and
FOR clause = 2 min:
Minimum time to fire alarm => 2 min
Maximum time to fire alarm => 30 sec + 1 min + 2 min => 3 min 30 sec
NOTE: If the FOR clause time is 0, the alert enters the firing state as soon as the rule expression is true at an evaluation.
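For reference, the example above might correspond to a rule file along these lines. This is only a sketch: the group name, alert name, and label value are invented, and it assumes the 30-second scrape interval is configured separately in `prometheus.yml`. The group-level `interval` field sets the evaluation interval for this group only.

```yaml
groups:
  - name: example_alerts     # hypothetical group name
    interval: 1m             # evaluation interval for this group (overrides the global one)
    rules:
      - alert: TargetDown    # hypothetical alert name
        expr: up == 0        # `up` is recorded by Prometheus on every scrape
        for: 2m              # expression must stay true this long before firing
        labels:
          severity: page     # hypothetical label value
```

Under the answer's formula, the worst case for this rule is one scrape (30 sec) plus one evaluation cycle (1 min) plus the `for` duration (2 min), or 3 min 30 sec after the target actually goes down.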
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | brian-brazil |
Solution 2 |