Grafana Prometheus - query processing would load too many samples into memory in query execution

I'm trying to get my query to sum over intervals in grafana but I get this error:

"query processing would load too many samples into memory in query execution"

when I look at the last 30 days at a daily interval.

I have a variable called intrvl with certain time intervals like 1m, 1h, 12h, 24h, and 30d, and my query looks like this:

sort_desc(
sum by (backend)(sum_over_time(haproxy_backend_http_responses_total{code=~"[1,2,3,4][x][x]",tags=~".*external.*"}[$intrvl]))
/
sum by (backend)(sum_over_time(haproxy_backend_http_responses_total{code!~"\\b(\\w*other\\w*)\\b",tags=~".*external.*"}[$intrvl]))
)

I'm using a line chart visualization, with the chart's Min step also set to $intrvl. Is this the right way to calculate a percentage over a time range?



Solution 1:[1]

The "too many samples" error message comes from Prometheus (promql/engine.go), not Grafana; see Prometheus issue #4513.

You can try raising the limit with the Prometheus flag --query.max-samples, introduced in Prometheus v2.5.0 (check the default for your version in the prometheus -h output).
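For example (the flag value below is illustrative; pick one suited to your machine's memory):

```shell
# Check the current default for your Prometheus version:
prometheus -h 2>&1 | grep query.max-samples

# Restart Prometheus with a higher limit:
prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --query.max-samples=100000000
```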

Solution 2:[2]

Since your formula is calculated over a considerable amount of data, I would consider creating a Prometheus recording rule that pre-computes the values needed, and then running sum_over_time using the created rule.
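A minimal sketch of such a rule (the rule group and the recorded metric name are hypothetical; adjust the selector to your labels). The recorded series collapses the per-instance series down to one series per backend, so the later range query loads far fewer samples:

```yaml
# prometheus rules file, e.g. /etc/prometheus/rules.yml
groups:
  - name: haproxy_aggregations
    rules:
      # One pre-aggregated series per backend instead of many raw series.
      - record: backend:haproxy_http_responses_external:sum
        expr: sum by (backend) (haproxy_backend_http_responses_total{tags=~".*external.*"})
```

The dashboard query then ranges over backend:haproxy_http_responses_external:sum[$intrvl] instead of the raw metric.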

Solution 3:[3]

Problem: a heavy query in Prometheus fails, and the Grafana dashboard shows the error:

query processing would load too many samples into memory in query execution

Solution: pass --query.max-samples to Prometheus to increase the number of samples that can be loaded into memory. The default value is 50000000; how far you can raise it depends on your machine's capacity. From the documentation:

--query.max-samples=50000000
     Maximum number of samples a single query can load into memory. 
     Note that queries will fail if they try to load more samples than this into memory,
     so this also limits the number of samples a query can return.

Example: assuming you run your Prometheus service with Docker Compose, in docker-compose.yml:

version: '3.2'

services:
  prometheus:
    image: prom/prometheus:latest
    expose:
      - 9090
    ports:
      - 9090:9090
    volumes:
      - ./prometheus/:/etc/prometheus/
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--query.max-samples=100000000'
      - '--web.external-url=http://prom.some-company-url.com:9090'

Solution 4:[4]

The haproxy_backend_http_responses_total metric is a counter, so the increase() function should likely be used instead of sum_over_time():

sort_desc(
sum by (backend)(increase(haproxy_backend_http_responses_total{code=~"[1234]..",tags=~".*external.*"}[$intrvl]))
/
sum by (backend)(increase(haproxy_backend_http_responses_total{tags=~".*external.*"}[$intrvl]))
)
  • The sum_over_time() function calculates the sum of all the raw samples on the given lookbehind window in square brackets. This function is intended for gauges.
  • The increase() function calculates the increase of a counter over the given lookbehind window.
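The difference matters for counters. A toy sketch in Python (simplified semantics: it ignores counter resets and Prometheus's extrapolation at window boundaries) shows why summing raw counter samples gives a meaningless number while the increase reflects actual growth:

```python
def sum_over_time(samples):
    """Sum of all raw sample values in the window (intended for gauges)."""
    return sum(samples)

def increase(samples):
    """Approximate counter increase over the window: last minus first
    (real Prometheus also handles resets and extrapolates at the edges)."""
    return samples[-1] - samples[0]

# A counter scraped 5 times; it grew by 40 requests over the window.
counter_samples = [100, 110, 120, 130, 140]

print(sum_over_time(counter_samples))  # 600 -- not a request count
print(increase(counter_samples))       # 40  -- the actual growth
```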

The "too many samples" error occurs because Prometheus loads into memory all the raw samples for all the time series matching the given series selector over the lookbehind window specified in square brackets. The haproxy_backend_http_responses_total{tags=~".*external.*"} selector likely matches a large number of time series. The following query can be used to estimate the number of time series the query needs to load into memory:

count(
  last_over_time(
    haproxy_backend_http_responses_total{tags=~".*external.*"}[$intrvl]
  )
)

The following query can be used to estimate the number of raw samples the query needs to load into memory:

sum(
  count_over_time(
    haproxy_backend_http_responses_total{tags=~".*external.*"}[$intrvl]
  )
)

As you can see, the number of matching time series and the number of raw samples, which Prometheus needs to load into memory, grows with the lookbehind window in square brackets - [$intrvl] in queries above.

This article may be useful for understanding how to determine the root cause for heavy PromQL queries and how to optimize them.

The "too many samples" error may be fixed by passing a bigger value to the --query.max-samples command-line flag, as outlined in the answers above. Note that this may increase memory usage when Prometheus processes heavy queries.

An alternative fix for the "too many samples" error is to use another Prometheus-like system that may need less memory when processing heavy queries. Try, for example, VictoriaMetrics.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Yuri Lachin
Solution 2 Sergio Santiago
Solution 3 avivamg
Solution 4 valyala