Prometheus and Node Exporter architecture

I have spent three days reading about this, and have even configured a set of containers to test it, but I still have some doubts.

I understand that the architecture of Prometheus + Node exporter is based on:

  • Node exporter knows how to extract metrics. Those are exposed over HTTP, e.g. :9201/metrics.
  • Prometheus queries those HTTP endpoints (the node-exporter HTTP port) every X seconds and stores the metrics. It also provides another HTTP endpoint for graph/console visualization and querying.

Question 1:

Assume you want CPU metrics every 15s, HDD metrics every 5m, network metrics every 1m, and process metrics every 30s.

Since it is Prometheus that decides the scrape interval, how can it be configured to scrape just those values at those intervals?

Question 2:

Assume you want 1 Prometheus instance and 3 node exporters on different public servers. I don't see anything regarding node exporter security: the HTTP endpoint is public.

How can I securely query the metrics from my 3 servers?

Question 3:

I don't know if I am missing something, but, for example, comparing this to Telegraf, the latter sends its metrics to a database. Telegraf therefore acts as the "node-exporter", and I only need to secure the database connection (the only exposed port).

Can node-exporter be configured to push a set of metrics to the Prometheus server every X seconds (so that I don't have to expose a public port on every public server, just on the Prometheus server)? I understand the "pushgateway" is for that? How do I change node-exporter's behavior?

Do you recommend any other architecture that could suit my needs (1 master, many slaves to query metrics)?



Solution 1:[1]

Question 1

Since it is Prometheus that decides the scrape interval, how can it be configured to scrape just those values at those intervals?

You can configure a separate job for each interval, each with its own scrape_interval and HTTP URL parameters (params). Beyond that, it depends on the features offered by the exporter.

In the case of node_exporter, you can pass a list of collectors (see the configuration sketch below):

  • cpu every 15s (job: node_cpu)
  • process every 30s (job: node_process)
  • (well, you get the idea) ...

Note that a 5-minute scrape interval is likely too big because of data staleness: you run the risk of an instant vector query returning no data for those series. A 1-minute scrape interval is already big, and has no impact on performance.
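A minimal prometheus.yml sketch of this idea, assuming node_exporter listens on :9201 as in the question. The collect[] URL parameter asks node_exporter to run only the listed collectors for that scrape; the collector names below (cpu, processes) and the target address are assumptions to check against your node_exporter version:

    scrape_configs:
      # CPU metrics every 15 seconds
      - job_name: node_cpu
        scrape_interval: 15s
        metrics_path: /metrics
        params:
          collect[]: [cpu]              # run only the cpu collector for this scrape
        static_configs:
          - targets: ['myserver:9201']  # hypothetical node_exporter address

      # process metrics every 30 seconds
      - job_name: node_process
        scrape_interval: 30s
        params:
          collect[]: [processes]        # assumes the processes collector is enabled
        static_configs:
          - targets: ['myserver:9201']

Each job gets its own job label, so the same node_exporter instance is simply scraped twice, with different filters and frequencies.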

Question 2

How can I securely query the metrics from my 3 servers?

The original assumption of Prometheus is that you would use a private network. On a public network, you'll need some kind of proxy.

Personally, I have used exporter_exporter in a classic architecture.
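As a sketch of what the Prometheus side of such a proxy setup could look like (the proxy hosts, path and credentials below are assumptions; the proxy itself, e.g. nginx or exporter_exporter, is what terminates TLS and enforces the authentication):

    scrape_configs:
      - job_name: node_behind_proxy
        scheme: https                    # talk to the proxy over TLS
        metrics_path: /node/metrics      # hypothetical route the proxy forwards to node_exporter
        basic_auth:
          username: prometheus
          password_file: /etc/prometheus/proxy-password  # secret checked by the proxy
        static_configs:
          - targets: ['server1.example.com:443', 'server2.example.com:443', 'server3.example.com:443']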

Question 3

Can node-exporter be configured to push a set of metrics to the Prometheus server every X seconds (so that I don't have to expose a public port on every public server, just on the Prometheus server)? I understand the "pushgateway" is for that? How do I change node-exporter's behavior?

No, Prometheus is a pull-based architecture: you will need a URI accessible by Prometheus on each service you want to monitor. I imagine you could reuse components from another monitoring solution and use an ad hoc exporter like the collectd exporter.

The Pushgateway is intended for short-lived jobs that cannot wait to be scraped by Prometheus. This is a specific use case, and the general consensus is not to abuse it.
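For completeness, a sketch of how the Pushgateway is usually wired in on the Prometheus side (the address is an assumption; 9091 is the Pushgateway's default port). Short-lived jobs push their metrics to it, and Prometheus scrapes it like any other target:

    scrape_configs:
      - job_name: pushgateway
        honor_labels: true   # keep the job/instance labels set by the jobs that pushed
        static_configs:
          - targets: ['pushgateway.example.com:9091']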

Solution 2:[2]

Since it is Prometheus that decides the scrape interval, how can it be configured to scrape just those values at those intervals?

I don't believe it can be. Prometheus scrapes everything from one endpoint in one go, so if all the data comes from node_exporter, you get it all at the same frequency.

How can I securely query the metrics from my 3 servers?

The Prometheus security doc talks about using a reverse proxy for this kind of thing.

Can node-exporter be configured to push a set of metrics to the Prometheus server every X seconds?

I don't believe so. Prometheus is a pull-type monitoring system. If you really need to move data by push, then what you'd probably have to do is have scripts or whatever push data to what amounts to a cache on the Prometheus server, then have Prometheus poll that cache on a regular basis. I don't know if such a thing exists.

Solution 3:[3]

Please take a look at Fluent Bit - https://docs.fluentbit.io (e.g. its node_exporter INPUT plugin).

Create monitoring containers for your needs, using different scrape and flush intervals.
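A hedged sketch of what such a container's Fluent Bit configuration (YAML format) might look like, assuming a recent Fluent Bit with the node_exporter_metrics input and prometheus_remote_write output; the exact option names and the remote endpoint are assumptions to verify against the Fluent Bit docs:

    service:
      flush: 5                           # flush interval in seconds
    pipeline:
      inputs:
        - name: node_exporter_metrics    # collects host metrics, like node_exporter does
          tag: node_metrics
          scrape_interval: 15
      outputs:
        - name: prometheus_remote_write  # pushes the metrics to a remote-write endpoint
          match: node_metrics
          host: metrics.example.com
          port: 8428
          uri: /api/v1/write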

Solution 4:[4]

Assume you want CPU metrics every 15s, HDD metrics every 5m, network metrics every 1m, and process metrics every 30s. Since it is Prometheus that decides the scrape interval, how can it be configured to scrape just those values at those intervals?

While it is possible to configure an individual scrape_interval per job (aka scrape_config) in Prometheus, this isn't recommended practice - see this article for more information. See also the staleness docs to understand how Prometheus handles gaps between raw samples and how it combines multiple measurements in a single query.

Assume you want 1 Prometheus instance and 3 node exporters on different public servers. I don't see anything regarding node exporter security: the HTTP endpoint is public. How can I securely query the metrics from my 3 servers?

Prometheus provides the ability to scrape targets via HTTPS endpoints protected with various authorization schemes such as basic auth, bearer auth or oauth2 - see the corresponding config options at scrape_config.
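For example, a sketch of a scrape_config using HTTPS with a bearer token and a custom CA (the file paths and target address are assumptions):

    scrape_configs:
      - job_name: node_secure
        scheme: https
        authorization:
          type: Bearer
          credentials_file: /etc/prometheus/node-token  # token the target (or its proxy) expects
        tls_config:
          ca_file: /etc/prometheus/ca.crt               # CA that signed the target's certificate
        static_configs:
          - targets: ['server1.example.com:9100']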

If you need to scrape targets located in isolated networks / hosts, then vmagent can run in each isolated network / host, scrape metrics from the local targets, and send them to a centralized Prometheus-like remote storage such as VictoriaMetrics.
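A sketch of that setup: vmagent reuses the Prometheus scrape_configs format for the local targets and ships the samples out over remote write, so only the outgoing connection leaves the isolated network (the remote-write URL below is an assumption):

    # scrape.yml used by vmagent inside the isolated network, e.g. started as:
    #   vmagent -promscrape.config=scrape.yml -remoteWrite.url=https://victoria.example.com:8428/api/v1/write
    scrape_configs:
      - job_name: node
        scrape_interval: 15s
        static_configs:
          - targets: ['localhost:9100']  # local node_exporter, never exposed publicly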

I don't know if I am missing something, but, for example, comparing this to Telegraf, the latter sends its metrics to a database. Telegraf therefore acts as the "node-exporter", and I only need to secure the database connection (the only exposed port). Can node-exporter be configured to push a set of metrics to the Prometheus server every X seconds (so that I don't have to expose a public port on every public server, just on the Prometheus server)? I understand the "pushgateway" is for that? How do I change node-exporter's behavior?

Unfortunately, node_exporter and any other Prometheus-compatible exporter cannot push metrics to Prometheus, since Prometheus supports only the pull model for data collection, i.e. it scrapes the configured targets itself at the configured interval. See this article for details on why Prometheus prefers the pull model over the push model.

If you need a Prometheus-like system that supports both pull and push models, then take a look at VictoriaMetrics. It accepts data in the native InfluxDB line protocol, so you can send metrics from Telegraf directly to VictoriaMetrics and then query them with PromQL and MetricsQL. See these docs.

If you aren't familiar with PromQL, then this article may be a useful starting point for learning PromQL basics.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: DisappointedByUnaccountableMod
Solution 2: DisappointedByUnaccountableMod
Solution 3: Cristian Florescu
Solution 4: valyala