Prometheus for k8s multi clusters

I have 3 Kubernetes clusters (prod, test, monitoring). I am new to Prometheus, so I have tested it by installing it in my test environment with the Helm chart:

# https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
helm install [RELEASE_NAME] prometheus-community/kube-prometheus-stack

But if I want metrics from the prod and test clusters, I would have to repeat the same Helm installation, and each "kube-prometheus-stack" would be standalone in its own cluster. That is not ideal at all. I am trying to find a way to have a single Prometheus/Grafana that would federate/aggregate the metrics from each cluster's Prometheus server.

I found this link, saying about prometheus federation:

https://prometheus.io/docs/prometheus/latest/federation/

If I install the "kube-prometheus-stack" Helm chart and get rid of Grafana on the two other clusters, how can I make the third "kube-prometheus-stack", on the third cluster, scrape metrics from the other two?

Thanks.



Solution 1:[1]

You have to add a federation scrape job to the Prometheus configuration so it can scrape metrics from the other clusters, as described in the documentation:

scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s

    honor_labels: true
    metrics_path: '/federate'

    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'

    static_configs:
      - targets:
        - 'source-prometheus-1:9090'
        - 'source-prometheus-2:9090'
        - 'source-prometheus-3:9090'

The params field selects which series to pull from the federated servers. In this particular example, it will scrape any series with the label job="prometheus" or a metric name starting with job: from the Prometheus servers at source-prometheus-{1,2,3}:9090.
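With kube-prometheus-stack, a scrape job like the one above can be injected through the chart's additionalScrapeConfigs value rather than by editing the generated config directly. A rough sketch of a values file for the monitoring cluster (the target addresses are placeholders — substitute however you actually expose the prod and test Prometheus servers, e.g. via ingress or a LoadBalancer):

```yaml
# values.yaml for the monitoring cluster's kube-prometheus-stack
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: 'federate'
        scrape_interval: 15s
        honor_labels: true
        metrics_path: '/federate'
        params:
          'match[]':
            - '{job="prometheus"}'
            - '{__name__=~"job:.*"}'
        static_configs:
          - targets:
              # placeholder endpoints for the other clusters' Prometheus servers
              - 'prometheus.prod.example.com:9090'
              - 'prometheus.test.example.com:9090'
```

Apply it with helm upgrade -f values.yaml on the monitoring cluster only; the prod and test installations keep their default scrape configuration.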

You can check following articles to give you more insight of prometheus federation:

  1. Monitoring Kubernetes with Prometheus - outside the cluster!

  2. Prometheus federation in Kubernetes

  3. Monitoring multiple federated clusters with Prometheus - the secure way

  4. Monitoring a Multi-Cluster Environment Using Prometheus Federation and Grafana

Solution 2:[2]

You have a few options here:

Option 1:

You can achieve this by running vmagent or grafana-agent in the prod and test clusters and configuring remote write on them to point to your monitoring cluster.

But in this case you will need to install kube-state-metrics and node-exporter separately into the prod and test clusters.

It is also important to add an extra label with the cluster name (or any unique identifier) before sending metrics via remote write, to make sure the recording rules from "kube-prometheus-stack" work correctly.
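A minimal sketch of such an agent configuration (Prometheus in agent mode is shown here as one concrete choice; the remote-write URL, cluster name, and scrape job are placeholders to be adapted):

```yaml
# agent config for the prod cluster
# run with: prometheus --enable-feature=agent --config.file=agent.yaml
global:
  external_labels:
    cluster: prod            # unique per-cluster identifier, attached to every series
scrape_configs:
  - job_name: 'node-exporter'   # plus kube-state-metrics and any other jobs you need
    kubernetes_sd_configs:
      - role: endpoints
remote_write:
  - url: "https://monitoring.example.com/api/v1/write"   # placeholder endpoint in the monitoring cluster
```

The external_labels block is what implements the "extra label for a cluster name" advice above: every sample shipped over remote write carries cluster=prod, so dashboards and recording rules in the monitoring cluster can distinguish the sources.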


Option 2:

You can install the victoria-metrics-k8s-stack chart. It has functionality similar to kube-prometheus-stack - it also installs a bunch of components, recording rules and dashboards.

In this case you install victoria-metrics-k8s-stack in every cluster, but with different values. For the monitoring cluster you can use the default values, with

grafana:
  sidecar:
    dashboards:
      multicluster: true

and a properly configured ingress for vmsingle.

For the prod and test clusters you need to disable a bunch of components:

defaultRules:
  create: false

vmsingle:
  enabled: false
alertmanager:
  enabled: false
vmalert:
  enabled: false
vmagent:
  spec:
    remoteWrite:
      - url: "<vmsingle-ingress>/api/v1/write"
    externalLabels:
      cluster: <cluster-name>

grafana:
  enabled: false
  defaultDashboardsEnabled: false

In this case the chart will deploy vmagent, kube-state-metrics, node-exporter and the scrape configurations for vmagent.


Solution 3:[3]

You could try looking at Wavefront. It's a commercial tool now, but you can get a free 30-day trial - also, it understands PromQL. So essentially, you could use the same Prometheus rules and config across all clusters, and then use Wavefront to just connect to all of those Prometheus instances.

Another option may be Thanos, but I've never used it personally.
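If you do explore Thanos, note that kube-prometheus-stack already has a hook for it: the chart can attach a Thanos sidecar to each cluster's Prometheus via its values. A rough, untested sketch (the image tag and the object-storage secret name/key are placeholders you would create yourself):

```yaml
# values fragment for each cluster's kube-prometheus-stack
prometheus:
  prometheusSpec:
    thanos:
      image: quay.io/thanos/thanos:v0.34.0   # placeholder version
      objectStorageConfig:
        name: thanos-objstore                # hypothetical secret holding objstore.yml
        key: objstore.yml
```

A central Thanos Query component in the monitoring cluster would then fan out queries to the sidecars (or to the shared object store), giving a single global view without classic federation.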

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: kool
Solution 2:
Solution 3: Karan Kapoor