Spark executor metrics don't reach Prometheus sink

Circumstances:
I have read through these:
https://spark.apache.org/docs/3.1.2/monitoring.html
https://dzlab.github.io/bigdata/2020/07/03/spark3-monitoring-1/

Versions: Spark 3.1.2, K8s v19

I am submitting my application with these options:

-c spark.ui.prometheus.enabled=true
-c spark.metrics.conf=/spark/conf/metric.properties

metric.properties:

*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus

Result:
Both of these endpoints return some metrics:

<driver-ip>:4040/metrics/prometheus        
<driver-ip>:4040/metrics/executors/prometheus

The first one, the driver endpoint, has all the metrics.
The second one, the executor endpoint, has all the metrics except the ones under the executor namespace described here: https://spark.apache.org/docs/3.1.2/monitoring.html#component-instance--executor
So everything from bytesRead.count to threadpool.startedTasks is missing.
But these metrics are indeed reported by the executors, because I can see them under /api/v1/applications/app-id/stages/stage-id too.
I have struggled with this for hours: moving the configs to --conf flags, splitting up the configs by instance, enabling everything, etc. No result.
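
To pin down exactly which names are absent from the executor endpoint, the two scrape outputs can be compared by metric name. The following is a minimal Python sketch, not part of the original setup: the helper names (metric_names, names_matching, fetch) are hypothetical, and it assumes both endpoints return standard Prometheus text format over plain HTTP.

```python
import re
from urllib.request import urlopen

# Matches the metric name at the start of a sample line, before labels or value.
_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = _NAME.match(line)
        if match:
            names.add(match.group(1))
    return names


def names_matching(names, needle):
    """Case-insensitive substring filter, e.g. needle='bytesRead'."""
    return sorted(n for n in names if needle.lower() in n.lower())


def fetch(url, timeout=10):
    """Download one endpoint as text (hypothetical helper; needs network access)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage, with <driver-ip> replaced by the real driver address:
#   executors = metric_names(fetch("http://<driver-ip>:4040/metrics/executors/prometheus"))
#   print(names_matching(executors, "bytesRead"))
```

An empty list for a name such as bytesRead would confirm that the metric is missing from that endpoint rather than exposed under an unexpected name.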

However, if I change the sink from PrometheusServlet to ConsoleSink:

*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
*.sink.console.period=10
*.sink.console.unit=seconds

Then the metrics appear successfully.

So something is definitely wrong with the Spark-K8s-Prometheus integration.

Note:

One interesting thing is what happens if I split up the config by instance, like this:

driver.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
executor.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
driver.sink.prometheusServlet.path=/metrics/prometheus1
executor.sink.prometheusServlet.path=/metrics/executor/prometheus1

(Note the trailing '1'.)
Then the executor sink path is not taken into account: the driver metrics are on /metrics/prometheus1, but the executors are still on /metrics/executors/prometheus.
The class config is indeed applied, because if I change it to a nonexistent class, the executor throws an error as expected.



Solution 1:[1]

I have been trying to understand why custom user metrics are not sent to the driver, while the regular Spark metrics are.

It looks like the PrometheusSink uses the class ExecutorSummary, which doesn't allow adding custom metrics.

For the moment, it seems the only working way is to use the JMXExporter (run as a Java agent to export to Prometheus), or just use the ConsoleSink with

    *.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Gregoire