'Spark executor metrics don't reach prometheus sink
Circumstances:
I have read through these:
https://spark.apache.org/docs/3.1.2/monitoring.html
https://dzlab.github.io/bigdata/2020/07/03/spark3-monitoring-1/
versions: Spark3.1.2, K8s v19
I am submitting my application via
-c spark.ui.prometheus.enabled=true
-c spark.metrics.conf=/spark/conf/metric.properties
metric.properties:
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
Result:
Both of these endpoints have some metrics
<driver-ip>:4040/metrics/prometheus
<driver-ip>:4040/metrics/executors/prometheus
the first one - the driver one - has all the metrics
the second one - the executor one - has all the metrics except the ones under the executor namespace
described here: https://spark.apache.org/docs/3.1.2/monitoring.html#component-instance--executor
So everything is missing from bytesRead.count
to threadpool.startedTasks
But these metric are indeed reported by the executors, because under /api/v1/applications/app-id/stages/stage-id I can see those too.
I am struggled with this for hours, moving the configs to --conf flag, splitting up the configs by instances, enabling everything...etc No result.
However if I change the sink from prometheus to ConsoleSink:
*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
*.sink.console.period=10
*.sink.console.unit=seconds
Then the metrics appear successfully.
So something is definitely wrong with the Spark-K8s-Prometheus integration.
Note:
One interesting stuff is if I split up the config by instances like
driver.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
executor.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
driver.sink.prometheusServlet.path=/metrics/prometheus1
executor.sink.prometheusServlet.path=/metrics/executor/prometheus1
(note the trailing '1' at the end)
Then the executor sink path is not taken into account , the driver metrics will be on
/metrics/prometheus1
but the exectutors will be still on /metrics/executor/prometheus
.
The class config is indeed working because if I change it to a nonexisting one, then the executor will throw an error as expected.
Solution 1:[1]
I have been looking to understand why custom user metrics are not sent to the driver, while the regular spark metrics are.
It looks like the PrometheusSink use the class ExecutorSummary, which doesn't allow to add custom metrics.
For the moment, it seems the only working way is to use the JMXExporter (and use the Java agent to export to Prometheus), or just use the ConsoleSink with
*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Gregoire |