Zeppelin+Spark+Kubernetes: Let Zeppelin Jobs Run on an Existing Spark Cluster
In a k8s cluster, how do you configure Zeppelin to run Spark jobs on an existing Spark cluster instead of spinning up a new pod?
I've got a k8s cluster up and running in which I want to run Spark with Zeppelin.
Spark is deployed using the official bitnami/spark Helm chart (v3.0.0). I have one master and two worker pods running fine; everything is good.
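For reference, a deployment along those lines looks roughly like this (release name and namespace are placeholders, not taken from the original post):

helm repo add bitnami https://charts.bitnami.com/bitnami
# pick a chart version whose appVersion matches the Spark release you want (3.0.0 here)
helm install spark bitnami/spark --namespace spark --create-namespace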
Zeppelin is deployed with the zeppelin-server.yaml from the official apache/zeppelin GitHub repository.
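Assuming the manifest has been downloaded locally from the apache/zeppelin repository, deploying it is a single command:

kubectl apply -f zeppelin-server.yaml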
I've built my own Zeppelin container without much modification from apache/zeppelin:0.9.0.
Short pseudo Dockerfile:
FROM bitnami/spark:3.0.0 AS spark

FROM apache/zeppelin:0.9.0
# Bring the Spark distribution over from the Bitnami image
COPY --from=spark /opt/bitnami/spark /opt/bitnami/spark
# Install kubectl (official release binary); needs root during the build and assumes curl is present
USER root
RUN curl -fsSL -o /usr/local/bin/kubectl "https://dl.k8s.io/release/$(curl -fsSL https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl" && chmod +x /usr/local/bin/kubectl
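Building and pushing the image then looks something like this (registry and tag are placeholders):

docker build -t <your-registry>/zeppelin-spark:0.9.0 .
docker push <your-registry>/zeppelin-spark:0.9.0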
I modified zeppelin-server.yaml slightly (image, imagePullSecret, setting the Spark master to the headless service DNS of the Spark master).
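For illustration, the changes described above would look roughly like this in zeppelin-server.yaml; the image, secret name, and service DNS are placeholders, and the real manifest may wire the master through its ConfigMap rather than a plain env entry:

spec:
  imagePullSecrets:
    - name: <your-pull-secret>
  containers:
    - name: zeppelin-server
      image: <your-registry>/zeppelin-spark:0.9.0
      env:
        # hypothetical entry; adjust to however the manifest passes the master on
        - name: SPARK_MASTER
          value: spark://spark-master-svc.spark.svc.cluster.local:7077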
Now I want my Zeppelin jobs to run on my existing Spark cluster, but so far with no success.
When I submit Zeppelin jobs (for the Spark interpreter), Zeppelin fires up a new Spark pod and works solely with that one. The Spark interpreter settings are as they should be: the Spark master URL is set (spark://<master-url>:<master-port>), and so is the Spark home.
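Concretely, the relevant interpreter properties look along these lines (the values are placeholders for your own endpoint and install path):

master      spark://spark-master-svc.spark.svc.cluster.local:7077
SPARK_HOME  /opt/bitnami/spark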
While this is kind of a sweet behaviour, it's not what I want.
What I want (and what my question is): I want my Zeppelin pod to submit the Spark jobs to the existing cluster, not fire up a new pod. I am PRETTY sure that there has to be some config/env/whatever that I have to set, but I simply can't find it.
So I want to ask: is there anyone out there who knows how to run Zeppelin Spark jobs on an existing Spark cluster? I thought setting the Spark master should do the job...
Kind regards, Bob
Solution 1:[1]
Answering my own question after a while...
For anyone running into the same problem:
- Go into the Spark interpreter settings.
- (Optional, if you don't already have the property) Press "edit", scroll down, and add the property SPARK_SUBMIT_OPTIONS.
- Edit the SPARK_SUBMIT_OPTIONS value and add "--master spark://<ENDPOINT OF YOUR SPARK MASTER>".
- Save the settings and you're done...
This threw me off massively, as there is already an option to set the Spark master itself.
What solved the problem was entering the Spark master twice (see the example below):
- once under the key "master"
- once via the edit to SPARK_SUBMIT_OPTIONS described above
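Put together, the working interpreter configuration ends up with the master endpoint in both places, roughly like this (the endpoint is a placeholder for your own master service):

master                 spark://spark-master-svc.spark.svc.cluster.local:7077
SPARK_SUBMIT_OPTIONS   --master spark://spark-master-svc.spark.svc.cluster.local:7077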
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Rockbob |