How can I increase spark.driver.memoryOverhead in Google Dataproc?
I am getting two types of errors while running a job on Google Dataproc, and they are causing executors to be lost one by one until the last executor is lost and the job fails. I have set my master node to n1-highmem-2 (2 vCPU, 13 GB memory) and two worker nodes to n1-highmem-8 (8 vCPU, 52 GB memory). The two errors I get are:
- "Container exited from explicit termination request."
- "Lost executor x: Executor heartbeat timed out"
From what I could find online, my understanding is that I need to increase spark.executor.memoryOverhead. I don't know if this is the right answer, but I can't see how to change this property in the Google Dataproc console, and I don't know what to change it to. Any help would be great!
Thanks, jim
Solution 1:[1]
You can set Spark properties at the cluster level with

```
gcloud dataproc clusters create ... --properties spark:<name>=<value>,...
```

and/or at the job level with

```
gcloud dataproc jobs submit spark ... --properties <name>=<value>,...
```

The former requires the `spark:` prefix, the latter doesn't. If both are set, the job-level value takes precedence. See more details in this doc.
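
As a concrete illustration, here is a minimal sketch. The cluster name, region, and the 2g value are placeholders for this example, not values from the original answer:

```sh
# Cluster level: each Spark property carries the spark: prefix
# (my-cluster, us-central1, and 2g are hypothetical)
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --properties=spark:spark.executor.memoryOverhead=2g

# Job level: no prefix; overrides the cluster-level value if both are set
gcloud dataproc jobs submit spark \
    --cluster=my-cluster \
    --region=us-central1 \
    --class=org.example.MyJob \
    --jars=gs://my-bucket/my-job.jar \
    --properties=spark.executor.memoryOverhead=2g
```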
Solution 2:[2]
It turns out the memory per vCPU was the limitation causing the executors to fail one by one. Initially, I was trying to use the custom machine configuration in the console to add additional memory per vCPU. It turns out that the console UI has a bug (per the Google Dataproc team) that prevents you from increasing the memory per vCPU: if you use the slider to raise the memory beyond the default maximum of 6.5 GB, cluster creation fails. However, if you use the command-line equivalent of the console, it does allow the cluster to be created, and the increased memory per vCPU was enough to complete the job without the executors failing one by one.
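
For reference, a hedged sketch of the command-line approach. The machine type, sizes, and region below are illustrative assumptions, not values from the original answer; on n1 machines, going beyond the default 6.5 GB per vCPU requires an extended-memory custom machine type (the -ext suffix):

```sh
# Hypothetical example: 2 workers with 8 vCPUs and 64 GB each (8 GB per vCPU),
# using an extended-memory custom machine type (custom-<vCPUs>-<memory MB>-ext)
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --master-machine-type=n1-highmem-2 \
    --worker-machine-type=custom-8-65536-ext \
    --num-workers=2
```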
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0. Source: Stack Overflow.
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | jmuth |