Spark on Kubernetes driver pod cleanup

I am running Spark 3.1.1 on Kubernetes 1.19. Once a job finishes, the executor pods get cleaned up, but the driver pod remains in the Completed state. How can I clean up the driver pod once it has completed? Is there a configuration option to set?

NAME                                           READY   STATUS      RESTARTS   AGE
my-job-0e85ea790d5c9f8d-driver                 0/1     Completed   0          2d20h
my-job-8c1d4f79128ccb50-driver                 0/1     Completed   0          43h
my-job-c87bfb7912969cc5-driver                 0/1     Completed   0          43h


Solution 1:[1]

spark.kubernetes.driver.service.deleteOnTermination was added to Spark in 3.2.0. This should solve the issue. Source: https://spark.apache.org/docs/latest/core-migration-guide.html

Update: this only deletes the service that fronts the driver pod, not the pod itself.

Solution 2:[2]

According to the official Kubernetes documentation (the feature has been available since Kubernetes 1.12, initially as alpha):

Another way to clean up finished Jobs (either Complete or Failed) automatically is to use a TTL mechanism provided by a TTL controller for finished resources, by specifying the .spec.ttlSecondsAfterFinished field of the Job. When the TTL controller cleans up the Job, it will delete the Job cascadingly, i.e. delete its dependent objects, such as Pods, together with the Job. Note that when the Job is deleted, its lifecycle guarantees, such as finalizers, will be honored.

Example:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-ttl
spec:
  ttlSecondsAfterFinished: 100
  template:
    spec:
      ...

The Job pi-with-ttl will be eligible to be automatically deleted, 100 seconds after it finishes. If the field is set to 0, the Job will be eligible to be automatically deleted immediately after it finishes.

If customising the Job resource is not possible, you can use an external tool to clean up completed jobs, for example https://github.com/dtan4/k8s-job-cleaner
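When the driver pod is not owned by a Job (which appears to be the case for pods created directly by spark-submit), a small periodic script can play a similar role to the TTL field. The following is a minimal Python sketch, not an official tool. It assumes kubectl is on the PATH and configured for the cluster, and that driver pods carry the spark-role=driver label that Spark normally sets. The function names (finished_at, expired_driver_pods, cleanup) are made up for illustration. Only pods in phase Succeeded, which kubectl displays as Completed, are considered.

```python
import json
import subprocess
from datetime import datetime, timedelta, timezone


def finished_at(pod):
    """Return the latest container finish time of a pod, or None if unknown."""
    times = []
    for status in pod.get("status", {}).get("containerStatuses") or []:
        terminated = status.get("state", {}).get("terminated")
        if terminated and terminated.get("finishedAt"):
            # Kubernetes timestamps look like 2021-05-01T12:00:00Z (UTC).
            ts = datetime.strptime(terminated["finishedAt"], "%Y-%m-%dT%H:%M:%SZ")
            times.append(ts.replace(tzinfo=timezone.utc))
    return max(times) if times else None


def expired_driver_pods(pod_list, ttl_seconds, now):
    """Names of Succeeded pods whose containers finished at least ttl_seconds ago."""
    cutoff = now - timedelta(seconds=ttl_seconds)
    names = []
    for pod in pod_list.get("items", []):
        if pod.get("status", {}).get("phase") != "Succeeded":
            continue  # skip running, pending and failed pods
        done = finished_at(pod)
        if done is not None and done <= cutoff:
            names.append(pod["metadata"]["name"])
    return names


def cleanup(namespace, ttl_seconds):
    """List driver pods with kubectl and delete the expired ones."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace,
         "-l", "spark-role=driver", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    now = datetime.now(timezone.utc)
    for name in expired_driver_pods(json.loads(out), ttl_seconds, now):
        subprocess.run(["kubectl", "delete", "pod", "-n", namespace, name], check=True)
```

Calling cleanup("default", 3600) from a cron job or similar would delete driver pods in the default namespace that completed more than an hour ago (both values are placeholders). The selection logic is kept in expired_driver_pods so it can be checked against saved kubectl output without touching a cluster.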

Solution 3:[3]

Regarding the original question, "Spark on Kubernetes driver pod cleanup": there seems to be no way to pass a TTL parameter to Kubernetes at spark-submit time, so driver pods in Completed status are never removed automatically.

From the Spark documentation (https://spark.apache.org/docs/latest/running-on-kubernetes.html):

When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists logs and remains in “completed” state in the Kubernetes API until it’s eventually garbage collected or manually cleaned up.

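It is not very clear what performs this "eventual" garbage collection. Kubernetes does have a pod garbage collector in kube-controller-manager, but it only removes terminated pods once their number exceeds --terminated-pod-gc-threshold (12500 by default), so in practice completed driver pods can linger for a long time.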

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
