Can a single Kubernetes job contain multiple pods with different parallelism definitions?
I have a batch job which breaks down into 3 tasks, each depending on the previous one finishing before it can start:
- Run a single pod
- Run N pods in parallel (.spec.completions = .spec.parallelism = N)
- Run M pods in parallel (.spec.completions = .spec.parallelism = M)
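For reference, a parallel task like the second one maps onto a single Job whose completions and parallelism are both set to N. A minimal sketch, where the job name, image, and command are placeholders, not from the question:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: task-2            # placeholder name
spec:
  completions: 5          # N: total pods that must succeed
  parallelism: 5          # N: pods allowed to run at once
  template:
    spec:
      containers:
        - name: worker
          image: busybox  # placeholder image
          command: ["sh", "-c", "echo working"]
      restartPolicy: Never
```

Both fields live at the Job's `.spec` level, which is why they apply to the one pod template the Job carries.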
Each task has different resource requirements (CPU/memory/storage). Currently I start job #1; when it finishes, it runs a kubectl command to start job #2, and so on for job #3. I have 3 separate jobs.
Can I define a single job for these 3 tasks?
Maybe something like this:
- Run a single pod for task #1
- Define an init container on task #2 to wait for task #1 to finish
- Run N pods for task #2 using .spec.completions
- Define an init container on task #3 to wait for task #2 to finish
- Run M pods for task #3 using a different .spec.completions appropriate for task #3
It's not clear to me whether I can define separate .spec.parallelism and .spec.completions values for different pods under the same job, or whether I can define separate init containers to delay the start of the later tasks.
This may all require a more complete workflow engine like Argo (which we don't yet have available).
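If the three tasks remain separate Jobs, the init-container idea can still be used to block a later Job's pods until the earlier Job completes. A hedged sketch, assuming the previous Job is named task-1, using the bitnami/kubectl image, and requiring that the pod's service account has RBAC permission to get/watch Jobs (none of this is from the question):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: task-2                   # placeholder name
spec:
  completions: 5
  parallelism: 5
  template:
    spec:
      initContainers:
        - name: wait-for-task-1
          image: bitnami/kubectl # any image with kubectl works
          command:
            - kubectl
            - wait
            - --for=condition=complete
            - job/task-1         # the previous task's Job name
            - --timeout=3600s
      containers:
        - name: worker
          image: busybox         # placeholder image
          command: ["sh", "-c", "echo task 2 work"]
      restartPolicy: Never
```

Each of task-2's pods would then sit in Init until task-1 reports the Complete condition. This trades the external kubectl chaining for in-cluster waiting, but the tasks are still three Job objects, not one.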
Solution 1:[1]
The Kubernetes Job controller creates pods from the single pod template in the Job spec, so no, you can't have multiple pod templates with different parallelism in one Job.
However, Kubernetes is an extensible system: you can define your own Custom Resource and write a controller, similar to the Job controller, that supports multiple pod templates with different parallelism settings.
Solution 2:[2]
I just figured out a way to do this.
A single YAML file can contain multiple documents, so you can append multiple Job definitions to one file, using --- as the delimiter.
Here is an example:
apiVersion: batch/v1
kind: Job
metadata:
  name: ge-test-job
spec:
  template:
    spec:
      containers:
        - name: ge-test-1
          image: improbableailab/model-free
          command: ["perl", "-Mbignum=bpi", "-wle", "print 1"]
          resources:
            limits:
              memory: 200Mi
              cpu: 1000m
            requests:
              memory: 50Mi
              cpu: 500m
          volumeMounts:
            - mountPath: /jaynes-mounts
              name: ge-pvc
      restartPolicy: Never
      volumes:
        - name: ge-pvc
          persistentVolumeClaim:
            claimName: ge-pvc
  backoffLimit: 4
  ttlSecondsAfterFinished: 10
---
apiVersion: batch/v1
kind: Job
metadata:
  name: ge-test-job-2
spec:
  template:
    spec:
      containers:
        - name: ge-test-2
          image: improbableailab/model-free
          command: ["perl", "-Mbignum=bpi", "-wle", "print 1"]
          resources:
            limits:
              memory: 200Mi
              cpu: 1000m
            requests:
              memory: 50Mi
              cpu: 500m
          volumeMounts:
            - mountPath: /jaynes-mounts
              name: ge-pvc
      restartPolicy: Never
      volumes:
        - name: ge-pvc
          persistentVolumeClaim:
            claimName: ge-pvc
  backoffLimit: 4
  ttlSecondsAfterFinished: 10
Now if you run:
$ kubectl apply -f job.yaml
job.batch/ge-test-job created
job.batch/ge-test-job-2 created
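Note that applying one multi-document file starts all the Jobs at once, so it does not by itself enforce the task ordering from the question. A sketch of how the sequencing could be scripted instead, assuming one file per task and the placeholder job names task-1 and task-2 (these names and files are not from the question, and this needs a live cluster to run):

```shell
# Apply each Job in turn and block until it reports Complete
kubectl apply -f task-1.yaml
kubectl wait --for=condition=complete job/task-1 --timeout=1h

kubectl apply -f task-2.yaml
kubectl wait --for=condition=complete job/task-2 --timeout=1h
```

kubectl wait exits non-zero on timeout, so a `set -e` at the top of such a script would stop the chain if an earlier task never finishes.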
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Arghya Sadhu |
| Solution 2 | episodeyang |