Slurm parallel "steps": 25 independent runs, using 1 CPU each, at most 5 simultaneously
I was previously using HTCondor as a cluster scheduler. Now, even after reading the Slurm documentation, I cannot figure out how to parallelize this.
What I want to achieve is called, I think, "embarrassingly parallel": running multiple independent instances of a program (with different inputs).
Concretely: request 5 CPUs, possibly on distinct nodes; each CPU runs the single-threaded program with its own input. As soon as one CPU is freed, it starts on the next input in the queue.
Using a batch script, I tried two approaches (please help me understand their difference).
If life were simple, I would assume it sufficient to combine the following sbatch options:
- `--ntasks=5`: to have at most 5 runs simultaneously?
- `--cpus-per-task=1`: each run uses one CPU (this should be the default value)
1. Job array option

I tried `--array=0-24%5`, even though `%5` seems redundant with `--ntasks=5`; or is it different?
```shell
#!/usr/bin/env bash
#SBATCH --job-name=myprogram
#SBATCH --mem-per-cpu=3000 # MB
#SBATCH --output=slurmed/myprogram_%a.out
#SBATCH --error=slurmed/myprogram_%a.err
#SBATCH --ntasks=5
#SBATCH --cpus-per-task=1
#SBATCH --array=0-24%5

input_files=(myinput*.txt)
srun ./myprogram "${input_files[$SLURM_ARRAY_TASK_ID]}"
```
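The array-indexing logic itself can be checked locally, outside Slurm. This is a minimal sketch that simulates `SLURM_ARRAY_TASK_ID` by hand (the dummy file names are placeholders standing in for the real `myinput*.txt`):

```shell
#!/usr/bin/env bash
# Simulate how each array task picks exactly one input file.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Create dummy inputs (stand-ins for the real myinput*.txt files)
for i in 0 1 2; do touch "myinput$i.txt"; done

# Same glob-into-array trick as in the batch script
input_files=(myinput*.txt)

# Pretend we are array task 1; Slurm would set this variable itself
SLURM_ARRAY_TASK_ID=1
echo "${input_files[$SLURM_ARRAY_TASK_ID]}"   # → myinput1.txt
```

Note that the glob expands in lexical order, so the mapping from task ID to file is deterministic as long as the set of input files does not change between submissions.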
However, it persists in allocating several CPUs to each `SLURM_ARRAY_TASK_ID`! I also tried without specifying `--ntasks` at all; same problem.
2. Packed jobs (using ampersand spawning)
(Sorry, but why would a cluster scheduler even let you manually parallelize using shell syntax?)
```shell
#!/usr/bin/env bash
#SBATCH --job-name=myprogram
#SBATCH --mem-per-cpu=3000 # MB
#SBATCH --output=slurmed/myprogram_%J_%t.out
#SBATCH --error=slurmed/myprogram_%J_%t.err
#SBATCH --ntasks=5
#SBATCH --cpus-per-task=1

for input_file in myinput*.txt; do
    srun --exclusive ./myprogram "$input_file" &
done
wait
```
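The ampersand-plus-`wait` pattern itself is plain bash, independent of `srun`. A minimal sketch, with a file write standing in for `myprogram`:

```shell
#!/usr/bin/env bash
# Demonstrate the fork-and-wait pattern used by "packed jobs":
# each loop iteration backgrounds one command; wait blocks until all finish.
set -euo pipefail

tmp=$(mktemp -d)
for n in 1 2 3; do
    ( sleep 0.1; echo "job $n done" > "$tmp/out_$n.txt" ) &   # background, like srun ... &
done
wait   # without this, the script would exit before the background jobs finish

cat "$tmp"/out_*.txt
```

In a batch script the `wait` is essential: if the script exits, Slurm considers the job finished and kills any steps still running in the background.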
However, if I watch `htop` on the machine where it is running, I see that the first job is running 5 times, and that the sbatch command itself is also using one extra CPU! Should I remove the `--exclusive`?
P.S.: There is a useful answer here, but as I said, my array command uses multiple CPUs per array element instead of one.
P.P.S.: Additionally, the Slurm terminology is extremely confusing:
- a job: something submitted using sbatch and/or srun?
- a job step: each time an executable is called inside the batch script? Despite being called a "step", steps can run in parallel.
- a task: I don't see the difference from a job step, but the option descriptions imply that it is different (someone else also asked).
Solution 1:[1]
So actually, my problem was answered here: NumCPUs shows 2 when --cpus-per-task=1. Due to hyperthreading, a physical core with 2 hardware threads is allocated to each job, so requesting 1 CPU per task will still show up as 2 CPUs in Slurm's reports. However, those 2 threads share one core, so running a parallelized command on them will not provide real acceleration. If I want true parallelization, I have to request 4, 6, 8 or more CPUs.
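Putting this together, a hedged sketch of a cleaned-up array script. Two points here are assumptions to verify on your own cluster: `--ntasks=1` (each array element is a separate job that receives the full resource request, so asking for 5 tasks per element over-allocates; the `%5` throttle alone limits concurrency), and `--hint=nomultithread` (asks Slurm not to treat both hyperthreads of a core as allocatable CPUs; whether it changes anything depends on how the cluster defines a "CPU"):

```shell
#!/usr/bin/env bash
#SBATCH --job-name=myprogram
#SBATCH --mem-per-cpu=3000          # MB
#SBATCH --output=slurmed/myprogram_%a.out
#SBATCH --error=slurmed/myprogram_%a.err
#SBATCH --ntasks=1                  # one task per array element
#SBATCH --cpus-per-task=1
#SBATCH --hint=nomultithread        # don't count both hyperthreads as CPUs
#SBATCH --array=0-24%5              # 25 elements, at most 5 running at once

input_files=(myinput*.txt)
srun ./myprogram "${input_files[$SLURM_ARRAY_TASK_ID]}"
```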
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | PlasmaBinturong |