Scheduling more jobs than MaxArraySize

Let's say I have 6233 simulations to run. The commands are generated and stored in a file, one per line. I would like to use Slurm to schedule and run these commands. However, the MaxArraySize limit is 2000, so a single job array cannot cover all of them.

One solution is given here: create four separate job arrays and use arithmetic indexing into the file, with the last array covering the remaining 233 commands.
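For illustration, a minimal sketch of that approach (the wrapper name wrap.sh and the offset argument are hypothetical, not from the linked answer); note that the array indices stay below MaxArraySize:

sbatch --array=0-1999 wrap.sh 0      # commands 1-2000
sbatch --array=0-1999 wrap.sh 2000   # commands 2001-4000
sbatch --array=0-1999 wrap.sh 4000   # commands 4001-6000
sbatch --array=0-232  wrap.sh 6000   # commands 6001-6233

where wrap.sh executes line $(( $1 + SLURM_ARRAY_TASK_ID + 1 )) of the command file.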

  1. Is it possible to do this using one sbatch script with one job ID?
  2. I set ntasks=1 when using job arrays. Do larger ntasks help in such situations?

Update: Following Damien's solution and examples given here, I ended up with the following line in my bash script:

curID=$(( ${SLURM_ARRAY_TASK_ID} * ${SLURM_NTASKS} + ${SLURM_PROCID} ))

The same can be done in Python (shown on the referenced page); the only difference is that the environment variables must be read from the environment inside the script.
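For completeness, here is how curID can then select and launch the corresponding command; the file name commands.txt and the use of sed are assumptions, not part of the original setup:

curID=$(( SLURM_ARRAY_TASK_ID * SLURM_NTASKS + SLURM_PROCID ))
eval "$(sed -n "$(( curID + 1 ))p" commands.txt)"   # sed counts lines from 1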



Solution 1:[1]

Is it possible to do this using one sbatch script with one job ID?

No. That solution will give you multiple job IDs.

I set ntasks=1 when using job arrays. Do larger ntasks help in such situations?

Yes, that is a factor that you can leverage.

Each job in the array can spawn multiple tasks (--ntasks=...). In that case, the line number in the command file must be computed from $SLURM_ARRAY_TASK_ID and $SLURM_PROCID, and the program must be started with srun. The tasks within each job of the array run in parallel. How large each job can be depends on the MaxJobsize limit defined on the cluster/partition/QOS you have access to.
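A minimal sketch of this variant, assuming the 6233 commands sit in a hypothetical commands.txt and using 8 tasks per array job (the file name and sizes are illustrative):

#!/bin/bash
#SBATCH --array=0-779   # 780 array tasks; 780 * 8 = 6240 slots >= 6233 commands
#SBATCH --ntasks=8      # 8 parallel tasks per job in the array

# SLURM_PROCID is only defined inside the processes launched by srun,
# so the indexing is done in a small inline wrapper, one copy per task.
srun bash -c '
  curID=$(( SLURM_ARRAY_TASK_ID * SLURM_NTASKS + SLURM_PROCID ))
  (( curID < 6233 )) || exit 0          # the last array job overshoots by 7 slots
  eval "$(sed -n "$(( curID + 1 ))p" commands.txt)"
'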

Another option is to chain the tasks inside each job of the array with a Bash loop (for i in $(seq ...); do ...; done). In that case, the line number in the command file must be computed from $SLURM_ARRAY_TASK_ID and $i. The tasks within each job of the array run serially. How large each job can be depends on the MaxWall limit defined on the cluster/partition/QOS you have access to.
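A minimal sketch of this chained variant, again with a hypothetical commands.txt and a block size of 4 (ceil(6233 / 2000) = 4 commands per array job):

#!/bin/bash
#SBATCH --array=0-1558   # 1559 array tasks * 4 commands each = 6236 >= 6233
#SBATCH --ntasks=1

for i in $(seq 0 3); do
  curID=$(( SLURM_ARRAY_TASK_ID * 4 + i ))
  (( curID < 6233 )) || break           # the last array job runs fewer commands
  eval "$(sed -n "$(( curID + 1 ))p" commands.txt)"
done

Because the four commands run back to back, the requested wall time must cover all of them, which is where the MaxWall limit comes into play.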

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
[1] Solution 1: damienfrancois