Limit the number of running jobs in SLURM

I am queuing multiple jobs in SLURM. Can I limit the number of jobs that run in parallel?

Thanks in advance!



Solution 1:[1]

If you are not the administrator, you can hold some jobs if you do not want them all to start at the same time, with scontrol hold <JOBID>, and you can delay the submission of some jobs with sbatch --begin=YYYY-MM-DD.
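For example, a minimal sketch (123456 and job.sh are placeholder values):

# Hold a queued job so it will not start, then release it later
scontrol hold 123456
scontrol release 123456

# Submit a job that only becomes eligible to start at the given time
sbatch --begin=2024-06-01T08:00:00 job.sh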

Also, if it is a job array, you can limit the number of jobs in the array that run concurrently with, for instance, --array=1-100%25 to have 100 jobs in the array but only 25 of them running at a time.
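A minimal sketch of this, assuming job.sh is your batch script and 123456 is a placeholder job ID:

# Submit a 100-task array with at most 25 tasks running concurrently
sbatch --array=1-100%25 job.sh

# Recent SLURM versions also let you change the throttle of an already
# submitted array (check the scontrol man page for your version)
scontrol update JobId=123456 ArrayTaskThrottle=10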

Finally, you can use the --dependency=singleton option, which allows only one job with a given --job-name (per user) to run at a time. If you choose three names and distribute those names across all your jobs together with that option, you effectively restrict yourself to at most 3 running jobs.
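A sketch of that trick, assuming a batch script job.sh that takes one input file as an argument (the names and file glob are placeholders):

# singleton allows only one running job per job name (and user), so rotating
# three names caps the number of concurrently running jobs at three
names=(throttle_a throttle_b throttle_c)
i=0
for input in data_*.txt; do
    sbatch --job-name="${names[$(( i % 3 ))]}" --dependency=singleton job.sh "$input"
    i=$(( i + 1 ))
done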

Solution 2:[2]

According to the SLURM Resource Limits documentation, you can limit the total number of jobs running at any given time for an association or QOS with the MaxJobs parameter. As a reminder, an association is a combination of cluster, account, user name and (optionally) partition name.

You should be able to do something similar to:

sacctmgr modify user <userid> account=<account_name> set MaxJobs=10
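To check that the limit is in place, something along these lines should work (a sketch; <userid> is a placeholder):

sacctmgr show assoc user=<userid> format=cluster,account,user,maxjobs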

I found this presentation to be very helpful in case you have more questions.

Solution 3:[3]

According to the SLURM documentation, --array=0-15%4 (note the dash, not a colon) will limit the number of simultaneously running tasks from this job array to 4.

I wrote test.sbatch:

#!/bin/bash
# test.sbatch
#
#SBATCH -J a
#SBATCH -p campus
#SBATCH -c 1
#SBATCH -o %A_%a.output

mkdir test${SLURM_ARRAY_TASK_ID}

# Sleep for a random time of up to 10 minutes so the tasks stay visible in
# squeue for different durations, making it easy to check that the number of
# parallel jobs remains constant
number=$(( RANDOM % 600 ))
echo "$number"

sleep $number

and ran it with sbatch --array=1-15%4 test.sbatch.

Jobs ran as expected (always 4 in parallel); each one just created its directory and kept running for $number seconds.
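To watch this while the array is active, something like the following (a sketch) shows only the running tasks, so you can confirm there are never more than four:

squeue -u $USER -t RUNNING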

Appreciate comments and suggestions.

Solution 4:[4]

If your jobs are relatively similar, you can use SLURM's job array functionality. I had been trying to figure this out for a while and found this solution at https://docs.id.unibe.ch/ubelix/job-management-with-slurm/array-jobs-with-slurm

#!/bin/bash -x
#SBATCH --mail-type=NONE
#SBATCH --array=1-419%25  # Submit 419 tasks, with only 25 of them running at any time

# File containing the 419 commands I want to run, one command per line
cmd_file=s1List_170519.txt

# Pick out the command on the line matching this task's array index
cmd_line=$(awk -v line=${SLURM_ARRAY_TASK_ID} 'NR==line' "$cmd_file")

$cmd_line  # may need to be piped to bash if the command uses shell syntax
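For reference, a hypothetical command file simply contains one shell command per line, and the script is submitted with a plain sbatch call; the file contents and script name below are made-up placeholders:

# Contents of s1List_170519.txt -- line N is executed by array task N:
./process_sample.sh sample_001
./process_sample.sh sample_002

# Submit the array script itself as usual:
sbatch array_job.sh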

Solution 5:[5]

Expanding on the accepted answer: in my case, I needed to run a maximum number of jobs per node, and I needed to do it exclusively using srun (not sbatch). The way I resolved this was to use these three flags together: --nodelist=<nodename> --dependency=singleton --job-name=<uniquename>_<nodename>.

First, I create an array of x unique names, where the length of that array is the maximum number of jobs I want to run per node. Second, I create an array with all the node names I want to use. Finally, I combine these two arrays in a cyclic fashion: I append the node name to the unique name and make sure that the value of --nodelist matches the node name appended to the job name. The result is that the maximum number of jobs per node is limited, rather than the total number of jobs. In my case I needed to distribute the work this way mainly because of memory constraints on each node.
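A minimal sketch of that scheme, assuming a worker command my_task; the node names, slot names, and task count are all made-up placeholders:

#!/bin/bash
# singleton allows only one running job per job name, and each name is bound
# to a single node, so at most three jobs (one per slot name) run on any node
slots=(slot_a slot_b slot_c)        # 3 names => at most 3 jobs per node
nodes=(node01 node02 node03 node04)

i=0
for task_id in $(seq 1 100); do
    # cycle through nodes and slot names; with 4 nodes and 3 names (coprime),
    # every (node, name) pair eventually gets used
    node=${nodes[$(( i % ${#nodes[@]} ))]}
    slot=${slots[$(( i % ${#slots[@]} ))]}
    # srun blocks until the job finishes, so launch in the background;
    # jobs whose singleton name is busy simply wait in the queue
    srun --nodelist="$node" --dependency=singleton \
         --job-name="${slot}_${node}" my_task "$task_id" &
    i=$(( i + 1 ))
done
wait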

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1:
Solution 2: AndresM
Solution 3: aerijman
Solution 4: lonestar21
Solution 5: Reniel Calzada