Running Python scripts in Slurm
I've recently started a new job and need to run some scripts on the HPC through Slurm. My scripts are written in Python, so I want to execute them using python script.py in my .slurm file.
However, when I try to run the .slurm file, it doesn't seem to be able to call the Python scripts. I've tried loading the Python environment using module load anaconda3, and variations thereof (e.g. module load python, etc.). Attached is my array.slurm file, for reference (.slurm file). I've left the account and mail-user empty for uploading here for anonymity, but I have these in when I run the script.
The error file output by Slurm indicates the following:
/var/spool/slurmd/job220829/slurm_script: line 19: module: command not found
Can someone offer practical guidance? I need to run these Python scripts as soon as possible.
Solution 1:[1]
As md2perpe mentioned, every HPC system is different: each site customizes the Slurm scheduler to some extent. Still, many HPC systems share the same basic commands.
For instance, here is a job submission script that I created to run a Python file on a GPU node.
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=00:00:40
#SBATCH --ntasks=1
#SBATCH --job-name=gpu_check
#SBATCH --output=gpu.%j.out
#SBATCH --error=gpu.%j.err
#SBATCH --gres=gpu:1
#SBATCH --account=scw1901
#SBATCH --partition=accel_ai
module load anaconda/3
source activate base
python gpu.py
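The module load line in a script like the one above only works if the batch shell actually defines the module command. The following is a minimal sketch, not a site-specific recipe, of a guard you could put before module load. The function name ensure_module_cmd is hypothetical, and the init-script paths are common defaults that are assumed here and may not exist on your cluster.

```shell
#!/bin/bash
# Sketch: make sure `module` exists before calling `module load`.
# The init-script paths below are ASSUMED common locations, not guaranteed.

ensure_module_cmd() {
    # Already defined as a function, alias, or binary? Nothing to do.
    if type module >/dev/null 2>&1; then
        return 0
    fi
    # Otherwise try a few conventional init scripts (assumed locations).
    local init
    for init in /etc/profile.d/modules.sh \
                /etc/profile.d/lmod.sh \
                /usr/share/lmod/lmod/init/bash; do
        if [ -r "$init" ]; then
            . "$init"
            type module >/dev/null 2>&1 && return 0
        fi
    done
    return 1
}

if ensure_module_cmd; then
    echo "module command is available"
else
    echo "module command not found; ask your HPC support how to initialise it" >&2
fi
```

On some systems, starting the script with a login shell (#!/bin/bash -l) has the same effect, because the login profile is what defines module; whether that applies depends on how your site is set up.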
I can suggest the following:
- After loading the anaconda module, you should activate a conda environment, for example with source activate base. To see a list of available conda environments, run conda env list, then activate the environment of your choice.
- I don't know what your Python script does, so I can't really comment on the arguments that you used.
- Make sure you have access to the partition. To see a list of partitions, run sinfo. Also check the state: if it is drain or reserved, then you simply can't use that partition.
- Maybe you can run your script without --ntasks-per-node and --array. Why not try my job script?
- If nothing works, please paste the output of the error file in your question. In my case, the job ID in the output file names is given by %j, not %a as in your case.
- You can remove the email arguments (such as --mail-user) if you don't need them.
- What is SLURM_ARRAY_TASK_ID? If you don't know, please remove it.
- You said you don't have the module command, and the error is reported on line 19, but you used the module command on line 18. Are you sure you are sharing the correct job script?
- Can you run module load anaconda/3 on the login node? Just copy and paste it after SSHing in. If it works, then module is available.
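Regarding the SLURM_ARRAY_TASK_ID point above: Slurm sets this variable only for jobs submitted with --array, giving each task its own index. The sketch below shows one typical way to use it, picking one input file per task. The file names, the inputs array, and the pick_input helper are hypothetical placeholders, and the fallback value of 0 is there only so the script can be tried outside Slurm.

```shell
#!/bin/bash
# Sketch: map the array index onto one input file per task.
# SLURM_ARRAY_TASK_ID is set by Slurm for --array jobs; fall back to 0
# so the script can also be run by hand for testing.
task_id="${SLURM_ARRAY_TASK_ID:-0}"

# Hypothetical input names; replace with your own files.
inputs=(sample_a.csv sample_b.csv sample_c.csv)

pick_input() {
    # Print the input file for index $1, or fail if out of range.
    local idx="$1"
    if [ "$idx" -ge 0 ] 2>/dev/null && [ "$idx" -lt "${#inputs[@]}" ]; then
        printf '%s\n' "${inputs[$idx]}"
    else
        return 1
    fi
}

if input_file="$(pick_input "$task_id")"; then
    # Only print the command here; in a real job you would run it.
    echo "task $task_id would run: python script.py $input_file"
else
    echo "task $task_id has no matching input" >&2
fi
```

With three inputs as above, the job would be submitted with an index range of 0-2 (for example sbatch --array=0-2 array.slurm). If your script does not need a per-task index, dropping --array and the variable altogether is the simpler option.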
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |