Problem running COMSOL in a cluster with SLURM
I am trying to submit the following job, which runs COMSOL, to a cluster with SLURM via a .sh script:
#!/bin/bash
#SBATCH --job-name=my_work
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
#SBATCH --mem=20G
#SBATCH --partition=my_partition
#SBATCH --time=4-0
#SBATCH --no-requeue
#SBATCH --exclusive
#SBATCH -D $HOME
#SBATCH --output=Lecho1_%j.out
#SBATCH --error=Lecho1_%j.err
cd /home/myuser/myfile/
module load intel/2019b
module load OpenMPI/4.1.1
module load COMSOL/5.5.0
comsol batch -mpibootstrap slurm -nn 20 -nnhost 20 -inputfile myfile.mph \
-outputfile myfile.outout.mph -study std1 -batchlog myfile.mph.log
When I do so, I get the following error message:
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(805): fail failed
MPID_Init(1743)......: channel initialization failed
MPID_Init(2137)......: PMI_Init returned -1
Can anyone tell me what this error means and how to fix it?
Solution 1:[1]
The way you call COMSOL is incorrect. The submission script should contain the following lines to run COMSOL on a cluster with SLURM:
#!/bin/bash
#SBATCH --partition=regular
#SBATCH --job-name=COMSOL_JOB
#SBATCH --mem=200gb
#SBATCH --cpus-per-task=1
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load COMSOL/5.5
comsol batch -mpirmk pbs -job b1 -alivetime 15 -recover \
-inputfile "mymodel.mph" -outputfile "mymodel.mph.out" \
-batchlog "mymodel.mph.log"
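Since both scripts request a single node, another option worth trying is to run COMSOL as one shared-memory process and skip the MPI launcher altogether. The script below is only a sketch under that assumption and is not part of the answer above: the `-np` option sets the number of cores, `mymodel.mph` and the `COMSOL/5.5` module name are placeholders copied from the answer, and the `--cpus-per-task=20` value mirrors the 20 tasks in the question. Whether this avoids the `PMI_Init` failure on a particular cluster has not been verified here.

```shell
#!/bin/bash
#SBATCH --job-name=COMSOL_JOB
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

model="mymodel.mph"                 # placeholder model name, as in the answer
cores="${SLURM_CPUS_PER_TASK:-1}"   # set by SLURM from --cpus-per-task

# One shared-memory COMSOL process: -np sets the core count, no MPI options.
cmd=(comsol batch -np "$cores"
     -inputfile "$model" -outputfile "$model.out" -batchlog "$model.log")

if [ -n "${SLURM_JOB_ID:-}" ]; then
    # Inside a SLURM job: load COMSOL (module name is an assumption) and run.
    module load COMSOL/5.5
    "${cmd[@]}"
else
    # Outside SLURM: only print the command so it can be inspected first.
    echo "${cmd[@]}"
fi
```

Running the script directly with `bash` on a login node only prints the command line, which makes it easy to check the options before submitting it with `sbatch`.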
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | rahjoo |