# 2. SLURM: HPC Job Manager

## 2.1. Pre-Installed Software

- There are several pre-installed software packages you can use.

- They are managed with the `module` command (see the example after this list).

- Some software is not accessible by default and must be loaded in first.

- This allows Research Computing to provide multiple versions of the same software concurrently and enables users to easily switch between versions.
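
For example, a typical `module` session might look like the following; the module name `gcc` is illustrative, and the exact list of modules depends on your cluster:

```bash
module avail    # list all software modules available on the cluster
module load gcc # load a module into your environment (name is illustrative)
module list     # show the modules currently loaded in your session
module purge    # unload all modules and return to a clean environment
```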

## 2.2. SLURM Commands

```bash
sinfo           # show partitions and node availability
sbatch SCRIPT   # submit a batch script to the scheduler
scancel JOBID   # cancel a queued or running job
```

::::{margin}
```{tip}
SLURM commands start with the letter "s", which tends to confuse people when they first learn. To help you remember, recall that we use `ssh` to log in to the computer.
```
::::

- When you access the supercomputer remotely (`ssh username@hostname`), you will always start in your home directory under `/home`.

- You can check your current directory by running the command `pwd`, as shown below.
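A minimal login session; `username` and `hostname` are placeholders for your own credentials, and the `pwd` output is illustrative:

```bash
ssh username@hostname   # placeholders: use your own username and cluster hostname
pwd                     # illustrative output: /home/username
```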

## 2.3. Slurm Scripts

It is important to differentiate between a shell script and a Slurm script. When I make files, I typically use the file extension `.slurm` for jobs I plan to submit to the cluster. Slurm scripts have a specific notation you must follow.
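For contrast, here is a plain shell script with no `#SBATCH` directives; it runs immediately on whatever machine executes it instead of being scheduled by Slurm:

```bash
#!/bin/bash
# hello.sh -- an ordinary shell script; run it directly with: bash hello.sh
echo "Hello World!"
```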

### Directives (#SBATCH)

Slurm determines how to allocate your job across the cluster (across how many compute nodes, with how many CPUs, for how long, etc.) via Slurm directives included at the top of your job script. These directives are lines starting with `#SBATCH`. Common types of jobs are shown below.

A basic CPU job:

```bash
#!/bin/bash
#SBATCH --partition=PARTITION
#SBATCH --time=DAYS-HOURS:MINUTES:SECONDS
#SBATCH --nodes=NODES
#SBATCH --ntasks=NTASKS
#SBATCH --output=%j.out
#SBATCH --error=%j.err
#SBATCH --job-name=JOBNAME

echo "Hello World!"
```
The same job requesting a single GPU; the only new directive is `--gres=gpu:1`:

```bash
#!/bin/bash
#SBATCH --partition=GPU_PARTITION
#SBATCH --time=DAYS-HOURS:MINUTES:SECONDS
#SBATCH --nodes=NODES
#SBATCH --ntasks=NTASKS
#SBATCH --output=%j.out
#SBATCH --error=%j.err
#SBATCH --job-name=JOBNAME
#SBATCH --gres=gpu:1

echo "Hello World!"
```
An exclusive job that claims an entire node (`--exclusive`) and all of its memory (`--mem=0`):

```bash
#!/bin/bash
#SBATCH --partition=GPU_PARTITION
#SBATCH --time=DAYS-HOURS:MINUTES:SECONDS
#SBATCH --nodes=NODES
#SBATCH --ntasks=NTASKS
#SBATCH --output=%j.out
#SBATCH --error=%j.err
#SBATCH --job-name=JOBNAME
#SBATCH --mem=0
#SBATCH --exclusive

echo "Hello World!"
```

To submit a `.slurm` job, use `sbatch`; to cancel it, use `scancel`.

```bash
sbatch job.slurm    # prints the JOB_ID of the submitted job
scancel JOB_ID      # cancel the job with that ID
squeue              # show the queue for all jobs
squeue -u USERNAME  # show the queue for your jobs only
```
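
Putting it all together, a minimal submit-and-check session might look like the following; the job ID `123456` and the script name `job.slurm` are illustrative:

```bash
sbatch job.slurm    # prints: Submitted batch job 123456
squeue -u $USER     # the job's state moves from PD (pending) to R (running)
cat 123456.out      # stdout ends up in the file named by --output (%j = job ID)
```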