Cluster job scheduler (SLURM) HowTo

Basic commands

Jobs are controlled with the SLURM commands. The basic commands are:

  • sbatch - submit a batch script (job) to the queue
  • squeue - report the state of jobs or job steps
  • scancel - remove jobs from the queue
  • scontrol/sinfo - show information about partitions (queues), jobs, nodes, etc.
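Here is a minimal sketch of how these commands fit together in a typical session. It assumes the SLURM client tools are on PATH and a batch script named MyJob.batch, the name used later in this HowTo. The helper names parse_jobid and submit are hypothetical. sbatch normally reports a new job as "Submitted batch job <JobID>", and the helper keeps only that number.

```shell
#!/bin/bash
# Sketch of the submit -> monitor -> cancel cycle (helper names are hypothetical).

# sbatch prints "Submitted batch job <JobID>"; keep only the JobID.
parse_jobid() {
    awk '/^Submitted batch job/ { print $NF }'
}

# Submit a batch script and print the JobID assigned by SLURM.
submit() {
    sbatch "$1" | parse_jobid
}

# Typical use on a login node:
#   jobid=$(submit MyJob.batch)
#   squeue -u "$USER"            # all of your jobs and their states
#   scontrol show job "$jobid"   # details of this one job
#   sinfo                        # partitions and node states
#   scancel "$jobid"             # remove the job from the queue
```

Alternatively, sbatch --parsable prints just the JobID (optionally followed by ";cluster"), which makes the parsing step unnecessary.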

Batch script

Job descriptions and resource requirements are specified in a SLURM batch script, a file written by the user. The script contains special entries called SLURM directives: lines beginning with #SBATCH. These directives tell SLURM what the job needs in order to run. The most common options are:

Option              Description
-J                  Specify a name for the job allocation. The default is the name of the batch script, or just "sbatch" if the script is read from sbatch's standard input.
-N                  Request that a minimum number of nodes be allocated to this job.
--ntasks-per-node   Specify the number of tasks to run on each node. With the default of one CPU per task, this is the number of cores the job uses per node.
--mem=<MB>          Specify the real memory required per node, in megabytes.
-t                  Set a limit on the total run time of the job allocation, as hours:minutes:seconds. When the limit is reached, the queueing system kills the job.
-p                  Request a specific partition for the resource allocation, e.g. ifj or ifj-allnodes.
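The table uses the short forms. Each option also has a long form: -J is --job-name, -N is --nodes, -t is --time, and -p is --partition. Some users find the long forms easier to read in scripts. The sketch below shows a header written with the long forms, using example values. Inside a running job, SLURM exports the granted settings as environment variables such as SLURM_JOB_NUM_NODES and SLURM_NTASKS_PER_NODE. The fallback values after :- apply only when the script runs outside SLURM, for example when you test it by hand.

```shell
#!/bin/bash -l
# Long-form spelling of the options from the table (values are examples).
#SBATCH --job-name=myjobname
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --mem=2000
#SBATCH --time=02:00:00
#SBATCH --partition=ifj

# Values granted by SLURM; the defaults after :- are used outside a job.
nodes=${SLURM_JOB_NUM_NODES:-1}
tasks_per_node=${SLURM_NTASKS_PER_NODE:-1}
echo "Running on ${nodes} node(s) with ${tasks_per_node} task(s) per node"
```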

In short, the resource request together with all the commands and programs run within it is called a job.

An example of a batch script for the ifj partition:

#!/bin/bash -l
#SBATCH -J myjobname
#SBATCH -N 1
#SBATCH --ntasks-per-node 1
#SBATCH --mem 100
#SBATCH -t 20:00:00
#SBATCH -p ifj
#SBATCH --output="mycalculations.out"
#SBATCH --mail-type=ALL

/path/to/my/command input.dat

It means that:

  • the job appears in the queueing system under the name myjobname
  • the job runs on one node
  • the job runs one task (one core) on each node
  • 100 MB of memory per node is allocated to the job
  • the user declares that the job will finish in less than 20 hours (after that, the queueing system stops the job)
  • the job runs in the ifj partition (queue)
  • the output of the job is stored in the file mycalculations.out
  • the control file input.dat should be placed in the directory where the job is executed (by default, the directory from which it was submitted with sbatch)
  • the additional options --mail-type= and --mail-user= can be used to receive email notifications. Valid values for --mail-type include BEGIN, END, FAIL and REQUEUE, which correspond to job state changes. The special value ALL sends email in all of these situations.
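For example, suppose you want email only when the job ends or fails, sent to an address other than the default (the submitting user). The header could then contain the two lines below. The address is a placeholder. Several event types can be combined with commas.

```shell
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=your.name@example.com
```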

Submit a batch script to the queue

The sbatch command submits a batch script to the SLURM queue. All of its options can be given in the first section of the batch script, in lines beginning with #SBATCH. Some options are specific to a given cluster, for example the partition names ifj and ifj-allnodes.
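sbatch reads #SBATCH lines only from the top of the script. It stops processing directives at the first line that is neither blank nor a comment, so any directive placed after the first command is ignored. A small hypothetical helper, list_directives, can catch such mistakes by printing the directives sbatch would see. It is a sketch that mimics this rule with awk and is not part of SLURM.

```shell
#!/bin/bash
# Hypothetical helper: print the #SBATCH directives that sbatch would read
# from a batch script, following the rule that processing stops at the
# first line that is neither blank nor a comment.
list_directives() {
    awk '
        /^#SBATCH/   { print; next }   # a directive sbatch will read
        /^[ \t]*#/   { next }          # other comments (incl. #!) are skipped
        /^[ \t]*$/   { next }          # blank lines are skipped
                     { exit }          # first command: stop reading
    ' "$1"
}

# Usage:
#   list_directives MyJob.batch
```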

Batch mode

To submit the job described in the script MyJob.batch, run the following command:

sbatch MyJob.batch

The job is then placed in the SLURM queue and runs without any possibility of interacting with the program. Additional options can be set in the script or passed as sbatch command-line parameters.
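Options given on the sbatch command line take precedence over the #SBATCH lines in the script. Any arguments after the script name are passed to the script as $1, $2, and so on. The sketch below uses both features to submit the same script once per input file, each with its own job name. The file names input*.dat and the helpers jobname_for and submit_all are hypothetical. MyJob.batch would need to read its input file from "$1", for example with /path/to/my/command "$1".

```shell
#!/bin/bash
# Sketch: submit MyJob.batch once per input file (names are hypothetical).

# Build a job name such as "run_input1" from a path like "data/input1.dat".
jobname_for() {
    echo "run_$(basename "$1" .dat)"
}

# -J and -t given here override the -J and -t lines inside MyJob.batch;
# the input file after the script name becomes $1 inside the script.
submit_all() {
    local f
    for f in "$@"; do
        sbatch -J "$(jobname_for "$f")" -t 01:00:00 MyJob.batch "$f"
    done
}

# Usage on the login node:
#   submit_all input*.dat
```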

Interactive mode

To run a script or a program interactively, start a shell on a compute node with:

srun -p <ifj/ifj-allnodes> --pty bash -l
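An interactive session accepts the same resource options as a batch script. Below is a sketch of a hypothetical wrapper that requests one task on one node for a limited time. The partition and time limit default to ifj and one hour, which are illustrative choices. Leaving the shell with exit ends the job and frees the node.

```shell
#!/bin/bash
# Hypothetical wrapper: open a login shell on a compute node.
# $1 = partition (default ifj), $2 = time limit (default 1 hour).
interactive_shell() {
    local partition=${1:-ifj}
    local limit=${2:-01:00:00}
    srun -p "$partition" -N 1 --ntasks-per-node 1 -t "$limit" --pty bash -l
}

# Usage:
#   interactive_shell                          # ifj, 1 hour
#   interactive_shell ifj-allnodes 04:00:00
```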

View job

After a job is submitted with sbatch, the queueing system assigns it a number (JobID). The command:

scontrol show job <JobID>

displays the state of the job with the given JobID.

If you run this command while the job is running, you can see, among other things, which nodes and processors it runs on. During the calculation you can connect to the node with ssh. However, the ssh session is assigned to the same processor as the calculation, so frequent connections may reduce the efficiency of the calculation.
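For scripting, squeue can print just the fields you need: -h drops the header, -j selects the job, and the format codes %T and %N give the job state and its node list. Multi-node jobs report a compressed list such as apple[15-16], which scontrol show hostnames expands. The helper names below are hypothetical. Keep the ssh sessions short, for the reason given above.

```shell
#!/bin/bash
# Hypothetical helpers built on squeue's output format codes.
job_state() {   # e.g. PENDING, RUNNING
    squeue -h -j "$1" -o %T
}

job_nodes() {   # compressed node list, e.g. apple15 or apple[15-16]
    squeue -h -j "$1" -o %N
}

# First host of the job's allocation, expanded from the compressed list.
first_node() {
    scontrol show hostnames "$(job_nodes "$1")" | head -n 1
}

# Usage:
#   job_state 12345
#   ssh "$(first_node 12345)"
```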

Cancel jobs

The scancel command removes a job from the queue (e.g. when you find an error in a submitted script). Syntax:

scancel <JobID>
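scancel can also select jobs by owner, state or name instead of by JobID. The helper functions below wrap a few common forms and are hypothetical. In each one, -u restricts the selection to your own jobs, -t to jobs in a given state, and -n to jobs with a given name.

```shell
#!/bin/bash
# Hypothetical helpers around scancel's selection options.

# Cancel all of your jobs, running or pending.
cancel_all_mine() {
    scancel -u "$USER"
}

# Cancel only your jobs that are still waiting in the queue.
cancel_pending_mine() {
    scancel -u "$USER" -t PENDING
}

# Cancel your jobs submitted with a given name (e.g. -J myjobname).
cancel_by_name() {
    scancel -u "$USER" -n "$1"
}
```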

View queues, tasks, etc.

  • view partition (queue) ifj:
    scontrol show partition ifj
  • view detailed information about the job:
    scontrol show job <JobID> 
  • view detailed information about the node apple15:
    scontrol show node apple15
  • view general information about a partition (e.g. time limit):
  • graphical utility to manage SLURM jobs:
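Two general SLURM facts relate to the last two items, though neither is a command this cluster's page documents. First, sinfo prints a partition's time limit in its TIMELIMIT column (format code %l). Second, SLURM ships a graphical client called sview, available where it has been installed. Below is a sketch of a hypothetical helper that prints only the time limit of one partition. The sort -u step removes duplicates in case sinfo prints more than one line for the partition.

```shell
#!/bin/bash
# Hypothetical helper: print the time limit of a partition, in the form
# days-hours:minutes:seconds (e.g. 3-00:00:00) or "infinite".
partition_timelimit() {
    sinfo -h -p "${1:-ifj}" -o %l | sort -u
}

# Usage:
#   sinfo -p ifj              # full overview, including TIMELIMIT
#   partition_timelimit ifj
```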

Sources and more info