Slurm Batch

A Slurm Batch is a script file that describes a request for the allocation of resources to run a process (Job). It consists of two parts:

  • Job parameters, given as #SBATCH options written in the form of shell comments (Bash, ...). These options specify the requested resources (CPUs, RAM, time...), the name of the Job, the location of the output file, the email address for notifications...

  • The processing itself, written as a shell script.
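As an illustration, a minimal Batch with both parts might look like the sketch below. The Job name, resource values, output file name and e-mail address are placeholders chosen for this example, not cluster defaults. The processing part falls back to default values so that the script can also be dry-run with plain bash outside Slurm.

```shell
#!/bin/bash
# --- Part 1: Job parameters, as #SBATCH options (shell comments) ---
# Hypothetical Job name
#SBATCH --job-name=hello-slurm
# One Task with 2 CPUs
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
# RAM for the Job and wall-clock limit (HH:MM:SS)
#SBATCH --mem=4G
#SBATCH --time=00:10:00
# Output file; %j is replaced by the Job ID
#SBATCH --output=hello-%j.out
# E-mail notifications (placeholder address)
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=user@example.org

# --- Part 2: processing (ordinary shell script) ---
set -euo pipefail

# Slurm exports these variables inside the Job; the defaults only apply
# when the script is run by hand outside Slurm.
JOB_ID="${SLURM_JOB_ID:-local}"
CPUS="${SLURM_CPUS_PER_TASK:-2}"

echo "Job ${JOB_ID} running on $(hostname) with ${CPUS} CPU(s) per Task"
```

Such a file is submitted with `sbatch hello.sh`. Slurm reads the #SBATCH lines at the top of the file, before the first executable command, and bash ignores them as comments.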

A Slurm Job can be structured to be instantiated in large numbers from a single Batch (Job Arrays), split into stages (Job Steps), and run one or several Tasks.
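A Job Array can be sketched as follows, assuming ten independent inputs named input_0.txt to input_9.txt (a hypothetical naming scheme). The --array=0-9 option creates ten instances of the same Batch, and each instance receives its own index in SLURM_ARRAY_TASK_ID.

```shell
#!/bin/bash
#SBATCH --job-name=array-demo
# Ten instances of this Batch, indexed 0 to 9
#SBATCH --array=0-9
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:05:00
# %A is the array's master Job ID, %a the instance index
#SBATCH --output=array-%A_%a.out

set -euo pipefail

# Index of this instance (0 when run by hand outside Slurm)
INDEX="${SLURM_ARRAY_TASK_ID:-0}"
# Hypothetical naming scheme: each instance handles its own input file
INPUT="input_${INDEX}.txt"

echo "Array instance ${INDEX} would process ${INPUT}"
```

Deriving the input from the index keeps a single Batch for the whole campaign. Each instance is scheduled as a separate Job with its own allocation.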

Jobs

A Job is an allocation of resources (CPUs, RAM, time...) reserved for the execution of a specific process.

  • The allocation is defined in the Batch as the number of Tasks (--ntasks) multiplied by the number of CPUs per Task (--cpus-per-task), and corresponds to the maximum resources usable in parallel.

  • The script creates one or more Job Steps and manages the distribution of Tasks across the compute nodes.
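For example, a Batch requesting 4 Tasks of 2 CPUs each reserves 4 × 2 = 8 CPUs usable in parallel. The sketch below recomputes this from the variables Slurm exports in the Job's environment (SLURM_NTASKS, SLURM_CPUS_PER_TASK). The defaults match the #SBATCH values so that a dry run outside Slurm gives the same result.

```shell
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=2

set -euo pipefail

# Values set by Slurm from the options above (defaults for a local dry run)
NTASKS="${SLURM_NTASKS:-4}"
CPUS_PER_TASK="${SLURM_CPUS_PER_TASK:-2}"

# Maximum number of CPUs the Job can use in parallel
TOTAL_CPUS=$(( NTASKS * CPUS_PER_TASK ))
echo "Allocation: ${NTASKS} Tasks x ${CPUS_PER_TASK} CPUs = ${TOTAL_CPUS} CPUs"
```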

Job Steps

A Job Step represents a stage or section of the processing performed by the Job. It runs one or more Tasks via the "srun" command. Dividing a Job into Steps offers great flexibility in organizing its stages and in managing and analyzing the allocated resources:

  • the Steps can be executed sequentially or in parallel,

  • a Step can initiate one or more Tasks executed in parallel,

  • Steps are reported individually by the sstat and sacct commands. This allows both Step-by-Step monitoring of the Job's progress during execution and detailed resource usage statistics for each Step (during and after execution).
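The sketch below chains sequential and parallel Steps. Inside a Job, every srun call creates a new Step, and Steps started in the background with "&" run concurrently until "wait". The run_step helper is a hypothetical convenience that calls srun inside Slurm and runs the command directly otherwise, for a local dry run. The --exact option, which keeps each Step to the CPUs it requests so that two Steps can share the allocation, assumes Slurm 20.11 or later. Older versions used --exclusive for this purpose.

```shell
#!/bin/bash
#SBATCH --job-name=steps-demo
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00

set -euo pipefail

# Hypothetical helper: inside a Job, each call creates a new Step through
# srun; outside Slurm, the command simply runs locally.
run_step() {
  if [ -n "${SLURM_JOB_ID:-}" ]; then
    srun --ntasks=1 --cpus-per-task=1 --exact "$@"
  else
    "$@"
  fi
}

# First Step: runs alone (sequential)
run_step echo "preprocessing"

# Two Steps launched in the background: they run in parallel
run_step echo "analysis A" &
run_step echo "analysis B" &
# Wait for both parallel Steps to finish
wait

# Last Step: starts only after the parallel Steps (sequential again)
run_step echo "postprocessing"
```

While the Job runs, `sstat --allsteps -j <jobid>` reports the usage of each running Step. During and after execution, `sacct -j <jobid> --format=JobID,JobName,State,Elapsed,TotalCPU,MaxRSS` lists the Steps separately (<jobid>.0, <jobid>.1, ...).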

Tasks

A Task is a process to which the resources defined in the Batch by the "--cpus-per-task" option are allocated. A Task can use these resources like any other process (creating threads, or sub-processes that may themselves be multi-threaded).

The Task is the Job's unit of resource allocation. CPUs that a Task does not use are "lost": no other Task or Step can use them. If the Task creates more processes or threads than it has allocated CPUs, they all share those same CPUs.
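A common way to avoid both idle and over-shared CPUs is to size a Task's threads to its allocation, as sketched below. OMP_NUM_THREADS is the standard OpenMP setting. Whether a given program honours it, or needs another setting, depends on its runtime. The background loop only illustrates one worker per allocated CPU, and my_threaded_program is a hypothetical name.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

set -euo pipefail

# Use exactly the CPUs allocated to the Task: fewer threads would leave
# CPUs idle ("lost"), more threads would share the same CPUs.
THREADS="${SLURM_CPUS_PER_TASK:-4}"
# Honoured by OpenMP programs; other runtimes have their own settings
export OMP_NUM_THREADS="$THREADS"

# Illustration: start one background worker per allocated CPU
for (( i = 1; i <= THREADS; i++ )); do
  echo "worker ${i} of ${THREADS}" &
done
wait

# A real multi-threaded program (hypothetical name) would be launched as a Step:
# srun ./my_threaded_program
```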

Partition

A Partition is a logical grouping of compute nodes. The ANITI computing cluster is split into two distinct partitions: “CPU-Nodes” and “GPU-Nodes”, each made up of nodes with different computing capabilities. This separation makes it possible to specialize and optimize each partition for a particular type of Job.
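A partition is chosen in the Batch with the --partition option. The sketch below targets the GPU partition. It assumes that the Slurm partition names are exactly the labels above, which `sinfo -s` can confirm on the cluster. It also assumes that GPUs are requested as a generic resource named "gpu", which depends on the cluster configuration.

```shell
#!/bin/bash
# Assumption: partition name as listed above (check with `sinfo -s`)
#SBATCH --partition=GPU-Nodes
# Assumption: GPUs are configured as the generic resource "gpu"
#SBATCH --gres=gpu:1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=01:00:00

set -euo pipefail

# Partition actually assigned by Slurm (default for a local dry run)
PARTITION="${SLURM_JOB_PARTITION:-GPU-Nodes}"
# Slurm typically sets CUDA_VISIBLE_DEVICES to the GPU(s) granted to the Job
GPUS="${CUDA_VISIBLE_DEVICES:-none}"

echo "Running on partition ${PARTITION}, visible GPU(s): ${GPUS}"
```

A CPU-only Job would use --partition=CPU-Nodes and omit the --gres line.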