Appendices

Viewing queued jobs

The "squeue" command allows you to display the Jobs in queue waiting.

Official Documentation: https://slurm.schedmd.com/squeue.html

View all jobs on the waiting list:

$ squeue

"squeue" displays the pending Jobs of all users by default.

Show own Jobs only:

$ squeue -u <user>

or (generic)

$ squeue -u $(whoami)

The $(whoami) command substitution returns your username, which serves as the filter for the squeue command.

Displaying information about a specific queued Job:

$ squeue -j1234

Customizing displayed fields:

$ squeue -o "%A %j %a %P %C %D %n %R %V"

The "--format" option ("-o" in short) allows you to select the fields to display. Refer to the documentation of command for a full list of available fields. This example will display the following fields:

JOBID NAME ACCOUNT PARTITION CPUS NODES REQ_NODES NODELIST(REASON) SUBMIT_TIME
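
To avoid retyping the format string, "squeue" also honors the SQUEUE_FORMAT environment variable (see the official documentation above). A possible sketch using the same fields as in the previous example:

$ export SQUEUE_FORMAT="%A %j %a %P %C %D %n %R %V"
$ squeue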

Show queue for a specific partition:

$ squeue -p CPU-Nodes

Tracking the progress of Jobs, Job Steps and resource usage

The command "sacct" allows to obtain a large number information about Jobs and their Steps.

Official documentation: https://slurm.schedmd.com/sacct.html

The following command displays the default information for Job #1234:

$ sacct -j1234

JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1234         slurm-job+ CPU-Nodes        test          6    RUNNING      0:0
1234.0       slurm-tas+                  test          6  COMPLETED      0:0
1234.1       slurm-tas+                  test          6  COMPLETED      0:0
1234.2       slurm-tas+                  test          6    RUNNING      0:0

The first line corresponds to the whole Job and the following lines (JobID followed by a period '.') correspond to the different Steps of the Job. Step "1234.2" (the 3rd Step of Job 1234) is still running here (State: RUNNING).

The option "--format" (or "-o", short version) allows to choose the fields to display and their screen size (using '%'). Of many attributes are available; consult the documentation for the complete list and their meaning.

$ sacct -j1234 -o JobID,JobName%-20,State,ReqCPUS,Elapsed,Start,ExitCode

       JobID              JobName      State  ReqCPUS    Elapsed               Start ExitCode
------------ -------------------- ---------- -------- ---------- ------------------- --------
1234         slurm-job-test-%j       RUNNING        6   00:05:49 2017-02-15T14:55:43      0:0
1234.0       slurm-task.sh         COMPLETED        6   00:01:34 2017-02-15T14:55:43      0:0
1234.1       slurm-task.sh         COMPLETED        6   00:01:31 2017-02-15T14:57:17      0:0
1234.2       slurm-task.sh           RUNNING        6   00:01:02 2017-02-15T14:58:48      0:0

The displayed fields are chosen explicitly here, and the JobName field now shows the full name of the Job and its Steps.

Display resource usage statistics (CPU/RAM/Disk...):

$ sacct -j1234 -o JobID,JobName%-20,State,ReqCPUS,Elapsed,UserCPU,CPUTime,MaxRSS,Start

       JobID              JobName      State  ReqCPUS    Elapsed    UserCPU    CPUTime     MaxRSS               Start
------------ -------------------- ---------- -------- ---------- ---------- ---------- ---------- -------------------
1234         slurm-job-test-%j       RUNNING        6   00:06:16  35:55.461   00:37:36            15/02/2017 14:55:43
1234.0       slurm-task.sh         COMPLETED        6   00:01:34  09:02.638   00:09:24     36304K 15/02/2017 14:55:43
1234.1       slurm-task.sh         COMPLETED        6   00:01:31  08:55.011   00:09:06     33128K 15/02/2017 14:57:17
1234.2       slurm-task.sh           RUNNING        6   00:01:02  06:02.144   00:06:18     35128K 15/02/2017 14:58:48

Note: Some attributes are only available once the Step is done.

Note 2: It is possible to change the date display format by modifying the SLURM_TIME_FORMAT environment variable. The date format used by Slurm is that of the C function "strftime" (http://man7.org/linux/man-pages/man3/strftime.3.html).

The example above uses the French date format (DD/MM/YYYY hh:mm:ss). The easiest way to define the date format is to add the following line to the ".bashrc" file in your HOME directory:

export SLURM_TIME_FORMAT='%d/%m/%Y %T' # Sets the date/time display format of the Slurm commands: DD/MM/YYYY hh:mm:ss

Display for a Job executing Steps in parallel:

$ sacct -j1234 -o JobID,JobName%-20,State,ReqCPUS,Elapsed,Start,ExitCode

       JobID              JobName      State  ReqCPUS    Elapsed               Start ExitCode
------------ -------------------- ---------- -------- ---------- ------------------- --------
1234         slurm-job-test-%j       RUNNING       18   00:00:43 15/02/2017 15:03:38      0:0
1234.0       slurm-task.sh           RUNNING        6   00:00:43 15/02/2017 15:03:38      0:0
1234.1       slurm-task.sh           RUNNING        6   00:00:43 15/02/2017 15:03:38      0:0
1234.2       slurm-task.sh           RUNNING        6   00:00:43 15/02/2017 15:03:38      0:0

Note: The allocation here is 18 CPUs instead of the previous 6 (3 Tasks of 6 CPUs are executed in parallel).
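
As an illustration (a minimal sketch, not taken from this document), such parallel Steps can be launched from the batch script by putting each "srun" in the background and waiting for all of them. Depending on the Slurm version, an extra option such as "--exact" or "--exclusive" may be needed so that the Steps do not share the same CPUs:

#!/bin/bash
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=6

srun --ntasks=1 --cpus-per-task=6 ./slurm-task.sh &   # Step 0
srun --ntasks=1 --cpus-per-task=6 ./slurm-task.sh &   # Step 1
srun --ntasks=1 --cpus-per-task=6 ./slurm-task.sh &   # Step 2
wait                                                  # wait for all Steps to finish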

For those who would like to retrieve this information and process it in a script (and/or format it in another language), sacct provides the "--parsable" and "--parsable2" options, which return the same information but with fields separated by a pipe ("|"). In parsable mode, field widths ("%..") are useless: non-truncated values are always returned. The difference between the two options is that "--parsable" adds a "|" at the end of each line, while "--parsable2" does not.

$ sacct -j1234 -o JobID,JobName,State,ReqCPUS,Elapsed,Start,ExitCode --parsable2

JobID|JobName|State|ReqCPUS|Elapsed|Start|ExitCode
1234|slurm-job-test-%j|RUNNING|6|00:05:49|15/02/2017 14:55:43|0:0
1234.0|slurm-task.sh|COMPLETED|6|00:01:34|15/02/2017 14:55:43|0:0
1234.1|slurm-task.sh|COMPLETED|6|00:01:31|15/02/2017 14:57:17|0:0
1234.2|slurm-task.sh|RUNNING|6|00:01:02|15/02/2017 14:58:48|0:0

Note: The "--noheader" option also allows you to not show headers in output.

Common SBATCH options

The "sbatch" command receives its parameters on the command line but also allows them to be set via SBATCH "directives" under form of comment in the header of the file. Both methods produce the same result but those declared on the command line will have the priority in case of conflict. In both cases, these options exist (mostly) in short and long version (example: -n or --ntasks).

For more information on "sbatch", see the official documentation at the address :

https://slurm.schedmd.com/sbatch.html
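
As an illustration of the two methods (a minimal sketch; the script name, program name and values are only examples):

$ cat slurm-job-example.sh
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --partition=CPU-Nodes
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

srun ./my-program

$ sbatch slurm-job-example.sh                     # uses the #SBATCH directives of the script
$ sbatch --cpus-per-task=8 slurm-job-example.sh   # the command line value (8) takes priority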

#SBATCH --partition=<part>

Choice of the Slurm partition to use for the Job. See section: Partitions.

#SBATCH --job-name=<name>

Defines the name of the Job as it will be displayed by the various Slurm commands (squeue, sstat, sacct).

#SBATCH --output=<stdOutFile>

#SBATCH --error=<stdErrFile>

#SBATCH --input=<stdInFile>

#SBATCH --open-mode=append|truncate

These options define the input/output redirections of the job (standard input/output/error).

The standard output (stdOut) will be redirected to the file defined by "--output" or, if not defined, a default file "slurm-%j.out" (Slurm will replace "%j" with the JobID).

The error output (stdErr) will be redirected to the file defined by "--error" or, if undefined, to standard output.

Standard input can also be redirected with "--input". By default, "/dev/null" is used (none/empty).

The "--open-mode" option defines the mode for opening (writing) files and behaves like an open/fopen of most languages programming (2 possibilities: "append" to write after of the file (if it exists) and "truncate" to overwrite the file at each execution of the batch (default value)).

#SBATCH --mail-user=<email>

#SBATCH --mail-type=BEGIN|END|FAIL|TIME_LIMIT|TIME_LIMIT_50|...

Allows you to be notified by e-mail of particular events in the life of the Job: start of execution (BEGIN), end of execution (END, FAIL and TIME_LIMIT)... See the Slurm documentation for the full list of supported events.
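
For example (the address is a placeholder):

#SBATCH --mail-user=jdoe@example.org
#SBATCH --mail-type=END,FAIL    # e-mail when the Job ends or fails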

#SBATCH --cpus-per-task=<n>

Defines the number of CPUs to allocate per Task. The actual use of these CPUs is the responsibility of each Task (creation of processes and/or threads). Note that on the 'CPU-Nodes' partition, since multithreading is enabled on the servers, a CPU corresponds to a hardware thread.
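
A possible sketch for a multithreaded (e.g. OpenMP) Task; the program name is hypothetical, and SLURM_CPUS_PER_TASK is set by Slurm when "--cpus-per-task" is used:

#SBATCH --cpus-per-task=6

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}    # run as many threads as allocated CPUs
srun --cpus-per-task=6 ./threaded-program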

#SBATCH --ntasks=<n>

Defines the maximum number of Tasks executed in parallel.

#SBATCH --mem-per-cpu=<n>

Defines the RAM in MB allocated to each CPU. The default value and the maximum value depend on the partition used:

  • CPU-Nodes: 5120 MB by default, this is also the maximum value
  • GPU-Nodes: 10240 MB by default, this is also the maximum value
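
For example (illustrative values), a Task using 6 CPUs with the following lines could use up to 6 x 4096 = 24576 MB of RAM; without "--mem-per-cpu", the partition default (5120 MB per CPU on CPU-Nodes) applies:

#SBATCH --cpus-per-task=6
#SBATCH --mem-per-cpu=4096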

#SBATCH --nodes=<minnodes>[-<maxnodes>]

Minimum[-maximum] number of nodes on which to distribute the Tasks.

#SBATCH --ntasks-per-node=<n>

Used in conjunction with --nodes, this option is an alternative to --ntasks that allows you to control the distribution of Tasks across the different nodes.
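
For example (illustrative values), the following combination requests 2 x 3 = 6 Tasks, 3 on each of the 2 nodes:

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=3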

Default values and/or values inferred by Slurm:

Without an explicit declaration of Job Step(s), only one Task will be created and the "--ntasks", "--nodes", "--nodelist" ... parameters will be ignored.

In general, when "--nodes" is not defined, Slurm automatically determines the number of nodes required (depending on the use of nodes, the number of CPUs-per-Node/Tasks-per-Node/CPUs-per-Task/Tasks, etc.).

If "--ntasks" is not defined, one Task per node will be allocated.

Note that the number of Tasks of a Job can be defined either explicitly with "--ntasks" or implicitly by setting "--nodes" and "--ntasks-per-node".

If the "--ntasks", "--nodes" and "--ntasks-per-node" options are all defined, then "--ntasks-per-node" will show the number maximum Tasks per node.