Appendices
Viewing queued jobs
The "squeue" command allows you to display the Jobs in queue waiting.
Official Documentation: https://slurm.schedmd.com/squeue.html
View all jobs on the waiting list:
$ squeue
"squeue" displays the pending Jobs of all users by default.
Show own Jobs only:
$ squeue -u <user>
or, generically:
$ squeue -u $(whoami)
The $(whoami) command returns your username, which serves as the filter for the squeue command.
Display information about a specific queued Job:
$ squeue -j1234
Customize the displayed fields:
$ squeue -o "%A %j %a %P %C %D %n %R %V"
The "--format" option ("-o" in short) allows you to select the fields to display. Refer to the documentation of command for a full list of available fields. This example will display the following fields:
JOBID NAME ACCOUNT PARTITION CPUS NODES REQ_NODES NODELIST(REASON) SUBMIT_TIME
Show the queue for a specific partition:
$ squeue -p CPU-Nodes
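These options can be combined for scripting. For example, a minimal sketch (using only standard squeue options) that counts your own pending Jobs:
$ squeue -u $(whoami) -t PENDING -h | wc -l
The "-t" ("--states") option filters Jobs by state and "-h" ("--noheader") removes the header line, so only Job lines are counted.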
Tracking the progress of Jobs, Job Steps and resource usage
The command "sacct" allows to obtain a large number information about Jobs and their Steps.
Official documentation: https://slurm.schedmd.com/sacct.html
The following command displays the default information for Job #1234:
$ sacct -j1234
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1234 slurm-job+ CPU-Nodes test 6 RUNNING 0:0
1234.0 slurm-tas+ test 6 COMPLETED 0:0
1234.1 slurm-tas+ test 6 COMPLETED 0:0
1234.2 slurm-tas+ test 6 RUNNING 0:0
The first line corresponds to the whole Job, and the following lines (JobID followed by a period '.') correspond to the individual Steps of the Job. Step "1234.2" (the 3rd Step of Job 1234) is still running here (State: RUNNING).
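For context: inside a batch script, each call to "srun" creates a new Step, numbered in order of creation. A minimal sketch (the script name "slurm-task.sh" is taken from the output above):
srun ./slurm-task.sh # creates Step 1234.0
srun ./slurm-task.sh # creates Step 1234.1
srun ./slurm-task.sh # creates Step 1234.2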
The option "--format" (or "-o", short version) allows to choose the fields to display and their screen size (using '%'). Of many attributes are available; consult the documentation for the complete list and their meaning.
$ sacct -j1234 -o JobID,JobName%-20,State,ReqCPUS,Elapsed,Start,ExitCode
JobID JobName State ReqCPUS Elapsed Start ExitCode
------------ -------------------- ---------- -------- ---------- ------------------- --------
1234 slurm-job-test-%j RUNNING 6 00:05:49 2017-02-15T14:55:43 0:0
1234.0 slurm-task.sh COMPLETED 6 00:01:34 2017-02-15T14:55:43 0:0
1234.1 slurm-task.sh COMPLETED 6 00:01:31 2017-02-15T14:57:17 0:0
1234.2 slurm-task.sh RUNNING 6 00:01:02 2017-02-15T14:58:48 0:0
Only the selected fields are displayed here, and the JobName field now shows the full name of the Job and its Steps.
Display resource usage statistics (CPU/RAM/Disk...):
$ sacct -j1234 -o JobID,JobName%-20,State,ReqCPUS,Elapsed,UserCPU,CPUTime,MaxRSS,Start
JobID JobName State ReqCPUS Elapsed UserCPU CPUTime MaxRSS Start
------------ -------------------- ---------- -------- ---------- ---------- ---------- ---------- -------------------
1234 slurm-job-test-%j RUNNING 6 00:06:16 35:55.461 00:37:36 15/02/2017 14:55:43
1234.0 slurm-task.sh COMPLETED 6 00:01:34 09:02.638 00:09:24 36304K 15/02/2017 14:55:43
1234.1 slurm-task.sh COMPLETED 6 00:01:31 08:55.011 00:09:06 33128K 15/02/2017 14:57:17
1234.2 slurm-task.sh RUNNING 6 00:01:02 06:02.144 00:06:18 35128K 15/02/2017 14:58:48
Note: Some attributes are only available once the Step has finished.
Note 2: You can change the date display format by setting the SLURM_TIME_FORMAT environment variable. The date format used by Slurm is that of the C function "strftime" (http://man7.org/linux/man-pages/man3/strftime.3.html).
The example above uses the French date format (DD/MM/YYYY hh:mm:ss). The easiest way to define the date format is to add the following line to the ".bashrc" file in your home directory:
export SLURM_TIME_FORMAT='%d/%m/%Y %T' # Sets the date/time display format of Slurm commands: DD/MM/YYYY hh:mm:ss
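Any "strftime" format works. For example, for ISO-style dates instead:
export SLURM_TIME_FORMAT='%Y-%m-%d %H:%M:%S' # YYYY-MM-DD hh:mm:ss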
Display for a Job executing Steps in parallel:
$ sacct -j1234 -o JobID,JobName%-20,State,ReqCPUS,Elapsed,Start,ExitCode
JobID JobName State ReqCPUS Elapsed Start ExitCode
------------ -------------------- ---------- -------- ---------- ------------------- --------
1234 slurm-job-test-%j RUNNING 18 00:00:43 15/02/2017 15:03:38 0:0
1234.0 slurm-task.sh RUNNING 6 00:00:43 15/02/2017 15:03:38 0:0
1234.1 slurm-task.sh RUNNING 6 00:00:43 15/02/2017 15:03:38 0:0
1234.2 slurm-task.sh RUNNING 6 00:00:43 15/02/2017 15:03:38 0:0
Note: The allocation here is 18 CPUs instead of the previous 6 (3 Tasks of 6 CPUs are executed in parallel).
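For reference, a batch script along the following lines could produce this kind of allocation. This is only a sketch (the original script is not reproduced here): "&" launches each Step in the background so the 3 Steps run in parallel, and "wait" prevents the Job from ending before all Steps have finished.
#!/bin/bash
#SBATCH --job-name=slurm-job-test-%j
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=6
srun --ntasks=1 ./slurm-task.sh & # Step .0
srun --ntasks=1 ./slurm-task.sh & # Step .1
srun --ntasks=1 ./slurm-task.sh & # Step .2
wait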
For those who would like to retrieve this information and process it in a script (and/or format it in another language), sacct provides the "--parsable" and "--parsable2" options, which return the same information but with fields separated by a pipe character ("|"). In parsable mode, field widths ("%..") are ignored and non-truncated values are always returned. The difference between the two options is that "--parsable" adds a trailing "|" at the end of each line while "--parsable2" does not.
$ sacct -j1234 -o JobID,JobName,State,ReqCPUS,Elapsed,Start,ExitCode --parsable2
JobID|JobName|State|ReqCPUS|Elapsed|Start|ExitCode
1234|slurm-job-test-%j|RUNNING|6|00:05:49|15/02/2017 14:55:43|0:0
1234.0|slurm-task.sh|COMPLETED|6|00:01:34|15/02/2017 14:55:43|0:0
1234.1|slurm-task.sh|COMPLETED|6|00:01:31|15/02/2017 14:57:17|0:0
1234.2|slurm-task.sh|RUNNING|6|00:01:02|15/02/2017 14:58:48|0:0
Note: The "--noheader" option also allows you to not show headers in output.
Common SBATCH options
The "sbatch" command receives its parameters on the command line but also allows them to be set via SBATCH "directives" under form of comment in the header of the file. Both methods produce the same result but those declared on the command line will have the priority in case of conflict. In both cases, these options exist (mostly) in short and long version (example: -n or --ntasks).
For more information on "sbatch", see the official documentation at the address :
https://slurm.schedmd.com/sbatch.html
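As an illustration, a minimal sketch of a batch script using such directives (the names "my-job.sh" and "my-program" are hypothetical; CPU-Nodes is the partition used in the examples above):
#!/bin/bash
#SBATCH --job-name=my-test-job
#SBATCH --partition=CPU-Nodes
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
srun ./my-program
Submitted with or without command-line overrides:
$ sbatch my-job.sh # uses the directives from the script header
$ sbatch --job-name=other-name my-job.sh # the command-line option takes priority over the --job-name directive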
#SBATCH --partition=<part>
Selects the Slurm partition to use for the Job. See the Partitions section.
#SBATCH --job-name=<name>
Defines the name of the Job as it will be displayed by the various Slurm commands (squeue, sstat, sacct).
#SBATCH --output=<stdOutFile>
#SBATCH --error=<stdErrFile>
#SBATCH --input=<stdInFile>
#SBATCH --open-mode=append|truncate
These options define the input/output redirections of the job (standard input/output/error).
The standard output (stdOut) will be redirected to the file defined by "--output" or, if not defined, to a default file "slurm-%j.out" (Slurm will replace "%j" with the JobID).
The error output (stdErr) will be redirected to the file defined by "--error" or, if undefined, to standard output.
Standard input can also be redirected with "--input". By default, "/dev/null" is used (none/empty).
The "--open-mode" option defines the mode for opening (writing) files and behaves like an open/fopen of most languages programming (2 possibilities: "append" to write after of the file (if it exists) and "truncate" to overwrite the file at each execution of the batch (default value)).
#SBATCH --mail-user=<email>
#SBATCH --mail-type=BEGIN|END|FAIL|TIME_LIMIT|TIME_LIMIT_50|...
Allows you to be notified by e-mail of particular events in the life of the Job: start of execution (BEGIN), end of execution (END, FAIL and TIME_LIMIT), etc. See the Slurm documentation for the full list of supported events.
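For example, to be notified only at the end of the Job, whether it succeeds or fails (the address is illustrative; "--mail-type" accepts a comma-separated list of events):
#SBATCH --mail-user=jane.doe@example.com
#SBATCH --mail-type=END,FAIL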
#SBATCH --cpus-per-task=<n>
Defines the number of CPUs to allocate per Task. Actual use of these CPUs is the responsibility of each Task (creation of processes and/or threads). Note that on the 'CPU-Nodes' partition, multithreading is enabled on the servers, so a CPU corresponds to a hardware thread.
#SBATCH --ntasks=<n>
Defines the maximum number of Tasks executed in parallel.
#SBATCH --mem-per-cpu=<n>
Defines the RAM in MB allocated to each CPU. The default and maximum values depend on the partition used (see the example after this list):
- CPU-Nodes: 5120 MB by default, this is also the maximum value
- GPU-Nodes: 10240 MB by default, this is also the maximum value
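For example, to request 4096 MB per CPU on the CPU-Nodes partition (any value up to the 5120 MB maximum above is valid):
#SBATCH --partition=CPU-Nodes
#SBATCH --mem-per-cpu=4096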
#SBATCH --nodes=<minnodes>[-<maxnodes>]
Minimum[-maximum] number of nodes on which to distribute the Tasks.
#SBATCH --ntasks-per-node=<n>
Used in conjunction with --nodes, this option is an alternative to --ntasks that lets you control the distribution of Tasks across the different nodes.
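For example, the following sketch spreads 6 Tasks evenly over 2 nodes without using --ntasks:
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=3
# 2 nodes x 3 Tasks per node = 6 Tasks in total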
Default values and values inferred by Slurm:
Without an explicit declaration of Job Step(s), only one Task will be created and the parameters "--ntasks", "--nodes", "--nodelist", etc. will be ignored.
In general, when "--nodes" is not defined, Slurm automatically determines the number of nodes required (depending on current node usage and on the numbers of CPUs per node, Tasks per node, CPUs per Task, Tasks, etc.).
If "--ntasks" is not defined, one Task per node will be allocated.
Note that the number of Tasks of a Job can be defined either explicitly with "--ntasks" or implicitly by setting "--nodes" and "--ntasks-per-node".
If the "--ntasks", "--nodes" and "--ntasks-per-node" options are all defined, then "--ntasks-per-node" will show the number maximum Tasks per node.