Batch Example: Job Arrays
To process a large number of files, it is not necessary to write one batch per file (or group of files); Slurm handles this through Job Arrays. From a single Batch file, a Job Array generates a large number of similar jobs and lets you set how many of them run in parallel (for example, processing 10,000 files with at most 50 running simultaneously).
A Job Array is created by simply adding the SBATCH option "--array" (or "-a"). The option accepts a list of indices (or an interval with, optionally, a step) and, optionally, the maximum number of Jobs to run in parallel. Without this maximum, Slurm runs as many jobs as possible by default (depending on the available resources). To limit the number of parallel jobs, remember to specify a maximum!
The indices used in the array can be chosen freely (arbitrary values, not necessarily contiguous). What matters is choosing indices that identify each Job and therefore select the resource(s) it uses (file, database ID, ...).
Slurm makes this index available in the Batch through the environment variable "SLURM_ARRAY_TASK_ID".
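When the files to process do not carry the index in their name, one possible approach is to let the index select a line in a prepared list of paths. The sketch below illustrates this idea only: the helper "pick_input", the demo file names and the fallback value for SLURM_ARRAY_TASK_ID are assumptions added for illustration, not part of Slurm.

```shell
#!/bin/bash
# Sketch: map the array index to one line of a file list.

# pick_input is a hypothetical helper: print line number $2 (1-based) of file $1.
pick_input() {
    sed -n "${2}p" "$1"
}

# Slurm sets SLURM_ARRAY_TASK_ID inside each array task; the fallback to 1
# only exists so that the sketch can be tried outside of Slurm.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# Demo list built on the fly (made-up names); a real batch would read a
# list prepared beforehand, with one path per line.
list_file="$(mktemp)"
printf '%s\n' sample-a.dat sample-b.dat sample-c.dat > "$list_file"

input_file="$(pick_input "$list_file" "$task_id")"
echo "Task ${task_id} will process: ${input_file}"

rm -f "$list_file"
```

With a list of N lines, the matching option would be "--array=1-N", optionally followed by "%<maxParallel>".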
Usage:
#SBATCH --array=<start>-<end>[:<step>][%<maxParallel>]
Or:
#SBATCH --array=<list>[%<maxParallel>]
Usage examples:
#SBATCH --array=0-15 # 16 jobs (indices 0 to 15 inclusive).
#SBATCH --array=10-16:2 # 4 jobs (indices: 10, 12, 14, 16).
#SBATCH --array=2,3,5,7,11,13 # 6 jobs.
#SBATCH --array=1-10000%32 # 10,000 jobs, 32 jobs max in parallel.
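To check how many jobs a given specification produces before submitting, the indices can be enumerated locally. The helper "expand_array_spec" below is a hypothetical sketch, not Slurm's own parser: it only handles the forms shown above (a range with an optional step, or a comma-separated list) and ignores the "%<maxParallel>" suffix.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): print, one per line, the indices
# described by a simple --array specification such as "0-15", "10-16:2"
# or "2,3,5,7".
expand_array_spec() {
    local spec="${1%%[%]*}"   # drop the optional %<maxParallel> suffix
    local part range step
    local IFS=','             # split the specification on commas
    for part in $spec; do
        if [[ "$part" == *-* ]]; then
            range="${part%%:*}"                          # "<start>-<end>"
            step=1
            [[ "$part" == *:* ]] && step="${part##*:}"   # optional ":<step>"
            seq "${range%-*}" "$step" "${range#*-}"
        else
            echo "$part"                                 # single index
        fi
    done
}

expand_array_spec "10-16:2"     # prints 10, 12, 14 and 16, one per line

# Counting the lines gives the number of jobs in the array.
echo "Jobs in 1-10000%32: $(expand_array_spec '1-10000%32' | wc -l)"
```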
Job description:
The example below uses a Job Array to encode 5,000 videos, with at most 8 encodings running in parallel. The indices chosen for the Array follow the naming of the files to process (video-<index>.mp4).
Batch content (job.sh):
#!/bin/bash
# SBATCH options:
#SBATCH --job-name=Encode-Batch # Job Name
#SBATCH --cpus-per-task=4 # Allocation of 4 threads per Task
#SBATCH --mail-type=END # Email notification of the
#SBATCH --mail-user=firstname.lastname@aniti.fr # end of job execution.
#SBATCH --array=1-5000%8 # 5000 Jobs, 8 max in parallel
# Processing
module purge # delete all loaded module environments
module load ffmpeg/0.6.5 # load ffmpeg module version 0.6.5
ffmpeg -i video-$SLURM_ARRAY_TASK_ID.mp4 -threads $SLURM_CPUS_PER_TASK [...] video-$SLURM_ARRAY_TASK_ID.mkv
Remarks:
- Since each encoding (Task) uses 4 threads, the Job Array will occupy at most 8 x 4 = 32 threads simultaneously.
- The file to encode is determined by the Job Array index.
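Since each task derives its file names from the index alone, it can be useful to check that the input exists before starting the encoding, for example when some indices in the range have no matching video. The following is a dry-run sketch under that assumption: the function "plan_encoding" and the fallback values are hypothetical, and the command is only printed, with the elided ffmpeg options left as "[...]".

```shell
#!/bin/bash
# Hypothetical helper: print the ffmpeg command for a given index, or fail
# if the input file video-<index>.mp4 is missing from the current directory.
plan_encoding() {
    local index="$1" threads="$2"
    local in_file="video-${index}.mp4"
    local out_file="video-${index}.mkv"
    if [ ! -f "$in_file" ]; then
        echo "Task ${index}: ${in_file} not found" >&2
        return 1
    fi
    echo "ffmpeg -i ${in_file} -threads ${threads} [...] ${out_file}"
}

# Inside a Job Array both variables are set by Slurm; the fallbacks
# (assumptions) only allow a dry run outside of Slurm.
task_id="${SLURM_ARRAY_TASK_ID:-1}"
threads="${SLURM_CPUS_PER_TASK:-4}"

plan_encoding "$task_id" "$threads" || echo "Task ${task_id}: skipped"
```

A task that finds no input simply ends; the other tasks of the array run independently of it.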
Batch execution:
The Batch is submitted to Slurm with the "sbatch" command which, barring an error or a rejection, creates a Job and places it in the queue.
[firstname.lastname@cr-login-1 ~]# sbatch job.sh
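On success, sbatch prints a confirmation line ending with the ID of the new Job. As a minimal sketch, that ID can be extracted and reused to follow or cancel the array. The helper "job_id_from_sbatch" is hypothetical and the job number below is made up for illustration; the squeue and scancel calls are shown as comments because they only make sense on the cluster.

```shell
#!/bin/bash
# Hypothetical helper: extract the job ID (last word) from the confirmation
# line printed by sbatch.
job_id_from_sbatch() {
    awk '{ print $NF }' <<< "$1"
}

# Illustrative confirmation line; the number is made up.
confirmation="Submitted batch job 123456"
job_id="$(job_id_from_sbatch "$confirmation")"
echo "Array job ID: ${job_id}"

# On the cluster, the ID could then be used as follows (not run here):
#   squeue -j "$job_id"       # show the tasks of the array still pending or running
#   scancel "${job_id}_42"    # cancel only the task with index 42
#   scancel "$job_id"         # cancel the whole array
```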