Processing a large volume of files

Although Job Steps and Job Arrays both allow you to launch multiple processing operations in parallel, their implementation and behavior differ significantly:

  • Job Arrays are very simple to set up (a single #SBATCH --array option to add) and they automatically manage the number of Jobs running simultaneously, while Job Steps require manual work (iterating over the sources, launching the steps in the background, and so on).

  • A Job Array is a collection of Jobs. The Jobs are therefore executed individually, depending on the resources available: as soon as one Job of the Array ends, Slurm immediately starts the next one, which reduces the total execution time and optimizes resource use. Job Steps, on the other hand, can only run once all the resources requested for the Job are available.

To perform similar processing on a large number of sources, use Job Arrays.
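
Below is a minimal sketch of a Job Array submission script. The processing script process.sh and the input file naming scheme (input_0.txt through input_99.txt) are hypothetical, chosen only to illustrate the pattern:

    #!/bin/bash
    #SBATCH --job-name=array-example
    #SBATCH --array=0-99%10   # 100 Jobs, at most 10 running at the same time
    #SBATCH --ntasks=1

    # SLURM_ARRAY_TASK_ID is set by Slurm for each Job of the Array;
    # here it serves as the index selecting one source file per Job.
    ./process.sh "input_${SLURM_ARRAY_TASK_ID}.txt"

The %10 suffix is the throttle that lets the Array manage how many Jobs run simultaneously, as described above.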

If, on the other hand, the processing requires running very different treatments in parallel (not just varying the source/file) and/or it is not possible to select a source from an index, use Job Steps.
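
Here is a minimal sketch of the Job Step pattern, assuming three unrelated treatments (the script names convert_images.sh, build_index.sh and compute_stats.sh are hypothetical):

    #!/bin/bash
    #SBATCH --job-name=steps-example
    #SBATCH --ntasks=3

    # Each srun launches one Job Step in the background, so the three
    # treatments run in parallel inside the same allocation.
    srun --ntasks=1 --exclusive ./convert_images.sh &
    srun --ntasks=1 --exclusive ./build_index.sh &
    srun --ntasks=1 --exclusive ./compute_stats.sh &

    wait   # return only when all steps have finished

Note the manual work mentioned earlier: launching each step in the background yourself and synchronizing with wait.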

It is also quite possible to use both at the same time!
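
For instance, a Job Array can select the source while Job Steps parallelize the work on it. In this sketch, each Job of the Array handles one source split into four chunks; the worker script chunk_worker.sh and the source naming are hypothetical:

    #!/bin/bash
    #SBATCH --job-name=combined-example
    #SBATCH --array=0-9
    #SBATCH --ntasks=4

    # Each Job of the Array processes one source, split into 4 chunks
    # handled in parallel by Job Steps.
    for chunk in 0 1 2 3; do
        srun --ntasks=1 --exclusive ./chunk_worker.sh "source_${SLURM_ARRAY_TASK_ID}" "$chunk" &
    done
    wait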