
Sun Grid Engine: Job Arrays

Why?

Suppose you wish to run a large number of largely identical jobs: you may wish to run the same program many times with different arguments or parameters, or perhaps process a thousand different input files. You may have used a Condor pool to do this (where idle PCs on campus are used to run your jobs overnight), but systems such as the CSF can also run these High Throughput Computing jobs. One might write a Perl script to generate all the required qsub files and a BASH script to submit them all; however, this is not a good use of your time, and submitting thousands of individual jobs will do horrible things to the submit (login) node on a cluster.

A much better way is to use an SGE Array Job! Below we describe how to submit an SGE job comprising numerous serial (single core) tasks and also SMP (multicore) tasks.

What?

An SGE array job might be described as a job with a for-loop built in. Here is a simple example:
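(a minimal sketch; the executable name myprog.exe and the data/results file naming below are illustrative)

    #!/bin/bash
    # Run the tasks from the directory the job was submitted from
    #$ -cwd
    # A job array of 1000 tasks: SGE_TASK_ID takes a different value in each task
    #$ -t 1-1000

    # Each task reads its own input file and writes its own output file
    ./myprog.exe -in data.$SGE_TASK_ID.in -out results.$SGE_TASK_ID.out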

Computationally, this is equivalent to 1000 individual queue submissions in which SGE_TASK_ID takes the values 1, 2, 3, ..., 1000, and where input and output files are indexed by the ID. Please note that for serial jobs you don't use a 'pe' setting.

Multi-core (SMP) tasks (e.g., OpenMP jobs) can also be run in job arrays. Each task will run your program with the requested number of cores. Simply add a -pe option to the jobscript and then tell your program how many cores it can use in the usual manner (see parallel job submission). Please be aware that each task will request the specified resources (number of cores), so it may take longer for each task to get through the batch queue, depending on how busy the system is.

An example SMP array job is given below:
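(a sketch; the parallel environment name smp.pe is an assumption, so check your cluster's documentation for the correct PE name, and adjust the core count and task range to suit)

    #!/bin/bash
    #$ -cwd
    # Each task requests 4 cores (the PE name and core count are site specific)
    #$ -pe smp.pe 4
    # 1000 tasks, each running on 4 cores
    #$ -t 1-1000

    # Tell the OpenMP program how many cores it may use ($NSLOTS is set by SGE)
    export OMP_NUM_THREADS=$NSLOTS
    ./myomprog.exe -in data.$SGE_TASK_ID.in -out results.$SGE_TASK_ID.out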

In both of the above cases a single qsub command submits the entire array, each task receives its own value of $SGE_TASK_ID, and each task writes its own standard output and error files (suffixed with the task ID).

There are many ways to use the $SGE_TASK_ID variable to supply a different input to each task.

For example — run each task in a separate directory (folder):
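(a sketch, assuming directories named run1, run2, ... have already been created, each containing the input file the program expects)

    #!/bin/bash
    #$ -cwd
    # 10 tasks, one per directory
    #$ -t 1-10

    # Task N runs in directory runN, so each task has its own input and output files
    cd run$SGE_TASK_ID
    ../myprog.exe < myinput.dat > myoutput.dat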

More

More on SGE Job Arrays can be found at:

A More General For Loop

It is not necessary that SGE_TASK_ID starts at 1; nor must the increment be 1. For example:
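using the lower-upper:step form of the -t option:

    #$ -t 100-995:5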

so that SGE_TASK_ID takes the values 100, 105, 110, 115, ..., 995. However, SGE_TASK_ID is not allowed to start at 0.

Incidentally, if the upper bound is not equal to the lower bound plus an integer multiple of the increment, for example
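(the range here is purely illustrative)

    #$ -t 1-10:4

which would give SGE_TASK_ID the values 1, 5 and 9, so the stated upper bound of 10 is never reached,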

SGE automatically changes the upper bound, viz
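(continuing the illustrative range above)

    #$ -t 1-9:4

so that the upper bound becomes the last task ID that is actually reached.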

There are three more automatically created environment variables one can use, as illustrated by this simple qsub script:
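(a minimal sketch; the task range is illustrative)

    #!/bin/bash
    #$ -cwd
    # Tasks 1, 4, 7 and 10
    #$ -t 1-10:3

    echo "I am task           $SGE_TASK_ID"
    echo "The first task is   $SGE_TASK_FIRST"       # 1
    echo "The last task is    $SGE_TASK_LAST"        # 10
    echo "The step size is    $SGE_TASK_STEPSIZE"    # 3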

A List of Input Files

One can be sneaky — suppose we have a list of input files, rather than input files explicitly indexed by suffix:
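One possible approach (the list filename my_file_list.txt and the task count are assumptions) is to keep the filenames, one per line, in a plain text file and have each task extract its own line:

    #!/bin/bash
    #$ -cwd
    # Set the number of tasks to the number of lines in my_file_list.txt
    #$ -t 1-56

    # Pick out line number $SGE_TASK_ID from the list of input files
    INFILE=$(sed -n "${SGE_TASK_ID}p" my_file_list.txt)

    ./myprog.exe -in "$INFILE" -out "$INFILE.result"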

Bash Scripting and Arrays

Another way of passing different parameters to your application (e.g., to run the same simulation but with different input parameters) is to list all the parameters in a bash array and index into the array using $SGE_TASK_ID. For example:
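(a sketch, assuming the program takes a single numerical parameter on its command line; the parameter values below are made up)

    #!/bin/bash
    #$ -cwd
    # One task per entry in the PARAMS array below
    #$ -t 1-6

    # The parameter values to be used, one per task
    PARAMS=( 0.1 0.2 0.5 1.0 2.0 5.0 )

    # Bash arrays are indexed from 0 but SGE_TASK_ID starts at 1
    INDEX=$((SGE_TASK_ID-1))

    ./myprog.exe -param ${PARAMS[$INDEX]} -out results.$SGE_TASK_ID.out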

Running from Different Directories

This example runs the same code but from different directories. Here we expect each directory to contain an input file. You can name your directories (and subdirectories) appropriately to match your experiments. We use BASH scripting to index into arrays giving the names of the directories. This example requires some knowledge of BASH, but it should be straightforward to modify for your own work.

In this example we have the following directory structure (use whatever names are suitable for your code)
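For instance (all of the names below are made up; substitute your own):

    ~/scratch/myjobs/
        model1/  model2/  model3/               (3 top-level directories)
            lowres/  hires/                     (2 subdirectories in each of those)
                run1/  run2/  run3/  run4/      (4 further subdirectories, each containing myinput.dat)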

Hence we have 3*2*4=24 input files all named myinput.dat in paths such as
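(using the illustrative names above)

    ~/scratch/myjobs/model1/lowres/run1/myinput.dat
    ~/scratch/myjobs/model2/hires/run3/myinput.dat
    ~/scratch/myjobs/model3/hires/run4/myinput.dat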

The following jobscript will run the executable mycode.exe in each path (so that we process all 24 input files). In this example the code is a serial code (hence no PE is specified)
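A sketch of such a jobscript, using the illustrative directory names above (the executable location ~/bin/mycode.exe is an assumption); edit BASE, EXE, the DIRS arrays and the task range to match your own layout:

    #!/bin/bash
    #$ -cwd
    # 24 tasks: one for each of the 3 * 2 * 4 directories
    #$ -t 1-24

    # Where the directory tree lives and the (serial) executable to run
    BASE=~/scratch/myjobs
    EXE=~/bin/mycode.exe

    # The directory names at each of the three levels
    DIRS1=( model1 model2 model3 )
    DIRS2=( lowres hires )
    DIRS3=( run1 run2 run3 run4 )

    # Convert SGE_TASK_ID (1..24) into three array indices (bash arrays start at 0)
    TID=$((SGE_TASK_ID-1))
    I1=$(( TID / (${#DIRS2[@]} * ${#DIRS3[@]}) ))
    I2=$(( (TID / ${#DIRS3[@]}) % ${#DIRS2[@]} ))
    I3=$(( TID % ${#DIRS3[@]} ))

    # Run the code in the directory selected by this task
    cd $BASE/${DIRS1[$I1]}/${DIRS2[$I2]}/${DIRS3[$I3]}
    $EXE < myinput.dat > myoutput.dat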

You may not need three levels of subdirectories and you'll want to edit the names (BASE, EXE, DIRS1, DIRS2, DIRS3) and change the number of tasks requested.

To submit your job simply use qsub myjobscript.sh - i.e., you only submit a single jobscript.