Running Jobs

A full account of how to use Grid Engine to run jobs is beyond the scope of this page, but most people only need a few commands. There is user documentation for version 6.2u7, which should be similar enough to the version we have (6.2u5); chapter 1 gives an overview of how things work, and chapter 2 covers pretty much any use case you might have. There are also man pages on the head node if you are unsure about the syntax of a particular command.

Submitting Jobs

Grid Engine jobs are submitted using the qsub command. The best way to structure a job is as a shell script that sets up any environment needed, calls your program and then runs any post-processing needed. Here is a simple example script (available at ~rdaly/example/ on van-halen):

#$ -cwd
#$ -j y
#$ -S /bin/bash
sleep 10

If we submit this script with qsub ~rdaly/example/, it gets sent to the job scheduling system, which dispatches it to one of the compute nodes. The #$ lines are Grid Engine directives embedded in the script: -cwd runs the job in the directory it was submitted from, -j y merges the error stream into the output stream, and -S /bin/bash runs the script under bash.
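Since the #$ directive lines are ordinary shell comments, the script also runs unchanged outside Grid Engine, which is a quick way to check it before submitting. A minimal sketch (the temporary file name /tmp/example.sh is arbitrary, not part of the cluster setup):

```shell
# Recreate the example script in a temporary file; to bash, the "#$"
# lines are plain comments, so the script runs locally as-is.
cat > /tmp/example.sh <<'EOF'
#$ -cwd
#$ -j y
#$ -S /bin/bash
sleep 1
EOF

# Run it locally with bash; a clean exit means the script is at least
# syntactically sound before you hand it to qsub.
bash /tmp/example.sh && echo "script ran cleanly"
rm /tmp/example.sh
```

Testing locally first saves a round trip through the queue when the script has a simple shell error in it.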

Looking at jobs

Use the qstat command to look at any jobs you have submitted. For example, if we run qsub ~rdaly/example/ four times:

$ qstat 
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
     36 0.00000   rdaly        qw    10/11/2012 20:40:17                                    1        
     37 0.00000   rdaly        qw    10/11/2012 20:40:18                                    1        
     38 0.00000   rdaly        qw    10/11/2012 20:40:19                                    1        
     39 0.00000   rdaly        qw    10/11/2012 20:40:19                                    1

If you want to look at all users' jobs, run

$ qstat -u \*

Submitting multiple similar jobs

If you need to submit multiple jobs that are identical except for some parameters (e.g. data set, input parameters), it is best to submit an array job. We do this by running qsub -t 1-n, where n is the number of jobs. Grid Engine then sets an environment variable, SGE_TASK_ID, in each task's environment, which can be used to, for example, select a data file (file.${SGE_TASK_ID}) or index into a file of parameters (sed -n "${SGE_TASK_ID}p" <parameter file>).
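The parameter-file trick can be tried outside Grid Engine by setting SGE_TASK_ID by hand; under a real array job the scheduler sets it for you. A sketch (the file name /tmp/params.txt and the parameter values are made up for illustration):

```shell
# Simulate one task of an array job; in a real job Grid Engine
# sets SGE_TASK_ID itself, one value per task.
SGE_TASK_ID=2

# One line per task in the parameter file.
printf 'alpha\nbeta\ngamma\n' > /tmp/params.txt

# Pull out the line belonging to this task with sed.
PARAM=$(sed -n "${SGE_TASK_ID}p" /tmp/params.txt)
echo "task ${SGE_TASK_ID} uses parameter: ${PARAM}"
# prints: task 2 uses parameter: beta

rm /tmp/params.txt
```

Each task reads only its own line, so one script plus one parameter file replaces n near-identical submission scripts.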

Saving Files

It is possible that your jobs might not complete as you anticipated, leaving half-finished data around. If these data are stored on /state/partition1 on one of the compute nodes, you can recover them by copying them into your home directory: either ssh onto the node and cp the files, or scp the files directly from the head node. In either case, you will need to know the name of the compute node where the files are; this should be in a log file.

$ ssh compute-0-j
$ cp /state/partition1/<my files> /home/<my home dir>/<destination dir>

or, from the head node:

$ scp compute-0-j:/state/partition1/<my files> /home/<my home dir>/<destination dir>

last edited 2013-03-05 12:42:55 by RonanDaly