Running Jobs

A full account of how to use Grid Engine to run jobs would be impossible here, but most people only need a few commands. There is user documentation for version 6.2u7, which should be similar enough to the version we have (6.2u5). Have a look at chapter 1 for an overview of how things work; chapter 2 should cover pretty much any use case you might have. There are also man pages on the head node if you're wondering about the syntax of a particular command.

Submitting Jobs

Grid Engine jobs are submitted using the qsub command. The best way to structure a job is as a shell script that sets up any environment needed, calls your program and then runs any post-processing needed. Here's an example of a simple script (available at ~rdaly/ on van-halen):

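# Run the job from the directory it was submitted from
#$ -cwd
# Merge standard error into standard output
#$ -j y
# Use bash to interpret the script
#$ -S /bin/bash
sleep 10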

If we submit this script with qsub ~rdaly/, it is sent to the job scheduling system, which dispatches it to one of the compute nodes.
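
As a rough sketch of the structure described above, a job script with separate set-up, run and post-processing steps might look like the following. The file names, and the use of sort as a stand-in for your own program, are assumptions made for illustration; they are not part of the example script on van-halen.

```shell
#!/bin/bash
#$ -cwd
#$ -j y
#$ -S /bin/bash

# 1. Set up: create a private working directory (hypothetical layout)
#    and write a small input file into it.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX")
printf '%s\n' 3 1 2 > "$workdir/input.txt"

# 2. Run: 'sort' stands in for your own program here.
sort -n "$workdir/input.txt" > "$workdir/output.txt"

# 3. Post-process: keep the result next to the script and tidy up.
cp "$workdir/output.txt" ./result.txt
rm -rf "$workdir"
```

Saved as, say, myjob.sh (a hypothetical name), this would be submitted with qsub myjob.sh. With -cwd and -j y, the job's combined output should end up in a file named after the script and job id, in the directory you submitted from.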

Looking at Jobs
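
Grid Engine's standard tool for looking at jobs is qstat. Here is a minimal sketch, assuming qstat is on your PATH on the head node; the function name show_my_jobs is made up for this example.

```shell
# List the current user's pending and running Grid Engine jobs.
# Assumes the standard qstat client is installed (e.g. on the head node).
show_my_jobs() {
    if ! command -v qstat >/dev/null 2>&1; then
        echo "qstat not found; try this on the head node" >&2
        return 1
    fi
    # -u restricts the listing to jobs owned by the given user.
    qstat -u "$(id -un)"
}

show_my_jobs || echo "could not list jobs"
```

To see the full details of a single job, qstat -j <job id> should print them, which is handy when a job stays queued longer than expected.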

Saving Files

Your jobs might not complete as you anticipated, leaving half-finished data around. If the data is stored on /state/partition1 on one of the compute nodes, you can get it out by copying it into your home directory. Either ssh onto the node and copy the files, or scp the files directly from the head node. In either case, you will need to know the name of the compute node where the files are; this should be recorded in a log file.

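# Option 1: log in to the compute node and copy from there
$ ssh compute-0-j
$ cp /state/partition1/<my files> /home/<my home dir>/<destination dir>
# Option 2: copy directly from the head node
$ scp compute-0-j:/state/partition1/<my files> /home/<my home dir>/<destination dir>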
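
Since finding the files depends on knowing which node the job ran on, one option is to have the job script write that into its own output. Here is a minimal sketch, assuming the job writes under /state/partition1 as described above; the variable names and the format of the log line are made up for this example.

```shell
#!/bin/bash
#$ -cwd
#$ -j y
#$ -S /bin/bash

# Record where the job ran so that leftover files can be found later.
node=$(uname -n)              # name of the machine running the job
scratch=/state/partition1     # node-local disk mentioned above
echo "node=$node scratch=$scratch started=$(date '+%Y-%m-%d %H:%M:%S')"

# ... the real work would go here ...
sleep 1
```

Searching the job's output file for the line starting with node= then tells you which compute node to ssh or scp to.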