
Gridengine

GridEngine (SGE) is an open source batch-queuing system, supported by Sun Microsystems, which manages and schedules the allocation of distributed resources such as processors, memory, disk space, and software licenses. Gridengine is responsible for accepting, scheduling, dispatching and managing the remote execution of large numbers of standalone, parallel or interactive user jobs.

Notes on using Gridengine with the Beowulf clusters

Full documentation on using the version of gridengine installed can be found on the SunSource website. These are some basic notes to get people up and running. You should also look at and contribute to the wiki.

Head nodes

SGE can only be accessed from the head nodes, and there is currently one queue set up per cluster.

hermes
bw1425n01-bw1425n24 and hcrc1425n01-hcrc1425n34

Current configuration

Each cluster is configured with one default queue and two parallel environments (make and lammpi). The lammpi environment generates a set of temporary ssh keys, starts ssh-agent and installs the keys into it; this allows ssh access within the allocated nodes without needing Kerberos credentials. The environment then lamboots a LAM multicomputer running over the allocated nodes, so the user should only need to run their MPI job using mpirun. When the job finishes, the multicomputer is halted, the ssh agent is killed and the keys are destroyed.

If you require additional environments, please submit an RT ticket detailing your requirements.

Per User setup

In order to access the cluster commands you need to source /opt/sge/default/common/settings.sh or /opt/sge/default/common/settings.csh, depending on whether your login shell is bash or (t)csh. This should be done automatically as part of the bash/tcsh login process, but if you can't access the q commands then source the appropriate file in your startup files. This should give you access to the various "q" commands, e.g.
[bw530n01]iainr: qhost
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
bw530n01                lx24-x86        1  0.02    2.0G  458.0M    1.0G   38.1M
bw530n02                lx24-x86        1  0.00    2.0G  299.1M    1.0G   33.7M
bw530n03                lx24-x86        1  0.00    2.0G  369.0M    1.0G   39.7M
bw530n04                lx24-x86        1  1.01    2.0G  692.0M    1.0G   53.1M
bw530n05                lx24-x86        1  1.00    2.0G  686.0M    1.0G   42.2M
bw530n06                lx24-x86        1  1.00    2.0G  684.1M    1.0G   45.0M
bw530n07                lx24-x86        1  1.00    2.0G  693.5M    1.0G   44.6M
bw530n08                lx24-x86        1  1.06    2.0G  686.8M    1.0G   43.2M
bw530n09                lx24-x86        1  1.00    2.0G  691.8M    1.0G   41.1M
bw530n10                lx24-x86        1  1.03    2.0G  679.1M    1.0G   43.8M
bw530n11                lx24-x86        1  1.06    2.0G  684.1M    1.0G   47.8M
bw530n12                lx24-x86        1  1.01    2.0G  679.2M    1.0G   48.2M
bw530n13                lx24-x86        1  1.00    2.0G  680.7M    1.0G   48.5M
bw530n14                lx24-x86        1  1.04    2.0G  677.2M    1.0G   65.7M
bw530n15                lx24-x86        1  1.01    2.0G  681.5M    1.0G   42.3M
bw530n16                lx24-x86        1  1.00    2.0G  689.1M    1.0G   43.2M
lutzow                  lx24-x86        1  1.66    2.0G  245.7M    1.0G   44.5M
If you can't at least run qhost and get something like the above then something is wrong.
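
If the q commands are missing, the settings file can be sourced from a startup file. The following is a minimal sketch for a Bourne-style startup file such as ~/.bashrc. The function name load_sge_settings and its optional path argument are illustrative only; the default path is the one given above.

```shell
# Source the Gridengine settings file if it exists and is readable.
# load_sge_settings is a hypothetical helper name, not part of Gridengine.
load_sge_settings() {
    settings=${1:-/opt/sge/default/common/settings.sh}
    if [ -r "$settings" ]; then
        . "$settings"
    else
        echo "cannot read $settings" >&2
        return 1
    fi
}

# Only load the settings when the q commands are not already on the PATH.
if ! command -v qhost >/dev/null 2>&1; then
    load_sge_settings || true
fi
```

Users of (t)csh would instead source /opt/sge/default/common/settings.csh from their own startup file.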

Submitting Simple jobs

The qsub command is used to submit simple jobs (jobs that run on one node):

[bw530n01]iainr: qsub tmp
Your job 194 ("tmp") has been submitted.
[bw530n01]iainr:

Standard output is written to <jobname>.o<jobnumber> and standard error is written to <jobname>.e<jobnumber>, so the above job submission would produce two files, tmp.o194 and tmp.e194.
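
When scripting around qsub it can be handy to recover the job number from the submission message and derive the file names from it. This is a small sketch that relies only on the message format and naming scheme shown above; job_id_from_message and job_output_files are hypothetical helper names.

```shell
# Extract the job number from the message qsub prints on submission,
# e.g. 'Your job 194 ("tmp") has been submitted.'
job_id_from_message() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Print the output and error file names for a job name and job number.
job_output_files() {
    printf '%s.o%s\n%s.e%s\n' "$1" "$2" "$1" "$2"
}

# The message is hard-coded here; on a head node it would come from qsub.
msg='Your job 194 ("tmp") has been submitted.'
id=$(printf '%s\n' "$msg" | job_id_from_message)
job_output_files tmp "$id"
# prints:
#   tmp.o194
#   tmp.e194
```

On a head node the hard-coded message would be replaced by something like msg=$(qsub tmp).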

Submitting lam jobs

In order to submit LAM MPI jobs you have to select the correct parallel environment (lammpi) using the -pe option and specify the number of nodes you require and the batch script to run. Gridengine will set up a multicomputer on the appropriate nodes, e.g. for the batch script runme:

#!/bin/sh
mpirun -v myapp
where myapp is the LAM schema file:
h /home/iainr/master
C -s h /home/iainr/slave
This will generate the following files:

runme.e198 (prolog error file)

Empty

runme.o198 (prolog output file)

1878 /home/iainr/master running on local
1879 /home/iainr/slave running on n0 (o)
13909 /home/iainr/slave running on n1
11701 /home/iainr/slave running on n2
master: allocating block (0, 0) - (19, 19) to process 1
master: allocating block (20, 0) - (39, 19) to process 2
master: allocating block (40, 0) - (59, 19) to process 3
master: allocating block (60, 0) - (79, 19) to process 1
master: allocating block (80, 0) - (99, 19) to process 2
master: allocating block (100, 0) - (119, 19) to process 3
master: allocating block (120, 0) - (139, 19) to process 1
master: allocating block (140, 0) - (159, 19) to process 2
master: allocating block (160, 0) - (179, 19) to process 3
master: allocating block (180, 0) - (199, 19) to process 1
master: allocating block (200, 0) - (219, 19) to process 2
master: allocating block (220, 0) - (239, 19) to process 3
master: allocating block (240, 0) - (259, 19) to process 1
master: allocating block (260, 0) - (279, 19) to process 2
master: allocating block (280, 0) - (299, 19) to process 3
master: allocating block (300, 0) - (319, 19) to process 1
master: allocating block (320, 0) - (339, 19) to process 2
...
master: allocating block (360, 500) - (379, 511) to process 2
master: allocating block (380, 500) - (399, 511) to process 3
master: allocating block (400, 500) - (419, 511) to process 1
master: allocating block (420, 500) - (439, 511) to process 2
master: allocating block (440, 500) - (459, 511) to process 3
master: allocating block (460, 500) - (479, 511) to process 1
master: allocating block (480, 500) - (499, 511) to process 2
master: allocating block (500, 500) - (511, 511) to process 3
master: done.

runme.pe198 (PE error file)

lamboot: attempting to execute "/usr/bin/ssh -x -a bw530n11.inf.ed.ac.uk -n echo $SHELL"
lamboot: got remote shell /bin/bash
lamboot: attempting to execute "/usr/bin/ssh -x -a bw530n11.inf.ed.ac.uk -n hboot -t -c lam-conf.lam -d -s -I "-H 129.215.18.73 -P 60259 -n 1 -o 0    ""
lamboot: attempting to execute "/usr/bin/ssh -x -a bw530n12.inf.ed.ac.uk -n echo $SHELL"
lamboot: got remote shell /bin/bash
lamboot: attempting to execute "/usr/bin/ssh -x -a bw530n12.inf.ed.ac.uk -n hboot -t -c lam-conf.lam -d -s -I "-H 129.215.18.73 -P 60259 -n 2 -o 0    ""

runme.po198 (PE output file)

Starting sge-lam
created directory /tmp/keys.WanWoo:
Enter passphrase for /tmp/keys.WanWoo/tmpid:
Identity added: /tmp/keys.WanWoo/tmpid (/tmp/keys.WanWoo/tmpid)
hboot: process schema = "/etc/lam/lam-conf.lam"
hboot: found /usr/bin/lamd
hboot: performing tkill
hboot: tkill
hboot: booting...
hboot: fork /usr/bin/lamd
[1]   1828 lamd -H 129.215.18.73 -P 60259 -n 0 -o 0 -d
hboot: process schema = "/etc/lam/lam-conf.lam"
 
LAM 6.5.8/MPI 2 C++/ROMIO - Indiana University
 
lamboot: boot schema file: /tmp/198.1.all.q/lamhostfile
lamboot: opening hostfile /tmp/198.1.all.q/lamhostfile
lamboot: found the following hosts:
lamboot:   n0 bw530n07.inf.ed.ac.uk
lamboot:   n1 bw530n11.inf.ed.ac.uk
lamboot:   n2 bw530n12.inf.ed.ac.uk
lamboot: resolved hosts:
lamboot:   n0 bw530n07.inf.ed.ac.uk --> 129.215.18.73
lamboot:   n1 bw530n11.inf.ed.ac.uk --> 129.215.18.77
lamboot:   n2 bw530n12.inf.ed.ac.uk --> 129.215.18.78
lamboot: found 3 host node(s)
lamboot: origin node is 0 (bw530n07.inf.ed.ac.uk)
lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -s -I " -H 129.215.18.73 -P 60259 -n 0 -o 0     ""
hboot: found /usr/bin/lamd
hboot: performing tkill
hboot: tkill
hboot: booting...
hboot: fork /usr/bin/lamd
[1]  13895 lamd -H 129.215.18.73 -P 60259 -n 1 -o 0 -d
hboot: process schema = "/etc/lam/lam-conf.lam"
hboot: found /usr/bin/lamd
hboot: performing tkill
hboot: tkill
hboot: booting...
hboot: fork /usr/bin/lamd
[1]  11692 lamd -H 129.215.18.73 -P 60259 -n 2 -o 0 -d
lamboot completed successfully
 
LAM 6.5.8/MPI 2 C++/ROMIO - Indiana University
 
Shutting down LAM
lamhalt: sending HALT to n1 (bw530n11.inf.ed.ac.uk)
lamhalt: sending HALT to n2 (bw530n12.inf.ed.ac.uk)
lamhalt: waiting for HALT ACKs from remote LAM daemons
lamhalt: received HALT ACK from n1 (bw530n11.inf.ed.ac.uk)
lamhalt: received HALT ACK from n2 (bw530n12.inf.ed.ac.uk)
lamhalt: sending final HALT to n0 (bw530n07.inf.ed.ac.uk)
lamhalt: local LAM daemon halted
LAM halted
1824 is deceased.
keydir is /tmp/keys.WanWoo
unlink /tmp/keys.WanWoo/tmpid
unlink /tmp/keys.WanWoo/tmpid.pub
rmdir /tmp/keys.WanWoo
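
The notes above do not show the submission command itself. The following is a sketch of what it might look like. The node count of 3 in the usage comment is inferred from the three hosts in the lamboot output for job 198, not stated in the notes, and submit_lam_job and lam_job_files are hypothetical helper names.

```shell
# Submit a batch script to the lammpi parallel environment.
# submit_lam_job is a hypothetical wrapper; "-pe lammpi" is from the notes.
submit_lam_job() {
    slots=$1
    script=$2
    case $slots in
        ''|*[!0-9]*)
            echo "node count must be a positive integer" >&2
            return 2 ;;
    esac
    if [ "$slots" -le 0 ]; then
        echo "node count must be a positive integer" >&2
        return 2
    fi
    qsub -pe lammpi "$slots" "$script"
}

# List the four files a parallel job leaves behind, given name and number.
lam_job_files() {
    for suffix in o e po pe; do
        printf '%s.%s%s\n' "$1" "$suffix" "$2"
    done
}

# On a head node:  submit_lam_job 3 runme
lam_job_files runme 198
# prints:
#   runme.o198
#   runme.e198
#   runme.po198
#   runme.pe198
```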

Interactive jobs

Gridengine supports interactive jobs through the qrsh and qlogin commands. The qsh command will not work in the current configuration; however, it is redundant, as X applications can be run from qrsh or qlogin sessions.

Both qrsh and qlogin can be used to set up interactive parallel environments using -pe as above; however, you should bear in mind that it may take several minutes or longer to allocate hosts and set up the environment. In the case of qlogin you may see spurious error messages; as long as it claims to be still scheduling the job, please ignore these.

Typical qlogin session

[bw530n01]iainr: qlogin -pe lammpi 4
waiting for interactive job to be scheduled ...timeout (3 s) expired while waiting on socket fd 4 
Your interactive job 210 has been successfully scheduled.
timeout (4 s) expired while waiting on socket fd 4
 
Your interactive job 210 has been successfully scheduled.
timeout (4 s) expired while waiting on socket fd 4
 
Your interactive job 210 has been successfully scheduled.
timeout (5 s) expired while waiting on socket fd 4
 
Your interactive job 210 has been successfully scheduled.
timeout (4 s) expired while waiting on socket fd 4
 
Your interactive job 210 has been successfully scheduled.
timeout (3 s) expired while waiting on socket fd 4
 
Your interactive job 210 has been successfully scheduled.
timeout (5 s) expired while waiting on socket fd 4
 
Your interactive job 210 has been successfully scheduled.
timeout (4 s) expired while waiting on socket fd 4
 
Your interactive job 210 has been successfully scheduled.
timeout (4 s) expired while waiting on socket fd 4
 
Your interactive job 210 has been successfully scheduled.
timeout (3 s) expired while waiting on socket fd 4
 
Your interactive job 210 has been successfully scheduled.
timeout (5 s) expired while waiting on socket fd 4
 
Your interactive job 210 has been successfully scheduled.
timeout (5 s) expired while waiting on socket fd 4
 
Your interactive job 210 has been successfully scheduled.
timeout (3 s) expired while waiting on socket fd 4
 
Your interactive job 210 has been successfully scheduled.
timeout (3 s) expired while waiting on socket fd 4
 
Your interactive job 210 has been successfully scheduled.
timeout (5 s) expired while waiting on socket fd 4
 
Your interactive job 210 has been successfully scheduled.
 
Your interactive job 210 has been successfully scheduled.
Establishing /opt/sge/bin/lx24-x86/sshscript session to host bw530n07.inf.ed.ac.uk ...
ssh -p 32795 bw530n07.inf.ed.ac.uk
[bw530n07]iainr:
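
qrsh can also be given a command to run instead of opening a login session. The following is a small sketch of a wrapper that requests the lammpi environment in the same way as the qlogin example above; run_in_pe is a hypothetical helper name, and the example command in the comment is an assumption for illustration.

```shell
# Run one command through qrsh inside the lammpi parallel environment.
# run_in_pe is a hypothetical wrapper; it only builds the qrsh call.
run_in_pe() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_in_pe NODES COMMAND [ARGS...]" >&2
        return 2
    fi
    slots=$1
    shift
    qrsh -pe lammpi "$slots" "$@"
}

# On a head node, for example:
#   run_in_pe 4 hostname
```

As with qlogin, allocation may take several minutes before the command starts.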

qmon

In addition to the command line interface there is an X-based GUI called qmon, which can be run from the head nodes. Qmon is used both for administration and for submitting jobs, and a certain amount of admin delegation is possible. I'm currently experimenting with this and will publish what additional rights standard users have in this section.



Informatics Forum, 10 Crichton Street, Edinburgh, EH8 9AB, Scotland, UK
Tel: +44 131 651 5661, Fax: +44 131 651 1426, E-mail: school-office@inf.ed.ac.uk
Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh