This page describes the configuration of the ROCKS OS used in the cluster and how to perform certain cluster administration tasks.

ROCKS guides

The ROCKS user guide can be found at [WWW]. There are also guides for various add-on packages (known as "rolls") at [WWW]; however, these tend not to be very well documented.

Adding OS packages

ROCKS only installs a limited number of OS packages by default. To be able to install other packages, we downloaded all the OS roll ISOs, disks 3 to 9, from the [WWW] rocks website. We then added the extra packages to the repository

# rocks add rolls os-6.0-0.x86_64.disk*.iso
# cd /export/rocks/install/
# rocks create distro

This added the extra OS packages to the internal repository. To install packages on the compute nodes, we

# cd /export/rocks/install/site-profiles/6.0/nodes
# cp skeleton.xml extend-compute.xml

and then edit extend-compute.xml to include a <package> statement for each wanted package. We then run

# cd /export/rocks/install/
# rocks create distro
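
The package lines added to extend-compute.xml in the previous step look something like this (the package names here are just illustrative; the surrounding elements come from the copied skeleton.xml and should be left intact):

```xml
<kickstart>
  <!-- one <package> element per extra OS package to install;
       emacs and tcsh are examples, not required packages -->
  <package>emacs</package>
  <package>tcsh</package>
</kickstart>
```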

To install the new packages on a node, we need to reboot it with a PXE install. We can do this by

ssh compute-0-0 /boot/kickstart/cluster-kickstart

where compute-0-0 is the node we want to reinstall. In order to install on the head node, we simply

yum install <package>

Adding other packages

The best way to add other packages is to add their RPMs to the configuration, so that they are automatically installed on a node reinstall. These RPMs can be downloaded manually or picked up from a yum install. One good source of packages is the EPEL repository, which targets Enterprise Linux, from which ROCKS ultimately derives; packages from this repository are therefore less likely to conflict with other packages from the OS.
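
For reference, the relevant stanza of /etc/yum.repos.d/epel.repo looks roughly like this once enabled (the mirror and key URLs vary and are elided; enabled is the flag the next step flips):

```ini
[epel]
name=Extra Packages for Enterprise Linux 6 - $basearch
mirrorlist=...
enabled=1
gpgcheck=1
gpgkey=...
```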

The EPEL repository was enabled by editing the /etc/yum.repos.d/epel.repo file and changing enabled to 1. Then a

yum install <package>

will install the package on the head node and leave the downloaded RPMs in /var/cache/yum/epel/packages/. If we copy these packages

cp <RPMs> /export/rocks/install/contrib/6.0/x86_64/RPMS/

and edit the file /export/rocks/install/site-profiles/6.0/nodes/extend-compute.xml as above, after a

# cd /export/rocks/install/
# rocks create distro

the packages will be installed on a compute node after a reinstallation.

PXE booting change

There is an issue with the Intel servers: if they try to PXE boot and are then told by the server to boot from the hard drive, they hang. To fix this, the actions at [WWW] were used, i.e.

# cp /usr/share/syslinux/chain.c32 /tftpboot/pxelinux/
# rocks add bootaction action=os args="hd0" kernel="com32 chain.c32"
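
The result can be checked with the rocks command-line tools (a sketch; the exact listing format varies between Rocks versions):

```shell
# list the configured boot actions; the "os" action should
# now show the com32 chain.c32 kernel
rocks list bootaction
```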

Adding users

Adding users is done by running adduser or useradd on the head node as usual. However, these users then need to be replicated to each of the compute nodes. This is done by

# rocks sync users

Monitoring the cluster

A [WWW] Ganglia daemon runs on the cluster and monitors each of the nodes. The cluster is set up to disallow incoming requests on port 80, but if you forward your X over ssh, you can run a browser on the head node and point it at localhost to view the Ganglia output.
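
A typical session looks like this (the head node name, user, and the /ganglia URL path are assumptions; adjust for the local setup):

```shell
# forward X over ssh to the head node (hostname is illustrative)
ssh -X user@headnode.example.org

# then, on the head node, point a browser at the local Ganglia pages
firefox http://localhost/ganglia &
```

Alternatively, a local port forward such as ssh -L 8080:localhost:80 to the head node lets a browser on your own machine reach the same pages at http://localhost:8080/, without forwarding X.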

Backing up the cluster

The Rocks way of backing up the cluster configuration is to create a Restore roll that can be used at cluster reinstallation time. To do this, edit the roll configuration under /export/site-roll/rocks/src/roll/restore/ to add the files that need saving and then run

# /opt/gridengine/inst_sge -bup

to backup the grid engine configuration and then

# cd /export/site-roll/rocks/src/roll/restore
# make roll

This will leave a CD image at /export/site-roll/rocks/src/roll/restore/van-halen<stuff>.iso that can be saved, burned to a CD and used in the cluster reinstallation process.

last edited 2013-03-05 12:31:15 by RonanDaly