CompBioCluster/ClusterAdministration

This page describes configuration changes made to the ROCKS OS used on the cluster and how to perform certain cluster administration tasks.

ROCKS guides

The ROCKS user guide can be found at [WWW] http://www.rocksclusters.org/roll-documentation/base/5.5/. There are also guides for various add-on packages (known as "rolls") at [WWW] http://www.rocksclusters.org/roll-documentation/; however, the rolls themselves tend to be poorly documented.

Adding OS packages

ROCKS only installs a limited number of OS packages by default. In order to be able to install other packages, we downloaded OS roll disk images 3 to 9 from the [WWW] Rocks website. We then added the extra packages to the repository:

# rocks add roll os-6.0-0.x86_64.disk*.iso
# cd /export/rocks/install/
# rocks create distro
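To check that the OS rolls were registered, the standard rocks list roll command can be used (the exact output differs between ROCKS versions):

# rocks list roll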

At this point the extra OS packages are in the internal repository. To have extra packages installed automatically on the compute nodes, we

# cd /export/rocks/install/site-profiles/6.0/nodes
# cp skeleton.xml extend-compute.xml

and then edit extend-compute.xml to include <package> statements naming the wanted packages.
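As a sketch, the edited extend-compute.xml might look like the following (the package names emacs and screen are purely illustrative, not packages we actually installed):

<?xml version="1.0" standalone="no"?>
<kickstart>
        <description>
        Extra packages for the compute nodes
        </description>
        <changelog>
        </changelog>
        <package>emacs</package>
        <package>screen</package>
</kickstart>

We then rebuild the distribution: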

# cd /export/rocks/install/
# rocks create distro

To install the new packages on a compute node, we need to reboot it into a PXE reinstall. We can do this with

ssh compute-0-0 /boot/kickstart/cluster-kickstart

where compute-0-0 is the node we want to reinstall. In order to install on the head node, we simply run

yum install <package>
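To reinstall all compute nodes at once, the per-node command above can be run across the cluster with rocks run host. This is a standard ROCKS command, but note that it will reboot every compute node, so make sure no jobs are running first:

# rocks run host compute command="/boot/kickstart/cluster-kickstart"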

PXE booting change

There is an issue with the Intel servers: if they PXE boot and are then told by the server to boot from the hard drive, they hang. To fix this, the actions at [WWW] https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2012-May/058011.html were used, i.e.

# cp /usr/share/syslinux/chain.c32 /tftpboot/pxelinux/
# rocks add bootaction action=os args="hd0" kernel="com32 chain.c32"
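To confirm that the new boot action was registered, the standard rocks list bootaction command can be used:

# rocks list bootaction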

Adding users

Adding users is done by running adduser or useradd on the head node as usual. However, these users then need to be replicated to each of the compute nodes. This is done by

# rocks sync users
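For example, the complete sequence for a new account might look like the following (the username newuser is hypothetical):

# useradd -m newuser
# passwd newuser
# rocks sync users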

Monitoring the cluster

A [WWW] Ganglia daemon runs on the cluster, monitoring each of the nodes. The cluster is set up not to allow incoming requests on port 80, but if you forward your X over ssh, you can run a browser on the head node and point it at localhost to view the Ganglia output.
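For example, assuming Ganglia is served at the default ROCKS location of /ganglia (the username and head node name below are placeholders):

ssh -X username@headnode
firefox http://localhost/ganglia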