This page describes the configuration of the ROCKS OS used on the cluster and how to perform common cluster administration tasks.
The ROCKS user guide can be found at http://www.rocksclusters.org/roll-documentation/base/5.5/. There are also guides for the various add-on packages (known as "rolls") at http://www.rocksclusters.org/roll-documentation/; however, these tend to be less thorough.
Adding OS packages
ROCKS installs only a limited number of OS packages by default. In order to be able to install other packages, we downloaded the OS roll ISOs, disks 3 through 9, from the ROCKS website. We then added the extra packages to the repository:
# rocks add roll os-6.0-0.x86_64.disk*.iso
# cd /export/rocks/install/
# rocks create distro
This added the extra OS packages to the internal repository. To have packages installed on the compute nodes, we
# cd /export/rocks/install/site-profiles/6.0/nodes
# cp skeleton.xml extend-compute.xml
and then edit extend-compute.xml to add a package statement for each package we want. We then run
# cd /export/rocks/install/
# rocks create distro
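As an illustration, the edited extend-compute.xml might look like the following. The structure matches the skeleton file shipped with ROCKS; the package names here are examples, not the packages actually installed on this cluster:

```xml
<?xml version="1.0" standalone="no"?>
<kickstart>

  <description>Extra OS packages for the compute nodes</description>

  <!-- One package element per extra package; these names are illustrative -->
  <package>emacs</package>
  <package>screen</package>

</kickstart>
```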
To install the new packages on a node, we need to reboot it with a PXE install. We can do this by
ssh compute-0-0 /boot/kickstart/cluster-kickstart
where compute-0-0 is the node we want to reinstall. To install a package on the head node, we simply run
yum install <package>
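To reinstall every compute node rather than one at a time, the same kickstart command can be pushed out with `rocks run host`. This is a sketch; the appliance name and syntax should be checked against your ROCKS version:

```shell
# Trigger a PXE reinstall on all compute nodes at once.
# "compute" selects every compute appliance; command= is the remote command to run.
rocks run host compute command="/boot/kickstart/cluster-kickstart"
```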
PXE booting change
There is an issue with the Intel servers: if they attempt to PXE boot and are then told by the server to boot from the hard drive, they hang. To fix this, the workaround from https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2012-May/058011.html was applied, i.e.
# cp /usr/share/syslinux/chain.c32 /tftpboot/pxelinux/
# rocks add bootaction action=os args="hd0" kernel="com32 chain.c32"
Adding users
Adding users is done by running adduser or useradd on the head node as usual. However, these users then need to be replicated to each of the compute nodes. This is done by running
# rocks sync users
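Putting the two steps together, adding a user might look like this (the username is illustrative):

```shell
# Create the account on the head node as usual...
useradd -m alice
passwd alice

# ...then replicate the account to all compute nodes
rocks sync users
```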
Monitoring the cluster
A Ganglia daemon runs on the cluster and monitors each of the nodes. The cluster does not allow incoming requests on port 80, but if you forward X over ssh, you can run a browser on the head node and point it at localhost to view the Ganglia output.
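For example, assuming Ganglia is served at its default URL on the head node (the hostname and URL path here are placeholders; check your installation):

```shell
# From your workstation: log in with X forwarding enabled
ssh -X user@headnode

# Then, on the head node: open the Ganglia front end in a browser
firefox http://localhost/ganglia &
```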