CompBioCluster

This page describes the computer cluster used for computational biology and other purposes, commonly known as van-halen. All information about the cluster should be recorded here so that knowledge of it does not depend on any single user. A list of users for the cluster is at CompBioCluster/TrustedUserGroup. To edit these pages, please create an account on the wiki and contact RonanDaly or SimonRogers with your username; it will be added to CompBioCluster/TrustedUserGroup and you can then start editing.

Mailing List

There is a mailing list for the group at [MAILTO] compbiocluster@dcs.gla.ac.uk. The interface to manage this is at [WWW] https://mr1.dcs.gla.ac.uk/mailman/admin/compbiocluster.

Cluster Hardware

The cluster is located in the server room in Lilybank Gardens. The hardware currently consists of a [WWW] Netshelter VL Value Line - 42U 600X1070MM rack, an [WWW] HP V1810G 48 Port Ethernet Switch (model number J9660A) and 7 [WWW] Intel SR1690WBR Server Systems. One of the servers (van-halen) is designated as the head node and has a built-in DVD-ROM; the rest are compute nodes. Each node has 16GB of RAM and 8 cores, giving roughly 2GB of RAM per core.

The cluster is set up so that the connections between the head node and the compute nodes are on a private network, with the head node also connected to the DCS network. To support this, two VLANs have been set up on the switch: public (1) and private (100). The ports on the switch are untagged, so traffic cannot span the two VLANs. The head node routes all traffic between the compute nodes and the outside world.

Cluster Setup

On the Intel compute server systems (not the head node), the CMOS settings have been changed to

Cluster OS

The cluster is running [WWW] Rocks 6.0, which is based on [WWW] CentOS 6.2. This distribution is specifically designed for clusters and can automatically install and update the OS images on each of the compute nodes. Details of configuration and administration are in CompBioCluster/ClusterAdministration.

Accessing the cluster

The cluster head node is at van-halen.dcs.gla.ac.uk. Each of the compute nodes is at compute-0-{0..n-1}, where n is the number of compute nodes. The cluster can only be accessed using ssh; in particular, the only node that can be accessed directly is the head node (van-halen), and the connection must come from inside the university network. Files can be copied to the head node using scp, and once logged in it is also possible to pull files from another computer on the university network to the node you are on.
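For example, a typical session might look like the following (the username, machine name and file names are placeholders; substitute your own):

{{{
# Log in to the head node (must be from inside the university network)
ssh yourusername@van-halen.dcs.gla.ac.uk

# Copy a file from your own machine to your home directory on the head node
scp mydata.tar.gz yourusername@van-halen.dcs.gla.ac.uk:~/

# From the head node, pull a file from another machine on the university network
# (machine name and path are illustrative)
scp yourusername@somemachine.dcs.gla.ac.uk:/path/to/file ~/
}}}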

Home directories are at /home/<username>. The home directories reside on the head node and are NFS-mounted as /home on the compute nodes, so files in your home directory are accessible from any node. However, there is only roughly 460GB available on /home and there are no quotas in place, so please take care with the amount of data you place there to stop it filling up and blocking other users from using the cluster. When you are finished with data, please delete it from your directory.
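Two standard commands are useful for keeping an eye on space (the output naturally depends on the current state of the system):

{{{
# Check how full the shared home partition is
df -h /home

# See how much space your own directories are using
du -sh ~/*
}}}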

If you need more space when running a job or are running jobs that access files a lot, there are also partitions on each of the compute nodes that may be used. These are mounted on /state/partition1. Files can be copied here ahead of processing, or the results of processing can also be stored here. However, it is not possible to access these partitions from outside the node, so when a job is finished it should copy any results out. Files on these partitions should also be considered continually at risk and may disappear at any time. These partitions are also roughly 460GB in size, so again please be careful what you are placing on them. It is worthwhile knowing that NFS can be quite slow, so if you are accessing files on /home a lot, your jobs can slow right down because of network latency.
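As a rough sketch of the staging pattern described above (the input file, results directory and program names are hypothetical, and the JOB_ID variable is set by the Grid Engine scheduler described below), a job might do something like:

{{{
# Create a scratch directory on the compute node's local partition
SCRATCH=/state/partition1/$USER/$JOB_ID
mkdir -p $SCRATCH

# Stage input data in from the NFS-mounted home directory
cp ~/data/input.dat $SCRATCH/

# Run the analysis against the fast local disk
cd $SCRATCH
./my_analysis input.dat > results.dat

# Copy the results back to home and clean up the scratch space
cp results.dat ~/results/
rm -rf $SCRATCH
}}}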

Backups

There are currently no backups of the cluster, so your data may disappear at any time. Please copy important information somewhere else.

Installing Software

Software can of course be installed in a user's home directory and run from there. However, some software packages do not make it easy to install into a home directory. Some hints are at CompBioCluster/InstallingPythonSoftware.
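As a general illustration (the ~/local prefix and the package being built are just an assumed example, not a cluster convention), many source packages can be installed under a prefix in your home directory along these lines:

{{{
# Build and install a typical autotools-style package under ~/local
./configure --prefix=$HOME/local
make
make install

# Make the installed programs and libraries visible to your shell
export PATH=$HOME/local/bin:$PATH
export LD_LIBRARY_PATH=$HOME/local/lib:$LD_LIBRARY_PATH
}}}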

Running jobs on the cluster

[WWW] Sun Grid Engine (now known as Oracle Grid Engine) version 6.2u5 is used as the job scheduling tool. It must be used to run all jobs on the cluster; in particular, do not run jobs as standalone processes, either on the head node or on a compute node. This ensures that cluster resources are shared out fairly and evenly and that high throughput is maintained. There are more details on running jobs at CompBioCluster/RunningJobs. For a management perspective, have a look at CompBioCluster/ManagingGridEngine.
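As a minimal sketch (the job name, output files and program are placeholders, and any resource requests you add will depend on how the queues are configured), a Grid Engine job script might look like:

{{{
#!/bin/bash
#$ -N example_job          # a name for the job
#$ -cwd                    # run the job from the directory it was submitted from
#$ -o example_job.out      # file for standard output
#$ -e example_job.err      # file for standard error

./my_analysis --input input.dat --output results.dat
}}}

A script like this would be submitted with qsub example_job.sh, monitored with qstat and removed with qdel <job_id>; see CompBioCluster/RunningJobs for the details.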
