Many Linux-based clustering solutions claim to provide an easy way to obtain or build a Linux Beowulf Cluster[1]. In practice, most of the links to them are either dead or completely outdated. We'll concentrate on a more specific class of cluster that uses a diskless node approach, where the nodes boot a Single System Image (SSI) over the network interface (this process is explained in the Creating the SSI section below). The Clustermonkey web site[2] has an article[3] that discusses the use of such a configuration under specific conditions, depending on the intended use of the cluster. One of the key conditions where diskless nodes are useful is when the nodes need to share file-based data during runs. However, if all processes compute and manipulate their data independently, local storage becomes more attractive. In our case, the nodes do have a local disk that we could configure as a local "scratch pad" for such purposes.
With commercial and non-commercial solutions already available (refer to the References at the end of the article), one may wonder why we would want to build our own cluster solution from the bottom up. We provide a few of the key answers here.
Gentoo is becoming more and more popular due to its flexibility and manageability. Recent IEEE journal articles have been written on this subject, so we won't debate it here[5]. Since hardware and software in the Linux and research world evolve at a frantic rate, a fast-evolving OS is a necessity. Gentoo offers this technological bleeding edge and also makes it easy to integrate new packages into the system by means of Portage's ebuilds.
In our proof of concept, we will be using the following material to build our mini-cluster:
We will use the most basic and common network topology for building this cluster. All Slave Nodes connect to one switch, which in turn connects to the Master Node through a 100BaseT Ethernet network. The Master Node has two Ethernet devices to ensure that the Slave Nodes are on an isolated network. In theory, the nodes should not be accessed directly by the users, and jobs are to be launched through the Master Node.
The steps for creating a Gentoo-based Single System Image are documented in the Gentoo Diskless Client section of this wiki. Although a little general, it contains the key elements for creating an SSI that will be usable with the Gentoo Headnode Configuration document. However, there are some additional applications that we install on the nodes. Here is a short listing of some of the Gentoo packages:
sys-cluster/openmpi
sys-cluster/torque
sys-cluster/ganglia
Ganglia is used for monitoring the entire cluster. Its installation is detailed in the Gentoo Headnode Configuration document. Note that the openmpi and torque ebuilds come from the www.gentooscience.org overlay.
This section is detailed in the Gentoo Headnode Configuration document. Please refer to it for details on how to configure the Master Node.
We use SSH rather than RSH to launch commands on each node. RSH is often argued for because it avoids the overhead of SSH. However, SSH is more portable than RSH when it comes to carrying the environment over to the other nodes. Since SSH is only used for launching commands and not for the actual communications, it adds no overhead to the actual computation.
Since your home directory is NFS-mounted on all nodes, you only need to create one key in your home directory and it will automatically be present on every node. Here is the sequence to perform:
cd ~/.ssh/
ssh-keygen -t dsa -b 1024 -f id_dsa
The ssh-keygen command will prompt for a passphrase. Leave it empty, since we don't want to be asked for one when logging onto the nodes. We then add the newly generated key to authorized_keys:
cat id_dsa.pub >> authorized_keys
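Key-based login can silently fall back to a password prompt if the key files are too permissive: sshd (with its default StrictModes setting) ignores an authorized_keys file that is writable by others, and the ssh client refuses a private key that others can read. The following is a minimal sketch, not part of the original procedure; the function name fix_ssh_perms is hypothetical, and it assumes the file names used above (id_dsa, id_dsa.pub, authorized_keys).

```shell
# Hypothetical helper: tighten permissions on an SSH directory.
# $1 is the directory to fix; it defaults to ~/.ssh.
fix_ssh_perms() {
    ssh_dir=${1:-$HOME/.ssh}
    # Only the owner may enter or list the directory.
    chmod 700 "$ssh_dir"
    # Private key and authorized_keys: owner read/write only.
    chmod 600 "$ssh_dir/id_dsa" "$ssh_dir/authorized_keys"
    # The public key may be world-readable.
    chmod 644 "$ssh_dir/id_dsa.pub"
}
```

Calling fix_ssh_perms with no argument after the cat command above applies it to ~/.ssh.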
Now we must log onto all the nodes so that each node's host key is added to our SSH known_hosts file. To make the process simpler, we can loop over the nodes as such:
for Num in $(seq 1 24); do ssh thinkbig${Num} hostname; done
This will log you onto each node and print its hostname (we use hostname so that ssh only launches a simple command and doesn't actually open a session on the node). Here is an example output. Note that some of the nodes aren't available (ssh: thinkbig20: Name or service not known) and some of them were already registered (they simply return their hostname):
eric@headless ~ $ for Num in $(seq 1 24); do ssh thinkbig${Num} hostname; done
thinkbig1
The authenticity of host 'thinkbig2 (10.0.1.12)' can't be established.
RSA key fingerprint is 22:c1:2a:28:44:f2:1d:a6:7e:57:72:16:ee:d5:28:4c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'thinkbig2,10.0.1.12' (RSA) to the list of known hosts.
thinkbig2
The authenticity of host 'thinkbig3 (10.0.1.13)' can't be established.
RSA key fingerprint is 22:c1:2a:28:44:f2:1d:a6:7e:57:72:16:ee:d5:28:4c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'thinkbig3,10.0.1.13' (RSA) to the list of known hosts.
thinkbig3
ssh: thinkbig4: Name or service not known
The authenticity of host 'thinkbig5 (10.0.1.15)' can't be established.
RSA key fingerprint is 22:c1:2a:28:44:f2:1d:a6:7e:57:72:16:ee:d5:28:4c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'thinkbig5,10.0.1.15' (RSA) to the list of known hosts.
thinkbig5
ssh: thinkbig6: Name or service not known
thinkbig7
ssh: thinkbig8: Name or service not known
thinkbig9
thinkbig10
ssh: thinkbig11: Name or service not known
thinkbig12
thinkbig13
ssh: thinkbig14: Name or service not known
thinkbig15
thinkbig16
thinkbig17
thinkbig18
thinkbig19
ssh: thinkbig20: Name or service not known
thinkbig21
ssh: thinkbig22: Name or service not known
thinkbig23
thinkbig24
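Answering "yes" for every node gets tedious on larger clusters. As an alternative sketch, the host keys can be collected non-interactively with ssh-keyscan, which ships with OpenSSH. Note the trade-off: this accepts whatever key each node presents without showing a fingerprint, which we assume is acceptable only because the nodes sit on the isolated cluster network. The function names node_names and scan_node_keys are hypothetical; the thinkbig prefix is the one used above.

```shell
# Print the node names thinkbig<first> .. thinkbig<last>, one per line.
node_names() {
    for num in $(seq "$1" "$2"); do
        echo "thinkbig${num}"
    done
}

# Append the host keys of all reachable nodes to a known_hosts file.
# $1 = known_hosts file, $2 = first node number, $3 = last node number
scan_node_keys() {
    for host in $(node_names "$2" "$3"); do
        # -T 5: give up on a node after 5 seconds; unreachable or
        # unresolvable nodes are skipped silently.
        ssh-keyscan -T 5 "$host" 2>/dev/null >> "$1" || true
    done
}
```

For the 24 nodes above this would be invoked as scan_node_keys ~/.ssh/known_hosts 1 24.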
The loop described above can also be used to generate a list of available nodes as such:
for Num in $(seq 1 24); do ssh thinkbig${Num} hostname 2> /dev/null | grep -e '^think' >> hostfile ; done
Only run this after having added the hosts to your known_hosts as performed by the loop above. The file named hostfile now contains:
eric@headless ~ $ cat hostfile
thinkbig1
thinkbig2
thinkbig3
thinkbig5
thinkbig7
thinkbig9
thinkbig10
thinkbig12
thinkbig13
thinkbig15
thinkbig16
thinkbig17
thinkbig18
thinkbig19
thinkbig21
thinkbig23
thinkbig24
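Because the loop above appends with >>, running it a second time duplicates every entry in hostfile, and an unreachable node can stall it for a long time. Here is a hedged variant that starts from an empty file each time and uses the standard OpenSSH options BatchMode (never prompt) and ConnectTimeout. The function names probe_node and build_hostfile are hypothetical, and the 5 second timeout is an arbitrary choice.

```shell
# Ask one node for its hostname; fails quietly if the node is
# unreachable or would require an interactive prompt.
probe_node() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" hostname 2>/dev/null
}

# Rebuild the list of reachable nodes from scratch.
# $1 = output file, $2 = first node number, $3 = last node number
build_hostfile() {
    out=$1
    : > "$out"                        # truncate so reruns never duplicate
    for num in $(seq "$2" "$3"); do
        name=$(probe_node "thinkbig${num}") || continue
        case $name in
            think*) echo "$name" >> "$out" ;;   # same filter as grep '^think'
        esac
    done
}
```

Running build_hostfile hostfile 1 24 should produce the same kind of list as shown above, and can be repeated safely whenever nodes come and go.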
kyron@headless ~ $ export LD_LIBRARY_PATH=/usr/lib/openmpi/1.0.2-gcc-4.1/lib64; /usr/lib/openmpi/1.0.2-gcc-4.1/bin/mpirun -np 2 hello
Hello, world. I am 1 of 2
Hello, world. I am 0 of 2
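The run above starts both processes on the head node. To spread a job over the nodes listed in hostfile, Open MPI's mpirun accepts a --hostfile option. The following is only a sketch: the function names count_nodes and run_on_cluster are hypothetical, and it assumes that mpirun is in PATH, that LD_LIBRARY_PATH is set as above, and that the program is reachable at the same path on every node (which the NFS-mounted home directory should provide).

```shell
# Number of non-empty lines in the hostfile, i.e. one per node.
count_nodes() {
    grep -c . "$1"
}

# Launch one process per node listed in the hostfile.
# $1 = hostfile, remaining arguments = program and its arguments
run_on_cluster() {
    hf=$1
    shift
    mpirun -np "$(count_nodes "$hf")" --hostfile "$hf" "$@"
}
```

With the hostfile generated earlier, run_on_cluster hostfile hello would request 17 processes, one per listed node.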
Execution of OpenMPI on the 32 bit nodes including the 64 bit head node... This is a heterogeneous issue.
Here, in no particular order: