OpenMPI on Mace01
Currently two implementations of OpenMPI are installed:
- v1.3 compiled with GCC and the Intel Fortran 77/90/95 compiler, Ifort.
- v1.2.7 compiled with the same compilers.
Users are strongly recommended to use v1.3, which is the only version documented here.
OpenMPI uses SSH to start MPI-related processes; you will need to ensure you have promptless, passwordless SSH access across the Mace01 cluster to run OpenMPI jobs. This is done using an SSH key and a known_hosts file. New users will have this set up for them; established users who require some help setting this up should email the system administrator.
N.B. OpenMPI can make use of the dedicated MPI network on Rack 4 nodes (and can indeed use multiple networks simultaneously to send messages between compute nodes). To take advantage of this, you will need to ensure your known_hosts file contains entries for both IP addresses of each of these nodes (i.e., both the 10.10.1.0/24 and 10.11.12.0/24 addresses).
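The setup described above might be sketched as follows. This is illustrative only: the key type, file locations and the node addresses shown (a hypothetical node 26) are assumptions, not a prescription.

```shell
# Generate a passphrase-less SSH key (press Enter at no prompts thanks to -N ""):
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Home directories are shared via NFS, so appending the public key to your own
# authorized_keys makes it valid on every node (an assumption about this cluster):
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Pre-populate known_hosts with BOTH addresses of each Rack 4 node,
# e.g. for a hypothetical node 26:
ssh-keyscan 10.10.1.26 10.11.12.26 >> ~/.ssh/known_hosts
```

Repeat the ssh-keyscan step (or loop over the node list) for every compute node you expect your jobs to land on.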
Compiling and Submitting a Job
Qsub scripts based on the example below can be used:
#!/bin/bash
#$ -pe orte.pe 8
#$ -q parallel-R4.q
#$ -cwd
#$ -S /bin/bash

export LD_LIBRARY_PATH=/opt/intel/fce/10.1.012/lib
/usr/local/openmpi-1.3--ifort-v10--gcc-v3/bin/mpirun -n $NSLOTS mynameis.ifort
where mynameis.ifort is the binary executable to be run. The environment variable $NSLOTS is set by SGE and takes its value from the number of processes specified in the parallel environment line in the script, in this case, 8.
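To build the binary in the first place, a minimal sketch, assuming the OpenMPI wrapper compilers live alongside the mpirun used in the script above (the source file name mynameis.f90 and script name myjob.sh are illustrative):

```shell
# Compile a Fortran MPI program with the mpif90 wrapper from the same
# OpenMPI v1.3 installation as the mpirun in the qsub script:
/usr/local/openmpi-1.3--ifort-v10--gcc-v3/bin/mpif90 -o mynameis.ifort mynameis.f90

# Submit the job script to SGE:
qsub myjob.sh
```

Using the wrapper compiler from the same installation as mpirun avoids mismatched library versions at run time.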
OpenMPI jobs can be submitted to any of the parallel queues on Mace01; currently (April 2009) these are:
parallel-R2.q parallel-R4.q parallel-R5.q
but parallel-R4.q is the recommended queue, as Rack 4 has a dedicated MPI network in addition to the general-purpose network. (Other racks have only one network.)
Tips and Tweaks
OpenMPI will automatically detect multiple networks and use them, if it can. (For details, see the OpenMPI FAQ.) There are two networks connecting compute nodes in Rack 4 in Mace01:
- eth0 is used for administration, NFS (i.e., data transfer) and also MPI;
- eth1 is dedicated to MPI traffic.
Excluding the NFS Interface
Should you wish to try excluding the NFS interface from your job run, add the following to the mpirun command in your qsub script:
--mca btl_tcp_if_exclude lo,eth0
    # ...excluding only the local interface, lo, is the default; when you supply
    # your own exclusion list you must always include lo in it...
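Put in context, the mpirun line of the example qsub script might then read as follows (same assumed installation path and binary name as in the example above):

```shell
# Run over eth1 only: exclude the loopback interface and the shared
# administration/NFS interface, eth0, from MPI's TCP transport.
/usr/local/openmpi-1.3--ifort-v10--gcc-v3/bin/mpirun \
    --mca btl_tcp_if_exclude lo,eth0 \
    -n $NSLOTS mynameis.ifort
```

Note this only makes sense on parallel-R4.q; on racks with a single network, excluding eth0 would leave MPI with no TCP interface at all.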
It may be useful to see what network interfaces OpenMPI is attempting to use. Adding the following to the mpirun command in your qsub script is sufficient:
--mca btl_base_verbose 30
For example, the command

mpirun --mca btl_base_verbose 30 -n 16 --machinefile hostfile openmpi-examples/ring_f90

produces output which includes:
[R4-05:20768] btl: tcp: attempting to connect() to address 10.10.1.26 on port 4782
[R4-06:13379] btl: tcp: attempting to connect() to address 10.10.1.33 on port 63151
[R4-05:20768] btl: tcp: attempting to connect() to address 10.11.12.26 on port 4782
[R4-06:13379] btl: tcp: attempting to connect() to address 10.11.12.33 on port 63151
indicating that both networks are used.