Computational Science Community Wiki

OpenMPI on Mace01

Currently two implementations of OpenMPI are installed.

Users are strongly recommended to use v1.3, which is the only version documented here.

Prerequisites

OpenMPI uses SSH to start MPI-related processes; you will need to ensure you have promptless, passwordless SSH access across the Mace01 cluster to run OpenMPI jobs. This is done using an SSH key and a known_hosts file. New users will have this set up for them; established users who require some help setting this up should email the system administrator.

N.B. OpenMPI can make use of the dedicated MPI network on Rack 4 nodes (and indeed use multiple networks simultaneously to send messages between compute nodes): you will need to ensure your known_hosts file contains entries for both IP addresses of these nodes to make use of this (i.e., both the 10.10.1.0/24 and 10.11.12.0/24 addresses).
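
If you are setting this up by hand, a minimal sketch might look like the following, assuming home directories are shared across the cluster and substituting the addresses of the nodes you actually use (those shown are examples only):

  # Generate a key pair with an empty passphrase, in the default location:
  ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

  # Authorise the key for logins across the cluster (shared home directory assumed):
  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  chmod 600 ~/.ssh/authorized_keys

  # Record host keys for BOTH addresses of each Rack 4 node so SSH never prompts
  # (the addresses below are examples):
  ssh-keyscan 10.10.1.26 10.11.12.26 >> ~/.ssh/known_hosts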

Compiling and Submitting a Job

Qsub scripts based on the example below can be used:

  #!/bin/bash

  # Request 8 slots in the OpenMPI (ORTE) parallel environment,
  # on the Rack 4 parallel queue:
  #$ -pe orte.pe 8
  #$ -q parallel-R4.q

  # Run the job from the current working directory, using bash:
  #$ -cwd
  #$ -S /bin/bash

  # Make the Intel Fortran runtime libraries available to the executable:
  export LD_LIBRARY_PATH=/opt/intel/fce/10.1.012/lib

  /usr/local/openmpi-1.3--ifort-v10--gcc-v3/bin/mpirun -n $NSLOTS mynameis.ifort

where mynameis.ifort is the binary executable to be run. The environment variable $NSLOTS is set by SGE and takes its value from the number of processes specified in the parallel environment line in the script, in this case, 8.
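
To build the executable, the compiler wrappers shipped with the same OpenMPI installation can be used; a minimal sketch, assuming a Fortran source file (the source file and script names below are illustrative):

  # Compile with the OpenMPI wrapper around the Intel Fortran compiler:
  /usr/local/openmpi-1.3--ifort-v10--gcc-v3/bin/mpif90 -o mynameis.ifort mynameis.f90

  # Submit the qsub script shown above (saved here as myjob.sh):
  qsub myjob.sh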

OpenMPI jobs can be submitted to any of the parallel queues on Mace01, currently (April 2009):

  parallel-R2.q
  parallel-R4.q
  parallel-R5.q

but parallel-R4.q is the recommended queue, as Rack 4 has a dedicated MPI network in addition to the general-purpose network. (Other racks have only one network.)
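
To check which parallel queues are available and how loaded they are before submitting, the standard SGE commands can be used; a brief sketch:

  # List all queues configured on the cluster:
  qconf -sql

  # Show a per-queue summary of used and available slots:
  qstat -g c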

Tips and Tweaks

OpenMPI will automatically detect multiple networks and use them if it can (for details, see the OpenMPI FAQ). There are two networks connecting the compute nodes in Rack 4 of Mace01: the general-purpose network and the dedicated MPI network (the 10.10.1.0/24 and 10.11.12.0/24 address ranges mentioned under Prerequisites above).

Excluding the NFS Interface

Should you wish to try excluding the NFS interface from your job run, add the following to the mpirun command in your qsub script:

  --mca btl_tcp_if_exclude lo,eth0
        # ...the loopback interface, lo, is excluded by default; if you supply your own
        #    exclusion list you must always include it...
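
For example, the mpirun line from the qsub script above would then become (same installation path and executable name as before):

  /usr/local/openmpi-1.3--ifort-v10--gcc-v3/bin/mpirun --mca btl_tcp_if_exclude lo,eth0 \
      -n $NSLOTS mynameis.ifort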

Debug Info

It may be useful to see what network interfaces OpenMPI is attempting to use. Adding the following to the mpirun command in your qsub script is sufficient:

  --mca btl_base_verbose 30

For example,

   mpirun --mca btl_base_verbose 30 -n 16 --machinefile hostfile openmpi-examples/ring_f90

produces output that includes lines such as:

  [R4-05:20768] btl: tcp: attempting to connect() to address 10.10.1.26 on port 4782
  [R4-06:13379] btl: tcp: attempting to connect() to address 10.10.1.33 on port 63151
  [R4-05:20768] btl: tcp: attempting to connect() to address 10.11.12.26 on port 4782
  [R4-06:13379] btl: tcp: attempting to connect() to address 10.11.12.33 on port 63151

indicating that both networks are used.