Mace01 is an HPC cluster owned by the School of MACE; the system runs Scientific Linux v4 (a clone of Red Hat Enterprise Linux). The primary role of this system is to run batch jobs in the traditional way; a limited number of jobs can also be run interactively, i.e., using an application's GUI. Computational jobs can also be run on the system via Condor, which is suitable for large numbers of small, short jobs.
If further help with Mace01 is required, users should email email@example.com, stating in the Subject: line that the issue relates to Mace01, giving their username and, of course, describing the problem itself.
What is the system suitable for?
- Serial jobs (including Matlab and Mathematica jobs).
- Small parallel jobs: MPI-based jobs up to perhaps 8 or 16 processes; scalability depends on the application being run.
- Parameter sweeps (using Condor).
- Long-running jobs (e.g., up to a month, or longer by arrangement).
What is the system not suitable for?
The system is not suitable for large-scale parallel computations.
Currently available applications include:
- GNU compilers, including gfortran (F90/F95 compiler)
- Intel Fortran
- Absoft Fortran
The system comprises:
- about 100 compute nodes, each with two 2.8GHz Xeon CPUs and 4GB RAM;
- a login node with four 3GHz dual-core Xeon CPUs and 4GB RAM;
- about 2TB of attached diskspace which is NFS-mounted on all compute nodes.
Getting Access to Mace01
University of Manchester staff and postgraduate students may apply for access to Mace01 by emailing firstname.lastname@example.org.
Logging into Mace01; File Transfer; X-Windows
There are four things to consider when using Mace01:
- Making contact with the machine — VPN/Firewall issues.
- Getting an SSH client so that you can login.
- Transferring files to and from the system by using SCP or SFTP.
- Enabling the use of GUI-based applications which run on Mace01 by using an X-Windows server on your desktop/laptop (e.g., if you are a MS Windows user, Xming or eXceed).
Each is outlined below. For those new to SSH/SCP/SFTP and X-Windows (aka X11), more details may be found in Introductory Notes for New Users of RCS HPC Systems.
Not all computers can connect to Mace01:
- If you have the University VPN installed and running on your desktop/laptop, you will be able to connect.
- If you have a machine with a School of MACE IP address, you will be able to connect.
- Otherwise, by default, you will not be able to connect to the system, so contact the system administrator to ensure that your computer is given access.
Login Using SSH
Once network issues have been sorted out, as described above, you should be able to login using SSH (remember to start the VPN if required).
Linux users will be able to login using OpenSSH, which comes with all popular distros, by typing
ssh -l <username> mace01.mace.manchester.ac.uk # ...replace <username> with your username, cf. mpciish2...
at the command-line.
MS Windows users should download and install PuTTY, an SSH client.
It is likely that you will wish to upload files to Mace01, or download them from Mace01 to your desktop/laptop. Linux users can do this by using either scp or sftp, from the OpenSSH utilities suite (which comes with all popular distros), for example:
scp myfile.txt <username>@mace01.mace.manchester.ac.uk: # ...to copy a file to Mace01 --- don't forget the ":" at the end...
scp <username>@mace01.mace.manchester.ac.uk:results.out results.copy # ...to copy a file from Mace01...
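Alternatively, sftp provides an interactive transfer session; for example (a sketch, with illustrative file names):

```shell
# Open an interactive transfer session (replace <username> as before):
sftp <username>@mace01.mace.manchester.ac.uk
# ...then, at the sftp> prompt:
#   put myfile.txt      # upload a file to your Mace01 home directory
#   get results.out     # download a file to the current local directory
#   quit
```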
MS Windows users should download and install WinSCP, a GUI-based file-transfer client.
Using X-Windows: GUI-Based Applications
Using SSH on its own will enable you to login to Mace01 and use the command-line. If you want to use GUI-based applications, such as gedit, a Notepad-like editor, or you want to use Matlab interactively (i.e., use the GUI), then you need to run an X11 server on your local desktop/laptop machine and enable X11-tunnelling in your SSH connection.
All popular Linux distros run an X11-based desktop (e.g., GNOME, KDE) so the only remaining step is to enable X11-tunnelling when logging in:
ssh -X -l <username> mace01.mace.manchester.ac.uk # ...that's an UPPERcase X...
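Once logged in with -X, a quick check that the tunnel is working can be made as follows (the xclock client is just an example; any small X11 application will do):

```shell
# If X11 tunnelling is active, DISPLAY will be set on the Mace01 end
# of the connection to something like localhost:10.0:
echo $DISPLAY

# A small X client should then open a window on your local desktop:
xclock &
```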
MS Windows users will need to download and install an X11 server. The two obvious options are eXceed, for which the University has a site licence, and Xming, which is free to download and install. When connecting:
- Start the VPN, if necessary.
- Start eXceed or Xming, then
- start PuTTY, being careful to enable X11 tunnelling — click on "SSH" on the left-hand-side, then ensure the X11 tunnelling box is "checked", before starting a connection to Mace01.
Where Everything Is: Filesystems and Backups
Currently (December 2009) there are no backups of the system. It is hoped that backups of home-directories will be provided by IT Services in the near future. In the meantime, home-directories are mirrored to another system approximately once a week.
Applications, Compilers and Libraries
All applications are installed under /software
The commercial compilers (Intel and Absoft) are installed in their default location, under /opt.
Libraries built locally, including MPI libraries, are installed under /usr/local.
Users' home-directories are in /home-fs1-b1, /home-fs1-g1, /home-fs2-b1 and /home-fs2-g1. (Several further disks exist on the fileservers, but these will not be used to provide more home-directory space as the available capacity in IT Services' backup system will be limited even when the new hardware is in production.)
A total of about 2.5 TB of scratch space is available within /scratch-11 ... /scratch-62. This is available to all users.
Running Computational Jobs on Mace01
All CPU-intensive jobs (e.g., those lasting more than a few minutes) should be run on the compute nodes, not the login node. This is done by submitting jobs to the batch/queue system, SGE. Any jobs found running on the login node (lasting more than a few minutes) will be killed without warning.
Submitting jobs to the batch/queue system on Mace01 is straightforward. Those not familiar with batch systems, and in particular with SGE, should read the material available via at least the first of these links:
Information specific to Mace01 is given below.
Setting Up Your Environment
To make use of SGE on Mace01 you will need to have a few environment variables set, such as your PATH; unless you have changed your environment, these will be set for you. If you find that you are unable to access SGE commands, for whatever reason, entering
should fix the problem.
Available Queues and Parallel Environments
Several queues are available on Mace01:
For parallel work, for example Fluent or Star-CD jobs, and for codes linked to the OpenMPI libraries, one of parallel-R2.q, parallel-R4.q and parallel-R5.q should be used. (The separation of available slots for parallel work into three distinct queues reflects the separation of compute nodes into different racks and different network switches.)
For serial work there are two queues, serial.q and serial-nonmace.q: users from the School of MACE should use the former queue; other users must use the latter (which is of a lower priority).
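By way of illustration, a minimal serial job script might look like the following (a sketch only; the script, job and program names are invented, and users from the School of MACE would substitute serial.q):

```shell
#!/bin/bash
# myjob.sge -- a minimal SGE serial job script (illustrative names).
#$ -cwd                   # run the job from the submission directory
#$ -q serial-nonmace.q    # MACE users: use serial.q instead
#$ -N myjob               # job name, as shown by qstat

./my_program > my_program.log   # replace with your own executable
```

The script would then be submitted with qsub myjob.sge and monitored with qstat.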
Experimental Interactive Queues for GUI-Based Work
Occasionally, particularly for users new to the system, GUI-based interactive work may be desirable (for example with Fluent or Matlab). This can be done within the batch system — see examples below.
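Interactive sessions are typically obtained in SGE with qsh (which starts an xterm on a compute node) or qrsh; for example (the interactive queue name shown is a placeholder, not confirmed for Mace01; see the example pages below for the details):

```shell
# Request an interactive X11 session on a compute node (queue name
# is a placeholder -- check the Mace01-specific pages for the real one):
qsh -q <interactive-queue>

# ...or a plain command-line session on a compute node:
qrsh -q <interactive-queue>
```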
Example Batch Job Details
High Level Programming Languages:
Computational Fluid Dynamics:
Submitting Fluent jobs to the batch system
Submitting Star-CD jobs to the batch system
Submitting Star-CCM jobs to the batch system
Example Interactive Job Details
Running Fluent interactively under SGE
Using the Fortran Compilers on Mace01
The MACE Condor Pool
Mace01 uses SGE as its primary job control system; users wishing to submit Fluent and MPICH jobs, for example, should do so in the traditional way, using this system. However, Condor is also available: this is used to dynamically backfill space between SGE jobs and also to scavenge CPU cycles from desktop machines in the George Begg teaching cluster, which are also members of the pool. Users who wish to submit a large number of relatively small, short jobs (e.g., from a parameter sweep) should consider using Condor rather than SGE.
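A parameter sweep under Condor is driven by a submit description file; a minimal sketch follows (the program name, file names and the 100-job sweep are illustrative):

```shell
# Create a minimal Condor submit description file (names illustrative).
# Each of the 100 jobs receives a different $(Process) value, 0..99,
# which the program can use to select its parameters.
cat > sweep.sub <<'EOF'
executable = my_program
arguments  = $(Process)
output     = out.$(Process)
error      = err.$(Process)
log        = sweep.log
queue 100
EOF

# Submit the sweep and monitor the queue:
condor_submit sweep.sub
condor_q
```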
Open MPI is installed on the system; Open MPI is an implementation of the Message Passing Interface, version 2. Details, including information about compilation and the submission of jobs to the batch system (SGE), can be found on the dedicated page.
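For orientation, compiling and submitting an Open MPI job typically looks like this (a sketch only; the source file, slot count and parallel environment name are illustrative, and the correct PE name for Mace01 should be taken from the dedicated page):

```shell
# Compile an MPI code with the Open MPI wrapper compilers:
mpicc  -O2 -o my_mpi_prog my_mpi_prog.c     # C
mpif90 -O2 -o my_mpi_prog my_mpi_prog.f90   # Fortran

# A minimal parallel job script (the PE name "orte" is a common
# Open MPI default, not confirmed for Mace01 -- see the dedicated page):
cat > mpijob.sge <<'EOF'
#!/bin/bash
#$ -cwd
#$ -q parallel-R2.q
#$ -pe orte 8
mpirun -np $NSLOTS ./my_mpi_prog
EOF
qsub mpijob.sge
```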
N.B. Open MPI should not be confused with OpenMP, which is a shared-memory API, not an implementation of MPI.
RCS administers other computational facilities and can help users gain access to more. Please visit the main page for details.