Introductory Notes for New Users of HPC Systems
This page is intended for users new to HPC/HTC systems. It describes:
- the steps required to access HPC systems, in particular those required from an MS Windows desktop/laptop;
- the nature of these HPC systems;
- sharing resources with other users and running computational jobs on them;
- some experimental services which are new to these HPC systems.
CSF Training Materials on getting started on the CSF
All IT Services for Research HPC systems are used remotely via SSH. Users authenticate (i.e., login) using an SSH client; after successful authentication a command-line interface is presented. This can be used to submit computational jobs to the batch system queues.
The remainder of this section may be considered the short version of this document — for those familiar with remotely accessing Linux-based HPC systems and submitting jobs to batch systems. For those that are not, please read the remaining sections!
Getting an Account and Authentication
- Email firstname.lastname@example.org, briefly describing your computational requirements.
- All systems are accessed via SSH, SCP and/or SFTP.
- All systems are firewalled. Some systems are accessible from all University of Manchester IP addresses; others are not. Few are accessible from outside of the University of Manchester.
Using GUI-Based Applications
- SSH, on its own, gives a command-line interface only. Should the use of GUI-based applications be required, for example the Notepad/Wordpad-like editor Gedit, or the Matlab graphical shell, then X-Windows (X11) may be tunnelled through the SSH connection (and an X-server will be required on the local desktop/laptop).
The Nature of the Systems
All RCS-administered HPC systems are Linux clusters of many computers, usually called nodes. In most cases these clusters exist on a completely private network; users directly access only one or two login/head nodes.
Running a Computational Job in the Batch System
- Many people are likely to be using each cluster simultaneously;
- All computational jobs must be run on compute nodes, not on the login/head node(s). Computational work is submitted to these compute nodes via the batch system queues.
Running Interactive Computational Jobs
- The vast majority of computational work carried out on RCS HPC systems is done in batch mode, i.e., non-interactively. On rare occasion it is necessary to run jobs interactively. Experimental queues exist on two RCS systems, Man2 and Mace01, which facilitate this.
- Running GUI-based, interactive computations presents a problem: if the local desktop or laptop on which the GUI is displayed is switched off, or loses network connectivity, the computation will be killed even though it is running on the remote HPC system. Using a virtual desktop to display the GUI eliminates this problem.
Troubleshooting and FAQ
Getting an Account; Authentication
Getting an Account
To get an account on any RCS-administered HPC system, email email@example.com, briefly describing the computational work that you wish to carry out, for example:
- applications, compilers or libraries needed;
- a rough estimate of the disk space required;
- the nature of the computational jobs you hope to run, for example:
- Do you wish to run a few long-running jobs, or a lot of short jobs?
- Does your work require a particularly large amount of memory?
- Is your code serial or parallel? (i.e., can it use more than one CPU at once?)
Getting Your Username and Password
For each HPC system run by IT Services for Research, you will have a username and password to enable you to authenticate (login) and run computational jobs. These credentials are independent of your central IT Services username and password, though, simply for ease of administration, the username will usually be the same.
Once you have an account on an HPC system, the system-administrator will contact you to give you your username and password.
For security reasons, as soon as you have received your credentials for a system, you should log in and change your password (using the passwd or yppasswd commands).
Connecting to Linux-Based HPC Systems
Secure Shell (SSH)
Secure Shell (SSH) is a network protocol which is used to connect to remote computers, i.e., to authenticate (login) and interact with the remote system.
Macintosh OS-X systems and all popular Linux distributions include an SSH client called OpenSSH; MS Windows users must download and install one. The most popular is PuTTY which can be freely downloaded and installed.
Using OpenSSH on Linux and OS-X
At a command line, on Linux or OS-X, simply type
ssh -l <username> man2.nw-grid.ac.uk
The first time you connect to a particular system you will be prompted to confirm its authenticity, for example
The authenticity of host 'man2.nw-grid.ac.uk (188.8.131.52)' can't be established.
RSA key fingerprint is cf:48:69:ff:99:f0:a1:4a:80:0b:46:b5:40:c0:fc:4c.
Are you sure you want to continue connecting (yes/no)?
Unless you have any reason for doubt, enter yes and you will then be prompted for your password — enter that given to you by the system's administrator (not your central IT Services password).
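For frequent logins it can be convenient to record the system name and username in an OpenSSH client configuration entry. The following is a minimal sketch rather than a site-recommended setup: the alias man2 and the username mabc123 are hypothetical placeholders, while Host, HostName and User are standard OpenSSH client options. The entry is written to a scratch file so that an existing ~/.ssh/config is not touched.

```shell
#!/bin/sh
# Sketch: write an OpenSSH client config entry to a scratch file.
# The alias "man2" and the username "mabc123" are hypothetical placeholders.
config_file="$(mktemp)"

cat > "$config_file" <<'EOF'
Host man2
    HostName man2.nw-grid.ac.uk
    User mabc123
EOF

# Show the entry so it can be checked before use.
cat "$config_file"

# To try the entry without editing ~/.ssh/config (-F selects a config file):
#   ssh -F "$config_file" man2
```

Once the entry has been checked and copied into ~/.ssh/config, typing ssh man2 should be enough to reach the password prompt.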
From an MS Windows desktop/laptop, to authenticate (log in) to a remote Linux system, install and start PuTTY and enter the name of the system to which you wish to connect in the Host Name box (in the above case man2.nw-grid.ac.uk); then click Open.
The first time you connect to any given system you will see a PuTTY security alert.
It should be safe to click Yes. (If this alert appears again, for a particular system, it may be a good idea to email the system administrator.)
The next step is authentication. Enter your username at the prompt — this will usually be your central IT Services username:
Then, when prompted, enter the password given to you by RCS.
GUI-Based Applications and X-Windows
Using PuTTY alone allows you to login and enter commands, for example, submit computational jobs to the batch system. But what if you want to start a GUI-based editor, such as gedit, or start the Matlab GUI? Then you will need to be running an X11 Server on your local desktop/laptop and also to connect using PuTTY with X11 tunnelling enabled.
Macintosh OS-X systems and all popular Linux distributions include an X11 server — that on Linux is always running (assuming you are running a GUI-based desktop such as GNOME or KDE). The only remaining step is to enable X11-tunnelling when logging in:
ssh -X -l <username> csf.itservices.manchester.ac.uk    # ...that's an UPPERcase X...
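Once logged in with -X, one quick way to confirm that tunnelling is active is to look at the DISPLAY environment variable, which OpenSSH sets in the remote session when X11 forwarding has been established. The sketch below assumes nothing about the particular system; the helper name x11_status is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether an X11 display is available in this session.
# x11_status is a hypothetical helper name, not a system command.
x11_status() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "X11 display available: ${DISPLAY}"
    else
        echo "No X11 display: reconnect with ssh -X"
    fi
}

x11_status
```

If no display is reported, GUI-based applications such as gedit will not be able to open a window from that session.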
MS Windows users must download and install an X11 server. The most popular are Hummingbird eXceed and Xming: the University has a site licence for eXceed, and Xming may be freely downloaded and installed.
PuTTY: Enable X11 Forwarding
Ensure the Enable X11 Forwarding box is checked.
Once you have an X11 server installed, then in order:
- Start the X11 server (eXceed or Xming).
- Start PuTTY and ensure the Enable X11 forwarding box is checked (see figure).
- Log in to the remote Linux system as normal. You should then be able to start GUI-based applications such as gedit and Matlab on the remote system and have them displayed on your local desktop/laptop.
It is likely that you will wish to upload files to the HPC system, or download them to your desktop/laptop. Linux users can do this using the OpenSSH utilities suite (which comes with all popular distros) or SSHFS. MS Windows users must download a suitable client; WinSCP, which is freely downloadable, is a popular choice.
Using SCP and SFTP
At a command line, on Linux or OS-X, to upload a file from your desktop/laptop, simply type
scp <local.filename> <username>@<remote.system.name>:<remote.filename>
scp my_prog.f90 firstname.lastname@example.org:my_programme.f90
To download a file to your desktop/laptop, enter, for example,
scp email@example.com:my_results.dat my_remote_results.dat
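For repeated transfers, the upload and download commands above can be wrapped in two small shell functions. This is only a sketch: the function names hpc_upload and hpc_download, and the values of HPC_USER and HPC_HOST, are hypothetical placeholders to be replaced with your own details. The -r option is a standard scp flag which copies directories recursively and is harmless for single files.

```shell
#!/bin/sh
# Sketch: small wrappers around scp. HPC_USER and HPC_HOST are hypothetical
# placeholders; replace them with your own username and system name.
HPC_USER="mabc123"
HPC_HOST="man2.nw-grid.ac.uk"

# Upload a local file or directory to a path on the remote system.
hpc_upload() {
    scp -r "$1" "${HPC_USER}@${HPC_HOST}:$2"
}

# Download a remote file or directory to a local path.
hpc_download() {
    scp -r "${HPC_USER}@${HPC_HOST}:$1" "$2"
}

# Example calls (commented out because they need a real account):
#   hpc_upload   my_prog.f90    my_programme.f90
#   hpc_download my_results.dat my_remote_results.dat
```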
Linux users can use SSHFS to mount, on their desktop PC, any filesystem which is accessible via SSH. It is a userspace filesystem based on FUSE. Read/write access is the same as after an SSH login. You need to identify and install the relevant package for your Linux distribution using your preferred install tool/method. For Debian or Ubuntu users the required package is usually called sshfs, and for Fedora users fuse-sshfs. To mount the remote filesystem:
desktop> mkdir ~/remote.system.name
desktop> sshfs firstname.lastname@example.org: ~/remote.system.name
To access files:
desktop> ls ~/remote.system.name
benchmarking  Qacct.pm~  test_openmpi_gcc44_gfortran44
bin           Qstat.pm   test_openmpi_gcc_gfortran
CLUSTER       Qstat.pm~  test_openmpi_gcc_gfortran_mx
To unmount the filesystem:
desktop> fusermount -u ~/remote.system.name
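The mount step can be made safe to repeat by checking first whether the directory is already a mount point. The sketch below assumes a Linux desktop with the mountpoint utility (part of util-linux) and sshfs installed; the function name hpc_mount and its arguments are hypothetical.

```shell
#!/bin/sh
# Sketch: mount a remote filesystem with SSHFS unless it is already mounted.
# hpc_mount is a hypothetical helper; the arguments are placeholders.
hpc_mount() {
    remote="$1"        # e.g. <username>@<remote.system.name>:
    mount_dir="$2"     # e.g. ~/remote.system.name

    mkdir -p "$mount_dir"
    if mountpoint -q "$mount_dir"; then
        echo "already mounted: $mount_dir"
    else
        sshfs "$remote" "$mount_dir" && echo "mounted: $mount_dir"
    fi
}

# Example call (commented out because it needs a real account):
#   hpc_mount "<username>@<remote.system.name>:" "$HOME/remote.system.name"
```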
To download or upload files, start WinSCP and enter the name of the system you wish to transfer files to or from in the Host name box, together with your username and password. Then click Login.
The first time you login to any given system you will see a warning message.
It should be safe to click Yes. (If this alert appears a second time, for any given system, it may be a good idea to email the system administrator.)
Once logged in, a nice drag-n-drop interface is presented.
Network and Firewall Issues
All RCS-administered HPC systems are firewalled; the firewall policies vary and depend on the purpose of the system. The system-specific documentation should give details. If a system is not accessible from all University of Manchester IP addresses, users will be required to register addresses from which they plan to connect. Access may be possible using the University VPN — from both on and off campus.
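When a connection attempt hangs or is refused, it can help to check whether the SSH port is reachable at all from the current network before contacting the system administrator. The following sketch assumes that the system listens on the standard SSH port 22 and that a netcat (nc) supporting the -z and -w options is installed; the helper name check_ssh_port is hypothetical.

```shell
#!/bin/sh
# Sketch: test whether the SSH port (22) of a system is reachable from here.
# Assumes a netcat (nc) that supports -z (connect only, send no data)
# and -w (timeout in seconds). check_ssh_port is a hypothetical helper name.
check_ssh_port() {
    if nc -z -w 5 "$1" 22 >/dev/null 2>&1; then
        echo "$1: port 22 reachable"
    else
        echo "$1: port 22 not reachable (firewall, VPN or network problem?)"
    fi
}

# Example call (commented out because it needs network access):
#   check_ssh_port man2.nw-grid.ac.uk
```

A "not reachable" result from off campus and a "reachable" result over the University VPN would suggest that the firewall policy, rather than the account, is the cause.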
The Nature of HPC Systems
Each HPC System is a Cluster
Each HPC system is a cluster of nodes on a private network. Only the login/master node is accessible on the public network, and only this node is accessed by users. All the other nodes are compute nodes (which are directly accessed only by the system administrator).
Each HPC System is Used by Many People
Many people use each HPC cluster; the computational resources are shared between them.
Batch Systems and Queues
Users submit computational work from the login/master node to the compute nodes via a batch system.
Running Computational Jobs
The HPC systems are a shared computational resource. To ensure everyone gets a fair share and to allow the system to function correctly:
- All computationally-intensive work must be submitted to the batch system's queues; computationally-intensive processes run on the login node will be killed without warning. (Low-intensity work, such as editing, is of course perfectly acceptable on the login node.)
- A consequence of this is that most computational work must be carried out in batch mode, i.e., non-interactively. For example, if using Matlab, the required computation must be done by asking Matlab to run a Matlab programme, rather than by starting the (interactive) application interface (e.g., GUI).
- Most RCS-run HPC systems use SGE (Sun Grid Engine) as the batch system:
- Some RCS-run HPC systems do not make use of SGE, notably Horace. Details of how to submit jobs to their respective batch systems should be found in the dedicated documentation.
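On the SGE-based systems, a batch job is usually described by a short job script which is then handed to the qsub command. The following is a minimal sketch rather than a template for any particular RCS system: the file name, job name and Matlab program are hypothetical, and queue names or resource requests specific to a system are deliberately left out. The -S, -cwd and -N options are standard SGE qsub options.

```shell
#!/bin/sh
# Sketch: create a minimal SGE job script and check its shell syntax.
# The file name, job name and Matlab program are hypothetical examples.
job_file="${TMPDIR:-/tmp}/my_job.sh"

cat > "$job_file" <<'EOF'
#!/bin/bash
# Run the job under bash.
#$ -S /bin/bash
# Start the job in the directory it was submitted from.
#$ -cwd
# Job name shown by qstat.
#$ -N my_matlab_job

# Run Matlab non-interactively on a program file.
matlab < my_prog.m
EOF

# Parse the script without running it (-n), to catch syntax errors.
sh -n "$job_file" && echo "syntax OK: $job_file"

# On the login node, submit and monitor the job with the SGE commands:
#   qsub "$job_file"
#   qstat
```

The script only needs to be written once; after that, each run of the computation is a single qsub command, and the work itself takes place on a compute node rather than the login node.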
Experimental Interactive/GUI Queues
Traditionally all computational jobs run on HPC clusters are batch jobs, i.e., once started, there is no interaction with the computation; no GUI is required or used. For example, Matlab code is run at the command-line interface (e.g., matlab < my_prog.m) rather than within the graphical shell.
However, in some cases use of an application GUI may be desirable or even necessary (e.g., with Matlab or Fluent). For this, interactive queues exist on man2.nw-grid.ac.uk and mace01.mace.manchester.ac.uk which enable users to queue interactive, GUI-based sessions.
These interactive queues are experimental. Please contact the system administrator of Man2 and/or Mace01 before using them.
Experimental Virtual Desktop Services
It may be that queued, GUI-based sessions take some hours, and that during that time a user wishes to change location (e.g., move from office to home) and/or the computer being used (e.g., from office desktop to home laptop).
Shutting down (or suspending) a desktop or laptop on which a remotely-running application GUI is displayed will usually force the application to exit (when the connection timeout is exceeded), killing the job half-way through; and of course a user can no longer interact with an application displayed on a desktop from a different location!
VNC solves these problems via its virtual desktop. Applications run on the virtual desktop whether or not this virtual desktop is itself currently being displayed. This means that a user can start an application on a virtual desktop, then disconnect and reconnect as required, while the application continues to run untroubled.
An experimental virtual desktop (VNC) service is being trialled on mace01.mace.manchester.ac.uk.
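The details of the trial service are not given here, so the following is only a sketch of one common way of reaching a VNC desktop through a firewall: forwarding its port over SSH. By convention, VNC display :N listens on TCP port 5900 + N. The display number, the username mabc123 and the helper name vnc_port are hypothetical, and whether the trial service is meant to be used through an SSH tunnel is an assumption to confirm with the system administrator.

```shell
#!/bin/sh
# Sketch: work out the TCP port for a VNC display and print an SSH tunnel
# command for it. By convention VNC display :N listens on port 5900 + N.
# vnc_port, the display number and the username are hypothetical.
vnc_port() {
    echo $((5900 + $1))
}

display=1
port="$(vnc_port "$display")"
user="mabc123"
host="mace01.mace.manchester.ac.uk"

# -L forwards a local port to a port on the remote side of the SSH connection.
echo "ssh -L ${port}:localhost:${port} ${user}@${host}"
```

With such a tunnel open, a VNC viewer on the local desktop/laptop would connect to localhost on the forwarded port; closing the viewer leaves the virtual desktop, and any application on it, running.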