Computational Science Community Wiki

Notes regarding HECToR GPU testbed, for project ge199

Note that this page is world-readable but only writable by those with HECToR GPU Testbed access; any generally relevant information should be moved elsewhere, e.g. to GPU/GpuFaq.



Notes on using HECToR GPU testbed

Running Jobs

Your jobscript should select the correct queue for the hardware you require. Details of the hardware can be found on the HECToR GPU Testbed page; in brief, there are three nodes containing two Nvidia GPUs each and one node containing two AMD Firestream GPUs. Specifying the appropriate queue in your jobscript ensures that your application runs on the correct node. The queues are named as follows:

Within a node, the system allocates one of the GPUs to you. You should therefore not request a specific GPU using calls such as cudaSetDevice().

A typical jobscript would be something like:

You may wish to load specific modules in your jobscript if they are not present in the environment. For example, for the firestream@gpu4 queue, using the AMD GPUs to run OpenCL:

Note that for the AMD GPUs we have found that the DISPLAY variable must be unset when the application runs. This can be done either by turning off X11 forwarding when ssh-ing in or by unsetting the variable in the jobscript.
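As a minimal sketch of the second option, the jobscript fragment below unsets DISPLAY and checks the result before the application would start. The scheduler directives (queue selection and so on) are deliberately omitted, and the application name in the final comment is hypothetical, not taken from the testbed.

```shell
#!/bin/bash
# Sketch of a jobscript fragment for jobs using the AMD GPUs.
# Scheduler directives are omitted here; add the ones your queue requires.

# Remove DISPLAY from the job's environment, in case X11 forwarding set it.
unset DISPLAY

# Sanity check: stop early if DISPLAY is somehow still defined.
if [ -n "${DISPLAY+set}" ]; then
    echo "DISPLAY is still set; aborting" >&2
    exit 1
fi
echo "DISPLAY is unset"

# Launch the application here, for example (hypothetical name):
# ./my_opencl_app
```

For the first option, the standard OpenSSH client disables X11 forwarding when invoked with the -x flag, provided forwarding is not re-enabled elsewhere in your ssh configuration.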

If you wish to try OpenCL on the CPU, the AMD drivers on gpu4 support this. You must write your code to select a CL_DEVICE_TYPE_CPU device. The ability to write OpenCL code that runs on the CPU is useful when no GPUs are available, although it is unlikely you would choose the CPU over the GPU on the HECToR testbed system.

Fortran and CUDA

to move to FAQ

CUDA Notes

CUDA can only be run on the Nvidia GPUs, which support CUDA v3 or v4. Your jobscript should request the queue for the particular node required as follows:

OpenCL Notes

OpenCL can be run on any of the nodes, whether they contain the Nvidia GPUs or the ATI/AMD GPUs. The Nvidia installations run OpenCL v1.0 while the AMD GPUs run OpenCL v1.1. The AMD installation of OpenCL can also use the CPUs (in the node containing the AMD GPUs) as valid OpenCL devices. Your jobscript should request the queue for the particular node required as follows:

If using the firestream@gpu4 queue, you should ensure the DISPLAY environment variable is not set when the job runs. This can be done either by turning off X11 forwarding when ssh-ing in to the system or by adding unset DISPLAY to your jobscript before running your application.

Background Papers

SpMv