GPU Club: THURSDAY 17 May 2012
Software for GPUs inc. compiler/directives, maths libs & tools (debuggers and profilers)
11:00-12:30, Thurs 17 May
A7, Samuel Alexander
Over 20 people attended (+ several apologies). The following presentations were given and discussed:
After the presentations the following points were discussed:
On the presentations:
Health Policy Simulation
- How much of a computational challenge is there in the health model being simulated? It was reported that each member of the population was simulated independently. Hence this was very much High Throughput Computing. However, future developments will likely couple members of the population.
- How long did the OpenCL development take? It was reported that an initial port from F# took about 2 months, with the remaining time spent on optimization.
- The OpenCL port was used for the CPU timings too (some SSE additions).
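The independence of population members noted above is what makes this workload high throughput. A minimal Python sketch, not the IMPACT2 code: the `simulate_member` model, its `annual_risk` parameter and the seeding scheme are all hypothetical, chosen only to show that each member can be computed without reference to any other.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def simulate_member(member_id, years=10, annual_risk=0.02, seed=1234):
    """Hypothetical per-member model: first year in which an event occurs.

    Each member gets its own random stream, so the result does not depend
    on the order in which, or the worker on which, members are simulated.
    """
    rng = random.Random(seed * 1_000_003 + member_id)
    for year in range(1, years + 1):
        if rng.random() < annual_risk:
            return year      # event occurred in this year
    return None              # no event within the simulated horizon

def simulate_population(n_members, workers=4):
    # Members are independent, so a plain map over member ids suffices.
    # Threads are used here only to show the structure; on a GPU each id
    # would correspond to one work-item.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate_member, range(n_members)))

results = simulate_population(1000)
events = sum(r is not None for r in results)
print(f"{events} of {len(results)} members had an event")
```

Coupling members of the population, as anticipated for future versions, would break this one-member-per-task structure and require communication between tasks.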
- Was the largest simulation 128^3? It was reported that this was currently the largest model that fits on a single GPU, owing to the memory requirements of the 4th order Runge-Kutta method (which duplicates a lot of memory). However, planned developments will make this a multi-GPU application via MPI, so that larger models can be solved.
- Was the CPU time reported the serial time? It was reported that it was. While this may seem unfair, the scalability of the CPU code was known and would be reported. But the serial CPU code was numerically stable.
- For debugging, they had used a single thread to check the reduction results. This was said to be a useful trick, but one has to remove the serialisation from the production code.
- Dynamic parallelism and Hyper-Q, as per Kepler + CUDA 5, may enable improved multi-GPU performance.
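The debugging trick mentioned above, checking a parallel reduction against a single-threaded result, can be sketched as follows. This is a Python illustration of the idea only, not the speakers' CUDA code; the function names and tolerance are assumptions.

```python
import math
import random

def tree_reduce(values):
    """Pairwise (tree) sum, the access pattern of a typical GPU reduction."""
    data = list(values)
    n = len(data)
    stride = 1
    while stride < n:
        # Each "thread" i adds in the partial sum held stride elements away.
        for i in range(0, n - stride, 2 * stride):
            data[i] += data[i + stride]
        stride *= 2
    return data[0] if data else 0.0

def serial_reduce(values):
    """Single-threaded reference, intended for debugging only."""
    total = 0.0
    for v in values:
        total += v
    return total

def check_reduction(values, rel_tol=1e-9):
    # The two sums add in different orders, so compare with a tolerance
    # rather than expecting bit-identical floating-point results.
    return math.isclose(tree_reduce(values), serial_reduce(values),
                        rel_tol=rel_tol, abs_tol=1e-12)

rng = random.Random(0)
sample = [rng.uniform(0.0, 1.0) for _ in range(1000)]
print("reduction agrees with serial reference:", check_reduction(sample))
```

Keeping the serial check behind a debug switch makes it easy to leave out of production runs.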
Pedro offered this talk having seen the RAC Report on Sparse Linear Solvers on Multiple GPUs.
- There didn't seem to be much difference between the single precision and double precision performance. Is this correct? This was something that was being looked into.
Phil Couch, Health Sciences: "Accelerated Public Health Policy Simulation using GPUs"
The high throughput nature of GPUs and the recent development of general purpose APIs to program GPU devices has led to their application in a number of scientific HPC domains. IMPACT2 is a project for implementing computer systems that allow the development of healthcare policy models and simulations based on these models to be executed and analysed in real-time. The system allows users to calibrate their models against data from observational studies using algorithms that often require the execution of many hundreds of simulations. This is a time-consuming process on many conventional CPU-based systems, resulting in a loss of interactivity and model exploration. Here we discuss our use of OpenCL to accelerate simulations through their execution on GPUs, presenting details of optimisations and performance.
Mike Griffiths, Sheffield: "MHD algorithm using GPUs"
Parallel MHD algorithms are important for numerical modelling of solar and astrophysical plasmas. Parallelisation techniques have been exploited most successfully by the gaming/graphics industry with the adoption of graphical processing units (GPUs) possessing hundreds of computational units. The success has been recognised by the computational science and engineering communities who have harnessed the computing power of GPUs. We describe the implementation of fully nonlinear 1-3D magnetohydrodynamic (MHD) codes called SMAUG (Sheffield MHD algorithm using GPU’s). SMAUG may even be applied to gravitationally stratified media often applicable to astrophysical plasmas.
The objective of this presentation is to describe the numerical methods used and the techniques for porting the code to this novel and highly parallel compute architecture. The methods employed are justified by the presentation of validation results and performance benchmarks. We describe the implementation of 1-3D (MHD) codes for gravitationally stratified media on graphical processing units and highly parallel compute architectures. We reveal the validity of our code by demonstrating agreement with results for stratified solar flux tube models for the quiet sun.
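The discussion of this talk attributed the single-GPU size limit to the memory that the 4th order Runge-Kutta method duplicates. A minimal Python sketch of one classical RK4 step shows where the copies come from: in this straightforward form, the stage slopes `k1` to `k4` and the intermediate states each need storage of the same size as the state itself. This is the textbook scheme for a generic right-hand side `f`, not SMAUG's implementation, and the test problem dy/dt = -y is an assumption chosen for illustration.

```python
import math

def rk4_step(f, t, y, dt):
    """One classical RK4 step for a state stored as a list of floats."""
    k1 = f(t, y)
    y2 = [yi + 0.5 * dt * ki for yi, ki in zip(y, k1)]   # extra state copy
    k2 = f(t + 0.5 * dt, y2)
    y3 = [yi + 0.5 * dt * ki for yi, ki in zip(y, k2)]   # extra state copy
    k3 = f(t + 0.5 * dt, y3)
    y4 = [yi + dt * ki for yi, ki in zip(y, k3)]         # extra state copy
    k4 = f(t + dt, y4)
    # Weighted combination of the four stage slopes.
    return [yi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def decay(t, y):
    """Illustrative right-hand side: dy/dt = -y."""
    return [-yi for yi in y]

y, t, dt = [1.0], 0.0, 0.1
for _ in range(10):
    y = rk4_step(decay, t, y, dt)
    t += dt
print(y[0], math.exp(-1.0))
```

For a 3D grid every one of these lists becomes a full-size field array, which is why the stage storage can dominate device memory.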
Pedro Valero Lara, CIEMAT (Madrid): "Block Tridiagonal Solver using CUDA"
Modern multi-core and many-core systems offer a very impressive cost/performance ratio. We will present a set of new parallel implementations for the solution of linear systems with a block-tridiagonal coefficient matrix on current parallel architectures. We shall discuss implementations on multi-core, many-core and a heterogeneous implementation on both architectures. The results show a speedup higher than 6 on certain parts of the problem, with the heterogeneous implementation being the fastest.
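For readers unfamiliar with tridiagonal solvers, a minimal serial sketch of the scalar Thomas algorithm in Python follows. It is not the implementation presented in the talk: the block-tridiagonal case replaces the scalar divisions below with solves against small dense blocks, and the parallel and heterogeneous versions described in the abstract are organised differently. The function name and argument layout are assumptions.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a scalar tridiagonal system by forward elimination
    and back substitution.

    lower[i] multiplies x[i-1] in row i (lower[0] is unused),
    upper[i] multiplies x[i+1] in row i (upper[-1] is unused).
    No pivoting is done, so the matrix is assumed diagonally dominant.
    """
    n = len(diag)
    c = [0.0] * n   # modified upper diagonal
    d = [0.0] * n   # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

# 4x4 system with 2 on the diagonal and -1 off it; exact solution 1, 2, 3, 4.
solution = thomas_solve([0.0, -1.0, -1.0, -1.0],
                        [2.0, 2.0, 2.0, 2.0],
                        [-1.0, -1.0, -1.0, 0.0],
                        [0.0, 0.0, 0.0, 5.0])
print(solution)
```

Each elimination step depends on the previous one, which is why parallel implementations need a different formulation from this serial sweep.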
Dr Philip Couch is a software engineer in Health Sciences at the University of Manchester. He has a chemical physics and informatics background and has gained a BSc and PhD in Chemistry from the University of Nottingham. Philip worked in Nottingham as a research associate in astronomy before moving to the Science and Technology Facilities Council, where he worked on the development of Semantic Web standards and tools for scientists. Philip currently works as part of the Northwest Institute for BioHealth Informatics, engineering computer systems to allow collaborative development and use of healthcare policy models.
GPU Club Meetings
Weds 13 Nov: 2pm, Univ Place. John Michalakes (NOAA) and Craig Davies (Maxeler Dataflow)
Tues 26 Nov: 2-3pm, B8 George Begg. Christian Obrecht on GPU implementations of fluid dynamics simulations on regular meshes: some recent advances
Weds 30 Oct: Intermediate CUDA training run by NVIDIA
Tues 29 Oct: 2pm, Univ Place, NVIDIA and Stephen Longshaw.
Weds 2 Oct 2013 - Large Scale Optimization and High Performance Computing for Asset Management, Daniel Egloff (QuantAlea)
Tuesday 23 July MathWorks (GPUs for MATLAB) and NVIDIA (GPUs & CUDA)
Thur 2 May, 2pm Lessons from GTC and on using the Intel Xeon Phi
Mon 10 Dec, 2-3:30pm: Dataflow and MultiGPU SPH - 4.205, University Place
Tues 25 Sept Seminar on implementing financial models on GPUs, FPGAs and in the Cloud
Mon 15 Oct: OpenCL training from UoM IT Services
Thurs 25 Oct: Hands-on "OpenACC" workshop run by Cray UK Ltd.
17 May 2012 Speakers on healthcare policy simulation in OpenCL, MHD algorithms in CUDA, Tridiagonal Solvers in CUDA
20 April 2012 Francois Bodin, CAPS: "Programming Heterogeneous Many-Cores using Directives" using HMPP
23 March 2012 Roko Grubisic, ARM: "Embedded Computer Graphics and ARM Mali GPUs"
02 March 2012 Speakers on profiling, sparse matrix algebra and atmospheric chemistry
09 Dec 2011 MPI and GPUs, directives-based programming, FPGA and GPU comparison, ideas for 2012
30 Sept 2011 GPU programming in FORTRAN, multiple GPUs, image reconstruction
15 July 2011 Jack Dongarra key note on Emerging Technologies
18 Mar 2011 OpenCL, debugging and profiling tools, porting C to CUDA, real time analysis
26 Nov 2010 biological MD, smoothed particle hydrodynamics, Monte Carlo financial models, Markov models