Computational Science Community Wiki

Real Time Applications Using GPUs


  • Software for GPUs inc. compiler/directives, maths libs & tools (debuggers and profilers)


Meeting 11 Jan 2011

GPU for Real-time Applications – Meeting Summary

Date: 11 January 2011

Coordinator: Chris Taylor; Meeting Summary: c/o MKB



Digital beam-forming (both Fourier-transform and cross-correlation based techniques) for passive millimetre-wave imaging and radio astronomy. A real-time PC-based system will require:

  1. High-speed (Gbps on several parallel channels) 1-bit digitisation in a PC data acquisition card
  2. Transfer of this data into a GPU
  3. A large number of Logical Operations Per Second (LOPS) to perform the above maths functions
  4. Accumulation of the results of the LOPS, with output every 10 ms.
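The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system discussed at the meeting: the function names (`one_bit`, `correlate_and_dump`) and the `samples_per_dump` parameter are hypothetical, the sketch runs on the CPU, and it covers only the cross-correlation case for a single pair of channels.

```python
def one_bit(sample):
    """1-bit digitisation: keep only the sign of the sample."""
    return 1 if sample >= 0 else 0


def correlate_and_dump(stream_a, stream_b, samples_per_dump):
    """Yield one accumulated correlation value per integration window.

    samples_per_dump stands in for the number of samples that arrive
    in one 10 ms output interval (hypothetical parameter).
    """
    acc = 0
    count = 0
    for a, b in zip(stream_a, stream_b):
        # XNOR of the two 1-bit samples: +1 when they agree, -1 when they differ.
        acc += 1 if one_bit(a) == one_bit(b) else -1
        count += 1
        if count == samples_per_dump:
            yield acc
            acc, count = 0, 0  # start the next integration window


if __name__ == "__main__":
    a = [1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
    b = [1.0, -1.0, 1.0, -1.0, -1.0, -1.0, 1.0, 1.0]
    print(list(correlate_and_dump(a, b, samples_per_dump=4)))
```

An incomplete final window is dropped rather than emitted, which keeps every output value on the same integration length.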

All of the above needs to happen continuously (minutes to hours) with a 95% duty cycle. The demonstrators require tens of parallel channels and the prototype will require hundreds. Quantitatively, the requirements are:

  1. Demonstrator sensor: Input data rate: 20 Gbits/s; Computation rate: 1.8 x 10^12 LOPS
  2. Prototype (2-3 years) sensor: Input data rate: 350 Gbits/s; Computation rate: 5 x 10^13 LOPS
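To put these figures in context, the short calculation below derives the number of logical operations per input bit and the number of bits arriving in each 10 ms output interval. The rates are taken from the list above. The `budget` helper and the dictionary layout are hypothetical names used only for this sketch.

```python
INTEGRATION_TIME_S = 0.010  # output every 10 ms, as stated in the requirements

SENSORS = {
    "demonstrator": {"input_bits_per_s": 20e9, "lops": 1.8e12},
    "prototype": {"input_bits_per_s": 350e9, "lops": 5e13},
}


def budget(input_bits_per_s, lops, integration_time_s=INTEGRATION_TIME_S):
    """Derive per-bit and per-window figures from the headline rates."""
    return {
        "lops_per_input_bit": lops / input_bits_per_s,
        "bits_per_dump": input_bits_per_s * integration_time_s,
        "bytes_per_s": input_bits_per_s / 8,
    }


if __name__ == "__main__":
    for name, rates in SENSORS.items():
        print(name, budget(**rates))
```

On these numbers the demonstrator needs 90 logical operations per input bit and the prototype roughly 143, so the compute requirement grows faster than the input data rate.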

Current real-time solution: DAQ and GPU cards in two PCIe x16 bus slots (Intel 5520 chipset), with data passing via RAM in the PC. Recommended commercial assemblers of such a PC are: Workstation Specialists, Dell, Lenovo.

Longer-term real-time solution: Dependent on what happens with CPU & GPU amalgamation and PC bus speeds.
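A rough bus budget shows why the PC bus matters here. The sketch below assumes the slots are PCIe 2.0, with a nominal 500 MB/s per lane in each direction. The notes do not state the PCIe generation, so that figure is an assumption, and sustained throughput in practice is lower than this nominal peak. The function names are hypothetical.

```python
import math

# Assumption: PCIe 2.0, nominal 500 MB/s per lane per direction.
PCIE2_LANE_BITS_PER_S = 500e6 * 8


def link_utilisation(stream_bits_per_s, lanes=16,
                     lane_bits_per_s=PCIE2_LANE_BITS_PER_S):
    """Fraction of one link's nominal one-way bandwidth used by the stream."""
    return stream_bits_per_s / (lanes * lane_bits_per_s)


def links_needed(stream_bits_per_s, lanes=16,
                 lane_bits_per_s=PCIE2_LANE_BITS_PER_S):
    """Minimum number of such links needed to carry the stream at nominal peak."""
    return math.ceil(link_utilisation(stream_bits_per_s, lanes, lane_bits_per_s))


if __name__ == "__main__":
    for name, rate in [("demonstrator", 20e9), ("prototype", 350e9)]:
        print(name, link_utilisation(rate), links_needed(rate))
```

Under these assumptions the 20 Gbits/s demonstrator stream uses about a third of one x16 link, while the 350 Gbits/s prototype stream would need at least six such links for each hop between DAQ card, RAM and GPU.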



General problems (and solutions) for the use of GPUs in real time

  1. GPUs are designed for floating-point operations, but digital beam-forming requires logical operations on single bits. (Machine-code-level programming can encode a parallel stream of single-bit data into a floating-point number, perform the mathematical operation, then return the floating-point number to a parallel stream of logical outputs. This technique might be referred to as integer masking.)
  2. GPUs alone can’t move data in and out fast enough to keep up with their processing, because of the PCIe bus. (The amalgamation of CPU and GPU into a single socket-based processor, mentioned above, would leave industry free to adapt a bus which may exploit the full compute capability of the new processor.)
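The idea behind the first point, handling many single-bit samples in one machine word, can be illustrated as follows. This sketch swaps in integer words with XOR and a population count in place of the floating-point encoding described in the notes, and it runs on the CPU in Python. The 32-bit word size and all function names are assumptions made for illustration. On a GPU the same pattern would typically map to a bitwise XOR followed by a population-count instruction (CUDA exposes one as `__popc`).

```python
WORD_BITS = 32  # assumed word size for this sketch


def pack_words(bits, word_bits=WORD_BITS):
    """Pack a list of 0/1 values into integers of word_bits bits each (LSB first)."""
    words = []
    for start in range(0, len(bits), word_bits):
        word = 0
        for offset, bit in enumerate(bits[start:start + word_bits]):
            word |= (bit & 1) << offset
        words.append(word)
    return words


def count_agreements_packed(words_a, words_b, n_bits, word_bits=WORD_BITS):
    """Count agreeing positions using one XOR and one popcount per word."""
    agreements = 0
    remaining = n_bits
    for wa, wb in zip(words_a, words_b):
        valid = min(remaining, word_bits)
        mask = (1 << valid) - 1  # ignore padding bits in the last word
        agreements += bin(~(wa ^ wb) & mask).count("1")
        remaining -= valid
    return agreements


def count_agreements_bitwise(bits_a, bits_b):
    """Reference version: one comparison per bit."""
    return sum(1 for a, b in zip(bits_a, bits_b) if a == b)


if __name__ == "__main__":
    bits_a = [1, 0] * 20
    bits_b = [1, 0] * 10 + [0, 1] * 10
    packed = count_agreements_packed(pack_words(bits_a), pack_words(bits_b), len(bits_a))
    print(packed, count_agreements_bitwise(bits_a, bits_b))
```

The packed version gives the same count as the bit-by-bit reference while issuing one logical operation per 32 samples, which is the kind of saving the LOPS figures above depend on.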