Tuned and asynchronous stencil kernels for CPU/GPU systems

We describe heterogeneous multi-CPU and multi-GPU implementations of Jacobi's iterative method for the 2-D Poisson equation on a structured grid, in both single and double precision. Properly tuned, our best implementation achieves 98% of the empirical streaming GPU bandwidth (66% of peak) on...

Bibliographic Details
Main Author: Venkatasubramanian, Sundaresan
Published: Georgia Institute of Technology 2009
Subjects: GPU, CPU
Online Access: http://hdl.handle.net/1853/29728