Tuned and asynchronous stencil kernels for CPU/GPU systems
We describe heterogeneous multi-CPU and multi-GPU implementations of Jacobi's iterative method for the 2-D Poisson equation on a structured grid, in both single and double precision. Properly tuned, our best implementation achieves 98% of the empirical streaming GPU bandwidth (66% of peak) on...
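The abstract names Jacobi's method for the 2-D Poisson equation without showing the update itself. The following is a minimal serial sketch, not the thesis's tuned or asynchronous CPU/GPU kernel. It assumes the standard 5-point discretization of -∇²u = f on a uniform grid with spacing h and fixed (Dirichlet) boundary values. The function names `jacobi_sweep` and `jacobi_solve`, the max-update stopping rule, and the tolerance are hypothetical choices made for illustration.

```python
def jacobi_sweep(u, f, h):
    """One Jacobi sweep for -laplace(u) = f on an n x n grid (hypothetical helper).

    Boundary rows/columns of u are fixed Dirichlet data and are copied through.
    Jacobi reads only the old iterate, so results go to a separate grid and the
    input u is left unmodified.
    """
    n = len(u)
    new = [row[:] for row in u]  # copy keeps boundary values in place
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            # 5-point stencil: (4*u_ij - neighbor sum) / h^2 = f_ij, solved for u_ij
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1]
                                + h * h * f[i][j])
    return new


def jacobi_solve(u, f, h, tol=1e-8, max_iters=10000):
    """Repeat sweeps until the largest pointwise update falls below tol.

    Returns (solution, number_of_sweeps). The stopping rule is an assumption.
    """
    n = len(u)
    for k in range(max_iters):
        new = jacobi_sweep(u, f, h)
        diff = max(abs(new[i][j] - u[i][j]) for i in range(n) for j in range(n))
        u = new
        if diff < tol:
            return u, k + 1
    return u, max_iters


if __name__ == "__main__":
    # Example: u(x, y) = x^2 + y^2 satisfies -laplace(u) = -4, and the 5-point
    # stencil is exact for quadratics, so Jacobi should recover it at the nodes.
    n = 6
    h = 1.0 / (n - 1)
    exact = [[(i * h) ** 2 + (j * h) ** 2 for j in range(n)] for i in range(n)]
    u0 = [[exact[i][j] if i in (0, n - 1) or j in (0, n - 1) else 0.0
           for j in range(n)] for i in range(n)]
    f = [[-4.0] * n for _ in range(n)]
    u, sweeps = jacobi_solve(u0, f, h)
    err = max(abs(u[i][j] - exact[i][j]) for i in range(n) for j in range(n))
    print(sweeps, err)
```

Because every interior point depends only on the previous iterate, all points in a sweep can be updated independently. This is what makes Jacobi a natural fit for bandwidth-bound parallel stencil kernels on CPUs and GPUs, at the cost of keeping two copies of the grid.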
Published: Georgia Institute of Technology, 2009
Online Access: http://hdl.handle.net/1853/29728