Computation and Communication Aware Task Graph Scheduling on Multi-GPGPU Systems

Bibliographic Details
Main Authors: Wang, Yun-Ting, 王允廷
Other Authors: Lai, Bo-Cheng
Format: Others
Language: en_US
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/71737217737762883446
Description
Summary: Master's thesis === National Chiao Tung University === Department of Electronics Engineering, Institute of Electronics === 102 === Owing to their massively parallel computation capability, GPGPUs have emerged as popular throughput computing platforms, and there is growing interest in exploiting systems with multiple GPGPUs. However, attaining superior performance in a multi-GPGPU system involves three main design challenges. The first is to balance the load of the tasks assigned to each GPGPU; an imbalanced load across the system can leave some GPGPUs idle and degrade overall performance. The second is to exploit memory resources by fully leveraging data reuse between threads as well as between kernels; poor data reuse causes excessive data accesses and transfers. The third is how efficiently a program can hide data transfer overhead by overlapping computation and communication [1]. This thesis addresses these design issues by proposing Computation and Communication Aware task graph Scheduling (CCAS) for multi-GPGPU systems. CCAS adopts an effective heuristic algorithm that considers the effect of both data reuse and load balance on the performance of multi-GPGPU systems. For multi-graph applications, a pre-scan method clusters disjoint task graphs onto GPGPUs based on the characteristics of each graph. In summary, CCAS achieves an average performance improvement of 22.15% compared with a previous work. In multi-graph applications, the pre-scan clustering method achieves good performance scaling when the system size is increased from 2 to 4 GPGPUs.
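
The trade-off between load balance and data reuse described in the summary can be illustrated with a small sketch. This is not the CCAS algorithm, which the record does not specify; it is a generic greedy earliest-finish-time heuristic. The function name `schedule`, the cost model, and the parameter `transfer_cost_per_unit` are all assumptions made for illustration. Transfers are assumed to overlap with computation on other tasks, so they delay only the consumer task.

```python
from collections import defaultdict

def schedule(tasks, edges, num_gpus, transfer_cost_per_unit=1.0):
    """Greedy earliest-finish-time placement of a task DAG onto GPUs.

    Illustrative only; NOT the CCAS algorithm from the thesis.
    tasks: dict mapping task name -> compute time
    edges: dict mapping (producer, consumer) -> data size
    Returns (placement, finish, makespan).
    """
    preds = defaultdict(list)
    succs = defaultdict(list)
    indegree = {t: 0 for t in tasks}
    for (u, v) in edges:
        preds[v].append(u)
        succs[u].append(v)
        indegree[v] += 1

    ready = sorted(t for t in tasks if indegree[t] == 0)
    gpu_free = [0.0] * num_gpus   # time at which each GPU becomes idle
    placement, finish = {}, {}

    while ready:
        task = ready.pop(0)
        best = None
        for gpu in range(num_gpus):
            # Data produced on the same GPU is reused at no cost;
            # data from another GPU pays a size-proportional transfer delay.
            data_ready = 0.0
            for p in preds[task]:
                arrival = finish[p]
                if placement[p] != gpu:
                    arrival += edges[(p, task)] * transfer_cost_per_unit
                data_ready = max(data_ready, arrival)
            end = max(gpu_free[gpu], data_ready) + tasks[task]
            if best is None or end < best[0]:
                best = (end, gpu)
        end, gpu = best
        placement[task] = gpu
        finish[task] = end
        gpu_free[gpu] = end
        # Release successors whose predecessors have all been scheduled.
        for s in succs[task]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return placement, finish, max(finish.values())

# Hypothetical diamond-shaped task graph: a -> {b, c} -> d
tasks = {"a": 2, "b": 3, "c": 3, "d": 1}
edges = {("a", "b"): 4, ("a", "c"): 4, ("b", "d"): 1, ("c", "d"): 1}
expensive = schedule(tasks, edges, num_gpus=2)
cheap = schedule(tasks, edges, num_gpus=2, transfer_cost_per_unit=0.25)
```

With expensive transfers, this heuristic keeps every task on one GPU to reuse data, leaving the other idle. With cheap transfers, it spreads the two middle tasks across both GPUs and finishes sooner.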