GPU Warp Scheduling Using Memory Stall Sampling on CASLAB-GPUSIM

Bibliographic Details
Main Authors: Chien-Ming Chiu, 邱健鳴
Other Authors: Chung-Ho Chen
Format: Others
Language: en_US
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/t4rek3
Description
Summary: Master's thesis === National Cheng Kung University === Institute of Computer and Communication Engineering === Academic year 105 === In recent years, Graphics Processing Units (GPUs), well known for parallel computing, have been widely adopted to accelerate non-graphics workloads such as data mining, machine learning, and image recognition. Modern GPUs use a huge number of concurrent threads together with fine-grained multithreading to overlap operation latencies. However, recent research has shown that memory contention is one of the most important bottlenecks preventing modern GPUs from reaching peak performance. Memory contention becomes even more serious as the degree of multithreading rises, because the memory system is overloaded; conversely, latency-hiding ability suffers when the degree of multithreading is low. We propose Memory-Contention Aware Warp Scheduling (MAWS) to strike a balance between memory workload and memory resources. The scheme uses dynamic sampling to accurately recognize the severity of memory contention and sets an appropriate degree of thread concurrency accordingly. Our experiments show that MAWS achieves a geometric-mean speedup of 96.4% over the baseline Loose Round-Robin scheduler for cache-sensitive workloads on GPGPU-Sim. MAWS also achieves an overall speedup of 17.4% on CASLAB-GPUSIM.
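The core idea the abstract describes, sampling memory stalls and then throttling or raising the number of schedulable warps, can be illustrated with a minimal sketch. All parameter values, threshold levels, and names below (SAMPLE_WINDOW, MAX_WARPS, the severity labels) are illustrative assumptions for exposition, not the thesis's actual MAWS parameters or implementation.

```python
# Hypothetical sketch of a memory-contention aware warp-limit controller.
# Window size, warp counts, and thresholds are assumed, not from the thesis.

SAMPLE_WINDOW = 1000  # cycles per sampling period (assumed)
MAX_WARPS = 48        # maximum concurrent warps per SM (assumed)


def classify_contention(stall_ratio):
    """Map the sampled memory-stall ratio to a severity level (assumed cutoffs)."""
    if stall_ratio > 0.75:
        return "severe"
    if stall_ratio > 0.40:
        return "moderate"
    return "low"


def next_warp_limit(level, current_limit):
    """Adjust the degree of thread concurrency for the next sampling period."""
    if level == "severe":
        # Memory system overloaded: halve the active-warp limit.
        return max(2, current_limit // 2)
    if level == "moderate":
        # Balanced: keep the current degree of concurrency.
        return current_limit
    # Low contention: allow more warps to improve latency hiding.
    return min(MAX_WARPS, current_limit + 4)


class MawsSketchScheduler:
    """Tracks memory stalls per window and retunes the warp limit each window."""

    def __init__(self):
        self.limit = MAX_WARPS
        self.stall_cycles = 0
        self.cycles = 0

    def tick(self, memory_stalled):
        """Called once per simulated cycle; memory_stalled flags a stall cycle."""
        self.cycles += 1
        if memory_stalled:
            self.stall_cycles += 1
        if self.cycles == SAMPLE_WINDOW:
            level = classify_contention(self.stall_cycles / self.cycles)
            self.limit = next_warp_limit(level, self.limit)
            self.stall_cycles = self.cycles = 0
```

For example, a window in which every cycle stalls on memory classifies as "severe" and halves the limit (48 to 24), while a stall-free window afterwards nudges it back up (24 to 28); the real MAWS design in the thesis may differ in its sampling and adjustment policy.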