GPU Warp Scheduling Using Memory Stall Sampling on CASLAB-GPUSIM
Main Authors:
Other Authors:
Format: Others
Language: en_US
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/t4rek3
Summary: Master's Thesis === National Cheng Kung University === Institute of Computer and Communication Engineering === 105 === In recent years, Graphics Processing Units (GPUs), well known for parallel computing, have been widely adopted to accelerate non-graphics workloads such as data mining, machine learning, and image recognition. Modern GPUs use a large number of concurrent threads together with fine-grained multithreading to overlap operation latencies. However, recent research has shown that memory contention is one of the most significant bottlenecks preventing modern GPUs from reaching peak performance. Memory contention can become even more severe as the degree of multithreading increases, because the memory system becomes overloaded, while latency-hiding ability is poor at a low degree of multithreading. We propose Memory-Contention Aware Warp Scheduling (MAWS) to strike a balance between memory workload and memory resources. The scheme uses dynamic sampling to accurately recognize the severity of memory contention and provides an appropriate degree of thread concurrency accordingly. Our experiments show that MAWS achieves a geometric-mean speedup of 96.4% over the baseline Loose Round-Robin scheduler for cache-sensitive workloads on GPGPU-Sim. MAWS also achieves an overall speedup of 17.4% on CASLAB-GPUSIM.
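The abstract describes a control loop: sample memory stall behavior, classify the severity of memory contention, then cap the number of warps eligible for scheduling. The sketch below is a minimal illustration of that idea only; the function names, severity thresholds, and warp-limit ratios are all assumptions for illustration, not the thesis's actual MAWS parameters.

```python
# Illustrative sketch of a sampling-based warp-throttling loop in the spirit
# of MAWS. Thresholds and limits are hypothetical, not taken from the thesis.

def classify_contention(stall_ratio):
    """Map a sampled memory-stall ratio (0..1) to a severity level.

    The 0.75 / 0.40 cutoffs are assumed values for illustration."""
    if stall_ratio > 0.75:
        return "high"
    if stall_ratio > 0.40:
        return "medium"
    return "low"

def active_warp_limit(severity, max_warps):
    """Choose a degree of thread concurrency for the observed severity.

    High contention throttles hard; low contention keeps full concurrency."""
    limits = {
        "high": max(1, max_warps // 4),
        "medium": max(1, max_warps // 2),
        "low": max_warps,
    }
    return limits[severity]

def maws_schedule(stall_samples, max_warps=48):
    """For each sampling window, return how many warps stay schedulable."""
    return [active_warp_limit(classify_contention(s), max_warps)
            for s in stall_samples]
```

For example, with 48 hardware warps per core, windows sampled at stall ratios of 0.9, 0.5, and 0.1 would be throttled to 12, 24, and 48 schedulable warps respectively under these assumed thresholds.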