Optimizing Dynamic Programming on Graphics Processing Units via Data Reuse and Data Prefetch with Inter-Block Synchronization
Master's === National Changhua University of Education === Department of Computer Science and Information Engineering === 100 === Our study focuses on improving an important category of dynamic programming (DP) problems, called nonserial polyadic dynamic programming (NPDP), on a graphics processing unit (GPU); the full abstract appears under description below.
Main Authors: Ting-Hong Lin (林庭宏)
Other Authors: Chao-Chin Wu (伍朝欽)
Format: Others
Language: zh-TW
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/72744617351629968168
id: ndltd-TW-100NCUE5392021
record_format: oai_dc
spelling: ndltd-TW-100NCUE5392021 | 2015-10-13T21:28:01Z | http://ndltd.ncl.edu.tw/handle/72744617351629968168
Title: Optimizing Dynamic Programming on Graphics Processing Units via Data Reuse and Data Prefetch with Inter-Block Synchronization
Chinese title: 利用區塊間同步機制提升動態規則問題在圖形加速器中之執行效能 (Using an inter-block synchronization mechanism to improve the execution performance of dynamic programming problems on graphics accelerators)
Author: Ting-Hong Lin (林庭宏)
Advisor: Chao-Chin Wu (伍朝欽)
Degree: Master's, National Changhua University of Education, Department of Computer Science and Information Engineering, academic year 100
Type: 學位論文 (thesis), 40 pages, zh-TW, 2012
collection: NDLTD
language: zh-TW
format: Others
sources: NDLTD
description: Master's === National Changhua University of Education === Department of Computer Science and Information Engineering === 100 === Our study focuses on improving an important category of dynamic programming (DP) problems, called nonserial polyadic dynamic programming (NPDP), on a graphics processing unit (GPU). Because NPDP exhibits a different degree of parallelism at each stage of the computation, it is difficult to fully exploit the GPU's computing capability. In a previous study, we proposed an algorithm that adaptively adjusts the thread-level parallelism to address this problem and improve the efficiency of solving such NPDP problems. In this research, we focus on optimizing the GPU's memory usage. We apply a tiling technique to partition the subproblems and their data: subproblems and data are tiled so that small data regions fit into shared memory and the buffered data can be reused for each tile of subproblems, reducing the number of global memory accesses. However, we found that the same kernel must be invoked many times to enforce data consistency across different stages, which makes it impossible to reuse the tiled data in shared memory after the kernel is re-invoked. Fortunately, the inter-block synchronization technique allows us to invoke the kernel exactly once, under the restriction that the number of blocks not exceed the total number of streaming multiprocessors. Besides enabling data reuse, invoking the kernel only once also lets us prefetch data into shared memory across inter-block synchronization points, which improves performance even more than data reuse. Experimental results demonstrate that our method achieves a speedup of 3.2 over the previously published GPU algorithm.
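The tiling and data-reuse idea in the abstract can be illustrated with a short CUDA sketch. The kernel below is a minimal, hypothetical illustration rather than the thesis's actual code: it relaxes one TILE x TILE tile of the DP table against a tile of split points using the simplified recurrence C[i][j] = min(C[i][j], min_k C[i][k] + C[k][j]); the name relax_tile, the TILE width, and the assumption that n is a multiple of TILE are all illustrative.

```cuda
#define TILE 16   // illustrative tile width; launch with a TILE x TILE thread block

// Relax tile (I, J) of the n x n DP table C against split-point tile K,
// staging both input tiles through shared memory so each loaded element
// is reused by the whole block. Assumes n % TILE == 0.
__global__ void relax_tile(float *C, int n, int I, int J, int K) {
    __shared__ float a[TILE][TILE];   // buffered tile of rows I, split columns K
    __shared__ float b[TILE][TILE];   // buffered tile of split rows K, columns J
    int i = I * TILE + threadIdx.y;
    int j = J * TILE + threadIdx.x;

    // One cooperative global load per input tile...
    a[threadIdx.y][threadIdx.x] = C[i * n + K * TILE + threadIdx.x];
    b[threadIdx.y][threadIdx.x] = C[(K * TILE + threadIdx.y) * n + j];
    __syncthreads();

    // ...after which each buffered element is read TILE times from shared
    // memory instead of TILE times from global memory.
    float best = C[i * n + j];
    for (int t = 0; t < TILE; ++t)    // scan the split points in this tile
        best = fminf(best, a[threadIdx.y][t] + b[t][threadIdx.x]);
    C[i * n + j] = best;
}
```

Each element staged into shared memory is consumed TILE times by the block's threads, which is the saving the abstract describes: per-tile global memory traffic drops by roughly a factor of TILE.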
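The inter-block synchronization the abstract depends on can be implemented as a software grid barrier in the style of Xiao and Feng's GPU synchronization. The sketch below is one hypothetical variant built on a generation counter; g_count, g_generation, and the 1D thread blocks are assumptions. It is correct only when all blocks are resident on the device at once, which is exactly why the abstract caps the block count at the number of streaming multiprocessors.

```cuda
// Grid-wide barrier state; zero-initialized when the module loads.
__device__ int g_count = 0;
__device__ volatile int g_generation = 0;

// Every thread of every block must call this (1D blocks assumed). Safe only
// if all num_blocks blocks are co-resident, hence blocks <= SM count.
__device__ void global_barrier(int num_blocks) {
    __syncthreads();                         // the whole block has arrived
    if (threadIdx.x == 0) {
        int gen = g_generation;
        if (atomicAdd(&g_count, 1) == num_blocks - 1) {
            // Last block in: reset the counter, publish earlier global
            // writes, then flip the generation to release the spinners.
            g_count = 0;
            __threadfence();
            g_generation = gen + 1;
        } else {
            while (g_generation == gen) { }  // spin until released
        }
    }
    __syncthreads();                         // hold the block until release
}
```

On current CUDA, cooperative groups provide the same facility natively via this_grid().sync(), with the same residency requirement.

With a barrier available, the whole computation can run as a single persistent kernel, and the prefetch described in the abstract becomes possible: while a block waits at the barrier, loads for the next stage's inputs are already in flight into the alternate shared-memory buffer. The sketch below is again hypothetical: persistent_stages, the placeholder per-stage update, and the per-stage layout of the read-only input array in are assumptions, and it reuses TILE and global_barrier() from the sketches above.

```cuda
// Launched exactly once with gridDim.x <= #SMs and TILE threads per block;
// the kernel itself walks all stages, double-buffering shared memory
// across the grid-wide barrier.
__global__ void persistent_stages(float *state, const float *in, int stages) {
    __shared__ float buf[2][TILE];
    int cur = 0;
    int stride = gridDim.x * TILE;             // one slice of `in` per stage
    int g = blockIdx.x * TILE + threadIdx.x;   // this thread's cell
    buf[cur][threadIdx.x] = in[g];             // stage 0 inputs, staged up front
    __syncthreads();
    for (int s = 0; s < stages; ++s) {
        // Placeholder per-stage update; a real NPDP stage would also read
        // other blocks' results from global memory, which is what forces
        // the grid-wide barrier below.
        state[g] += buf[cur][threadIdx.x];
        // Prefetch: stage s+1 reads a slice of `in` that no stage writes,
        // so its loads can be issued now and overlap the barrier wait.
        if (s + 1 < stages)
            buf[cur ^ 1][threadIdx.x] = in[(s + 1) * stride + g];
        global_barrier(gridDim.x);             // wait for every other block
        cur ^= 1;                              // flip buffers for the next stage
    }
}
```

Because the prefetch hides global-memory latency behind the barrier wait rather than merely avoiding repeated loads, the abstract reports that it improves performance more than data reuse alone.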
author2: Chao-Chin Wu
author_facet: Chao-Chin Wu; Ting-Hong Lin (林庭宏)
author: Ting-Hong Lin (林庭宏)
spellingShingle: Ting-Hong Lin (林庭宏); Optimizing Dynamic Programming on Graphics Processing Units via Data Reuse and Data Prefetch with Inter-Block Synchronization
author_sort: Ting-Hong Lin
title: Optimizing Dynamic Programming on Graphics Processing Units via Data Reuse and Data Prefetch with Inter-Block Synchronization
title_sort: optimizing dynamic programming on graphics processing units via data reuse and data prefetch with inter-block synchronization
publishDate: 2012
url: http://ndltd.ncl.edu.tw/handle/72744617351629968168
work_keys_str_mv:
AT tinghonglin optimizingdynamicprogrammingongraphicsprocessingunitsviadatareuseanddataprefetchwithinterblocksynchronization
AT líntínghóng optimizingdynamicprogrammingongraphicsprocessingunitsviadatareuseanddataprefetchwithinterblocksynchronization
AT tinghonglin lìyòngqūkuàijiāntóngbùjīzhìtíshēngdòngtàiguīzéwèntízàitúxíngjiāsùqìzhōngzhīzhíxíngxiàonéng
AT líntínghóng lìyòngqūkuàijiāntóngbùjīzhìtíshēngdòngtàiguīzéwèntízàitúxíngjiāsùqìzhōngzhīzhíxíngxiàonéng
_version_: 1718064950957572096