Optimizing Dynamic Programming on Graphics Processing Units via Data Reuse and Data Prefetch with Inter-Block Synchronization

Bibliographic Details
Main Authors: Ting-Hong Lin, 林庭宏
Other Authors: Chao-Chin Wu
Format: Others
Language: zh-TW
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/72744617351629968168
id ndltd-TW-100NCUE5392021
record_format oai_dc
spelling ndltd-TW-100NCUE5392021 2015-10-13T21:28:01Z http://ndltd.ncl.edu.tw/handle/72744617351629968168 Optimizing Dynamic Programming on Graphics Processing Units via Data Reuse and Data Prefetch with Inter-Block Synchronization 利用區塊間同步機制提升動態規則問題在圖形加速器中之執行效能 Ting-Hong Lin 林庭宏 碩士 國立彰化師範大學 資訊工程學系 100 Our study focuses on improving an important category of dynamic programming (DP) problems, called nonserial polyadic dynamic programming (NPDP), on a graphics processing unit (GPU). Because NPDP exhibits a different degree of parallelism at each stage of the computation, it is difficult to fully utilize the GPU's computing capability. In previous work, we proposed an algorithm that adaptively adjusts the thread-level parallelism to address this problem and improve the efficiency of solving such NPDP problems. In this research, we focus on optimizing GPU memory usage. Subproblems and data are tiled so that small data regions fit into shared memory and the buffered data can be reused for each tile of subproblems, thus reducing the amount of global memory access. However, because data consistency must be enforced across different stages, the same kernel has to be invoked many times, which makes it impossible to reuse the tiled data in shared memory after the kernel is re-invoked. Fortunately, the inter-block synchronization technique allows us to invoke the kernel exactly once, under the restriction that the number of blocks must not exceed the total number of streaming multiprocessors. Besides enabling data reuse, invoking the kernel only once also lets us prefetch data into shared memory across an inter-block synchronization point, which improves performance even more than data reuse does. Experimental results demonstrate that our method achieves a speedup of 3.2 over the previously published GPU algorithm. Chao-Chin Wu 伍朝欽 2012 學位論文 ; thesis 40 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Changhua University of Education === Department of Computer Science and Information Engineering === 100 === Our study focuses on improving an important category of dynamic programming (DP) problems, called nonserial polyadic dynamic programming (NPDP), on a graphics processing unit (GPU). Because NPDP exhibits a different degree of parallelism at each stage of the computation, it is difficult to fully utilize the GPU's computing capability. In previous work, we proposed an algorithm that adaptively adjusts the thread-level parallelism to address this problem and improve the efficiency of solving such NPDP problems. In this research, we focus on optimizing GPU memory usage. Subproblems and data are tiled so that small data regions fit into shared memory and the buffered data can be reused for each tile of subproblems, thus reducing the amount of global memory access. However, because data consistency must be enforced across different stages, the same kernel has to be invoked many times, which makes it impossible to reuse the tiled data in shared memory after the kernel is re-invoked. Fortunately, the inter-block synchronization technique allows us to invoke the kernel exactly once, under the restriction that the number of blocks must not exceed the total number of streaming multiprocessors. Besides enabling data reuse, invoking the kernel only once also lets us prefetch data into shared memory across an inter-block synchronization point, which improves performance even more than data reuse does. Experimental results demonstrate that our method achieves a speedup of 3.2 over the previously published GPU algorithm.
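The abstract describes three techniques: tiling data into shared memory for reuse, a software inter-block barrier so the kernel is launched only once, and prefetching the next stage's data across the barrier. The CUDA sketch below illustrates how they fit together; it is not the thesis's implementation. The names (interBlockSync, dpAllStages, TILE), the linear stage layout, and the placeholder computation are all hypothetical, and the barrier follows the well-known lock-free pattern with one arrival flag per block and block 0 as coordinator, which is safe only when every block is resident at once, i.e. under the abstract's restriction that the number of blocks not exceed the number of streaming multiprocessors.

#include <cstdio>
#include <cuda_runtime.h>

#define TILE 256  // tile width, chosen only for this sketch

// Lock-free inter-block barrier (hypothetical sketch). Safe only when all
// blocks are co-resident (gridDim.x <= SM count) and blockDim.x >= gridDim.x
// so one thread can poll each block's flag.
__device__ void interBlockSync(volatile int *arrive, volatile int *go, int goal)
{
    __syncthreads();                    // whole block has finished the stage
    if (threadIdx.x == 0) {
        __threadfence();                // make this block's global writes visible
        arrive[blockIdx.x] = goal;      // announce arrival
    }
    if (blockIdx.x == 0) {              // block 0 coordinates
        if (threadIdx.x < gridDim.x)
            while (arrive[threadIdx.x] != goal) { }  // wait for every block
        __syncthreads();
        if (threadIdx.x < gridDim.x)
            go[threadIdx.x] = goal;     // release every block
    }
    if (threadIdx.x == 0)
        while (go[blockIdx.x] != goal) { }           // wait for release
    __syncthreads();
}

// One launch covers all stages, so the shared-memory tiles survive between
// stages. The double buffer lets each block prefetch stage s+1 while
// computing stage s. The linear stage layout and the "*2" computation are
// placeholders; real NPDP stages are triangular, and only data not produced
// in the current stage may be prefetched across the barrier.
__global__ void dpAllStages(const float *stageIn, float *stageOut, int nStages,
                            volatile int *arrive, volatile int *go)
{
    __shared__ float tile[2][TILE];
    int cur = 0;
    for (int i = threadIdx.x; i < TILE; i += blockDim.x)   // load stage 0
        tile[cur][i] = stageIn[blockIdx.x * TILE + i];
    __syncthreads();

    for (int s = 0; s < nStages; ++s) {
        if (s + 1 < nStages)                               // prefetch stage s+1
            for (int i = threadIdx.x; i < TILE; i += blockDim.x)
                tile[cur ^ 1][i] = stageIn[((s + 1) * gridDim.x + blockIdx.x) * TILE + i];
        for (int i = threadIdx.x; i < TILE; i += blockDim.x) // placeholder work
            stageOut[(s * gridDim.x + blockIdx.x) * TILE + i] = tile[cur][i] * 2.0f;
        interBlockSync(arrive, go, s + 1);                 // replaces a re-launch
        cur ^= 1;                        // swap buffers: prefetched data is ready
    }
}

int main()
{
    int sms = 0;
    cudaDeviceGetAttribute(&sms, cudaDevAttrMultiProcessorCount, 0);
    int blocks = sms;                    // the restriction from the abstract
    int nStages = 4;
    size_t n = (size_t)nStages * blocks * TILE;

    float *in, *out;
    int *arrive, *go;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMalloc(&arrive, blocks * sizeof(int));
    cudaMalloc(&go, blocks * sizeof(int));
    cudaMemset(arrive, 0, blocks * sizeof(int));
    cudaMemset(go, 0, blocks * sizeof(int));

    dpAllStages<<<blocks, TILE>>>(in, out, nStages, arrive, go);
    cudaDeviceSynchronize();
    printf("done: %s\n", cudaGetErrorString(cudaGetLastError()));
    return 0;
}

The point of the double buffer is that the global-memory loads for stage s+1 overlap both the computation of stage s and the spin-wait at the barrier. With one kernel launch per stage that overlap is impossible, because shared memory does not persist across launches.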
author2 Chao-Chin Wu
author_facet Chao-Chin Wu
Ting-Hong Lin
林庭宏
author Ting-Hong Lin
林庭宏
spellingShingle Ting-Hong Lin
林庭宏
Optimizing Dynamic Programming on Graphics Processing Units via Data Reuse and Data Prefetch with Inter-Block Synchronization
author_sort Ting-Hong Lin
title Optimizing Dynamic Programming on Graphics Processing Units via Data Reuse and Data Prefetch with Inter-Block Synchronization
title_short Optimizing Dynamic Programming on Graphics Processing Units via Data Reuse and Data Prefetch with Inter-Block Synchronization
title_full Optimizing Dynamic Programming on Graphics Processing Units via Data Reuse and Data Prefetch with Inter-Block Synchronization
title_fullStr Optimizing Dynamic Programming on Graphics Processing Units via Data Reuse and Data Prefetch with Inter-Block Synchronization
title_full_unstemmed Optimizing Dynamic Programming on Graphics Processing Units via Data Reuse and Data Prefetch with Inter-Block Synchronization
title_sort optimizing dynamic programming on graphics processing units via data reuse and data prefetch with inter-block synchronization
publishDate 2012
url http://ndltd.ncl.edu.tw/handle/72744617351629968168
work_keys_str_mv AT tinghonglin optimizingdynamicprogrammingongraphicsprocessingunitsviadatareuseanddataprefetchwithinterblocksynchronization
AT líntínghóng optimizingdynamicprogrammingongraphicsprocessingunitsviadatareuseanddataprefetchwithinterblocksynchronization
AT tinghonglin lìyòngqūkuàijiāntóngbùjīzhìtíshēngdòngtàiguīzéwèntízàitúxíngjiāsùqìzhōngzhīzhíxíngxiàonéng
AT líntínghóng lìyòngqūkuàijiāntóngbùjīzhìtíshēngdòngtàiguīzéwèntízàitúxíngjiāsùqìzhōngzhīzhíxíngxiàonéng
_version_ 1718064950957572096