Memory Partitioning and Optimization of On-Chip Accelerators with High-Level Synthesis

Master's thesis === National Tsing Hua University === Department of Computer Science === 106 === Current research on design space exploration for accelerators mainly relies on either an RTL-based flow or a High-Level Synthesis flow. However, both are very time-consuming. Pre-RTL tools, such as Aladdin, can directly analyze designs in high-level langu...


Bibliographic Details
Main Authors:	Peng, Te-Hsin., 彭德欣
Other Authors:	Huang, Chih-Tsun
Format:	Others
Language:	en_US
Published:	2017
Online Access:	http://ndltd.ncl.edu.tw/handle/24zvgm

id	ndltd-TW-106NTHU5392027
record_format	oai_dc
spelling	ndltd-TW-106NTHU5392027 2019-05-16T00:15:33Z http://ndltd.ncl.edu.tw/handle/24zvgm Memory Partitioning and Optimization of On-Chip Accelerators with High-Level Synthesis Chinese title: 利用高階合成進行加速器之記憶體分區與最佳化技術 (Memory Partitioning and Optimization Techniques for Accelerators Using High-Level Synthesis) Peng, Te-Hsin. 彭德欣 Master's thesis, National Tsing Hua University, Department of Computer Science, 106 Huang, Chih-Tsun 黃稚存 2017 thesis 56 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's thesis === National Tsing Hua University === Department of Computer Science === 106 === Current research on design space exploration for accelerators mainly relies on either an RTL-based flow or a High-Level Synthesis (HLS) flow. However, both are very time-consuming. Pre-RTL tools, such as Aladdin, can analyze designs written in high-level languages directly and take less time to estimate the timing, area, and power of different micro-architectures. Our previous work proposed a design-assisted flow that combines the HLS flow with Aladdin to explore the design space. That flow uses Vivado HLS, which targets FPGA designs, so its results may be inaccurate for users who adopt an ASIC design flow. Therefore, in this thesis, we extend the exploration flow to support ASIC HLS tools such as Stratus HLS, resulting in a more accurate design space exploration. In addition, conventional partitioning approaches, such as the block, cyclic, and block-cyclic techniques, cannot evenly distribute data elements across memory banks. This causes memory conflicts, which become the performance bottleneck. Our previous work proposed a novel remapping algorithm to solve this problem. However, the original remapping scheme introduces irregular data padding or unnecessary data swapping, leading to extra area or latency overhead. In this thesis, we improve the remapping algorithm with a more general and efficient approach to identifying the regularity. We compare the optimized remapping algorithm with the conventional cyclic approach on six benchmark applications with different access patterns, and we apply different combinations of loop unrolling and memory partitioning to explore the design space. We then classify the six benchmarks by their access patterns and analyze their performance and area.
Experimental results show that our optimized remapping approach effectively improves performance with a smaller area overhead than the cyclic approach.
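The abstract's point about uneven bank distribution can be illustrated with a minimal Python sketch. It models only the three conventional schemes the abstract names (block, cyclic, block-cyclic), not the thesis's remapping algorithm, whose details this record does not give. It assumes a one-dimensional array, single-port banks that each serve one access per cycle, and a loop reading a[stride * i] unrolled by a fixed factor; the function names and the cycle model are hypothetical.

```python
import math
from collections import Counter


def block_bank(addr, size, banks):
    """Block partitioning: contiguous chunks of ceil(size / banks) elements per bank."""
    return addr // math.ceil(size / banks)


def cyclic_bank(addr, banks):
    """Cyclic partitioning: consecutive elements go to consecutive banks."""
    return addr % banks


def block_cyclic_bank(addr, block, banks):
    """Block-cyclic partitioning: blocks of `block` elements dealt round-robin to banks."""
    return (addr // block) % banks


def access_cycles(accesses, bank_of):
    """Cycles for one unrolled iteration under the assumed single-port bank model.

    Accesses that land in the same bank are serialized, so the iteration takes
    as many cycles as the number of requests sent to the busiest bank.
    """
    load = Counter(bank_of(a) for a in accesses)
    return max(load.values())


def total_cycles(size, unroll, stride, bank_of):
    """Sum of per-iteration cycles for reads of a[stride * i], i in [0, size // stride),
    with the loop unrolled by `unroll` (hypothetical cost model, no pipelining)."""
    n_reads = size // stride
    cycles = 0
    for base in range(0, n_reads, unroll):
        # Addresses read together in one unrolled iteration.
        idx = [stride * i for i in range(base, min(base + unroll, n_reads))]
        cycles += access_cycles(idx, bank_of)
    return cycles


if __name__ == "__main__":
    SIZE, BANKS, UNROLL = 32, 4, 4  # hypothetical configuration
    schemes = {
        "block": lambda a: block_bank(a, SIZE, BANKS),
        "cyclic": lambda a: cyclic_bank(a, BANKS),
        "block-cyclic(2)": lambda a: block_cyclic_bank(a, 2, BANKS),
    }
    for stride in (1, 2):
        for name, bank_of in schemes.items():
            print(f"stride={stride} {name}: {total_cycles(SIZE, UNROLL, stride, bank_of)} cycles")
```

Under these assumptions, with 32 elements, 4 banks, and an unroll factor of 4, cyclic partitioning reaches the conflict-free 8 cycles for unit-stride reads but needs 8 cycles instead of 4 for stride-2 reads, because only banks 0 and 2 are ever accessed. Block-cyclic partitioning with a block size of 2 reaches 4 cycles for stride 2 but takes 16 for unit stride, and block partitioning sends every unit-stride unrolled iteration to a single bank (32 cycles). No fixed scheme fits both patterns.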
author2	Huang, Chih-Tsun
author_facet	Huang, Chih-Tsun Peng, Te-Hsin. 彭德欣
author	Peng, Te-Hsin. 彭德欣
spellingShingle	Peng, Te-Hsin. 彭德欣 Memory Partitioning and Optimization of On-Chip Accelerators with High-Level Synthesis
author_sort	Peng, Te-Hsin.
title	Memory Partitioning and Optimization of On-Chip Accelerators with High-Level Synthesis
title_short	Memory Partitioning and Optimization of On-Chip Accelerators with High-Level Synthesis
title_full	Memory Partitioning and Optimization of On-Chip Accelerators with High-Level Synthesis
title_fullStr	Memory Partitioning and Optimization of On-Chip Accelerators with High-Level Synthesis
title_full_unstemmed	Memory Partitioning and Optimization of On-Chip Accelerators with High-Level Synthesis
title_sort	memory partitioning and optimization of on-chip accelerators with high-level synthesis
publishDate	2017
url	http://ndltd.ncl.edu.tw/handle/24zvgm
work_keys_str_mv	AT pengtehsin memorypartitioningandoptimizationofonchipacceleratorswithhighlevelsynthesis AT péngdéxīn memorypartitioningandoptimizationofonchipacceleratorswithhighlevelsynthesis AT pengtehsin lìyònggāojiēhéchéngjìnxíngjiāsùqìzhījìyìtǐfēnqūyǔzuìjiāhuàjìshù AT péngdéxīn lìyònggāojiēhéchéngjìnxíngjiāsùqìzhījìyìtǐfēnqūyǔzuìjiāhuàjìshù
_version_	1719163066458832896
