Memory Partitioning and Optimization of On-Chip Accelerators with High-Level Synthesis

Master's thesis, National Tsing Hua University, Department of Computer Science, academic year 106 (ROC calendar)

Bibliographic Details
Main Author: Peng, Te-Hsin (彭德欣)
Other Authors: Huang, Chih-Tsun
Format: Others
Language: English (en_US)
Published: 2017
Online Access:http://ndltd.ncl.edu.tw/handle/24zvgm
Chinese Title: 利用高階合成進行加速器之記憶體分區與最佳化技術
Advisor: Huang, Chih-Tsun (黃稚存)
Type: Master's thesis, 56 pages
Record ID: ndltd-TW-106NTHU5392027
Collection: NDLTD
Abstract

Current research on design space exploration for accelerators relies mainly on either an RTL-based flow or a High-Level Synthesis (HLS) flow, and both are very time-consuming. Pre-RTL tools such as Aladdin can analyze designs written in high-level languages directly and estimate the timing, area, and power of different micro-architectures in much less time. Our previous work proposed a design-assisted flow that combines the HLS flow with Aladdin to explore the design space. That flow used Vivado HLS, which targets the FPGA design flow, so its results may be inaccurate for users who adopt an ASIC design flow. In this thesis, we therefore extend the exploration flow to support ASIC HLS tools such as Stratus HLS, which makes the design space exploration more accurate.

In addition, conventional memory partitioning approaches, such as block, cyclic, and block-cyclic partitioning, cannot evenly distribute data elements across the memory banks. The resulting memory (bank) conflicts become a performance bottleneck. Our previous work proposed a novel remapping algorithm to solve this problem; however, the original remapping scheme introduces irregular data padding or unnecessary data swapping, which adds area or latency overhead. In this thesis, we improve the remapping algorithm with a more general and efficient approach to identifying the underlying regularity.

We compare the optimized remapping algorithm with the conventional cyclic approach on six benchmark applications with different access patterns, applying different combinations of loop unrolling and memory partitioning to explore the design space. We then classify the six benchmarks by access pattern and analyze their performance and area. The experimental results show that the optimized remapping approach effectively improves performance with a smaller area overhead than the cyclic approach.
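The abstract contrasts block, cyclic, and block-cyclic partitioning and attributes the performance bottleneck to bank conflicts. The Python sketch below illustrates that reasoning under simplifying assumptions that are ours, not the thesis's: every bank is single-ported and serves one access per cycle, an unrolled loop issues all of its reads in the same iteration, and block-cyclic uses a block size of 2. All function names (`block_bank`, `cyclic_bank`, `block_cyclic_bank`, `cycles_per_iteration`, `total_cycles`) are hypothetical. This is neither the thesis's remapping algorithm nor the behavior of Vivado HLS or Stratus HLS.

```python
from collections import Counter
from math import ceil


def block_bank(index: int, size: int, banks: int) -> int:
    """Block partitioning: contiguous chunks of ceil(size / banks) elements."""
    return index // ceil(size / banks)


def cyclic_bank(index: int, size: int, banks: int) -> int:
    """Cyclic partitioning: element i goes to bank i mod banks."""
    return index % banks


def block_cyclic_bank(index: int, size: int, banks: int, block: int = 2) -> int:
    """Block-cyclic partitioning: blocks of `block` elements dealt round-robin.
    The default block size of 2 is an illustrative assumption."""
    return (index // block) % banks


def cycles_per_iteration(addresses, bank_of) -> int:
    """Accesses to the same single-port bank serialize, so one unrolled
    iteration needs as many cycles as the busiest bank has accesses."""
    counts = Counter(bank_of(a) for a in addresses)
    return max(counts.values())


def total_cycles(size: int, stride: int, unroll: int, banks: int, scheme) -> int:
    """Estimated memory cycles for `for j in range(size // stride): read A[stride * j]`
    after unrolling the loop by `unroll` (the last iteration may be partial)."""
    trip = size // stride
    cycles = 0
    for start in range(0, trip, unroll):
        addrs = [stride * j for j in range(start, min(start + unroll, trip))]
        cycles += cycles_per_iteration(addrs, lambda a: scheme(a, size, banks))
    return cycles


if __name__ == "__main__":
    schemes = {"block": block_bank, "cyclic": cyclic_bank,
               "block-cyclic": block_cyclic_bank}
    # 16-element array, 4 banks, unroll factor 4; unit-stride and stride-2 loops.
    for stride in (1, 2):
        for name, fn in schemes.items():
            print(f"stride={stride} {name:>12}: {total_cycles(16, stride, 4, 4, fn)} cycles")
```

With 16 elements, 4 banks, and an unroll factor of 4, cyclic partitioning serves the unit-stride loop in 4 cycles (one per unrolled iteration), but for the stride-2 loop it maps every access onto banks 0 and 2, taking 4 cycles where an even spread would need 2. Which scheme fails depends on the access pattern, which is consistent with the abstract's decision to classify benchmarks by access pattern. Varying `unroll` and `banks` in `total_cycles` is a very simplified stand-in for the loop-unrolling and memory-partitioning combinations explored in the thesis.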