A Study of Application-Specific Compute Accelerator (ASCA) Design

Bibliographic Details
Main Authors: Wu, I-Wei, 吳奕緯
Other Authors: Shann, Jyh-Jiun
Format: Others
Language: en_US
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/41431361993681119739
Description
Summary: Ph.D. === National Chiao Tung University === Institute of Computer Science and Engineering === 102 === To satisfy the growing demand for high-performance computing in modern embedded devices, several architectural and micro-architectural enhancements have been implemented in processor architectures. An application-specific compute accelerator (ASCA) is an effective way to improve processor performance without extensive modification of the core architecture. An ASCA is an additional, specialized functional unit within the base processor that accelerates one or several specified applications. It is usually generated from a set of frequently executed operation patterns, called application-specific operation patterns (ASOPs), explored from one or several target applications. Since an ASCA increases the implementation cost of the processor core, minimizing its area cost with little or no performance degradation is an important research issue. Because of differing requirements in area and speed, an ASCA usually has multiple hardware implementation options. Under the pipeline-stage timing constraint, some options achieve the same speedup at different implementation costs. Consequently, we propose an ASOP exploration algorithm with integrated hardware design space exploration that explores not only ASOPs but also their hardware implementation options. Compared with previous research, our approach yields a significant improvement in area efficiency.

Besides ASCA, issuing multiple instructions per cycle is a common approach to improving the performance of a processor core. Nevertheless, the impact of combining both approaches in the same design is not well understood. While previous studies have shown that an ASCA can potentially improve performance for some applications on certain multiple-issue architectures, the algorithms used to identify ASOPs for multiple-issue architectures yield only limited performance improvement.
This is because not all arithmetic operations are suitable for inclusion in ASOPs on multiple-issue architectures. To explore the full potential of ASCA for multiple-issue architectures, two important factors need to be considered: (1) the execution performance of an application is dominated by operations that are critical (located on the critical path) and highly resource contentious (having a high probability of being delayed during execution due to hardware resource limitations), and (2) an operation may become critical and/or highly resource contentious after other operations are added to an ASOP. The second topic of this thesis presents an ASOP exploration algorithm for multiple-issue architectures that focuses on these two factors. Simulation results show that the proposed algorithm outperforms previously published algorithms.

Based on the ASOPs generated in the second topic, the first issue of the third topic addresses how to construct the ASCA architecture. To allow more operations to execute on the ASCA simultaneously, the proposed ASCA construction algorithm merges several data-independent ASOPs to construct the ASCA. After the ASCA architecture is generated, the final phase of ASCA design is ASCA exploitation. Because of the area cost limitation, the ASCA generated in the previous phase may not support all ASOPs. Accordingly, ASCA exploitation determines which operations should be executed on the ASCA and schedules the execution cycle of each operation. Compared with previous work, the proposed approach achieves a further speedup by scheduling operations on the ASCA and on the functional units (FUs) of the base processor simultaneously. This issue is also addressed in the third topic of this thesis.
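
As an illustration of what "critical" means in factor (1) above, the following is a minimal sketch, not the algorithm proposed in the thesis. It assumes a basic block is given as a dataflow graph with a per-operation latency, and it marks an operation as critical when its earliest (ASAP) and latest (ALAP) start cycles coincide, i.e. it has zero slack. The function name, operation names, and latencies are hypothetical, and resource contention on a multiple-issue machine is deliberately not modeled.

```python
from collections import defaultdict


def critical_operations(latency, edges):
    """Return (schedule_length, zero-slack operations) for a dataflow DAG.

    latency: dict mapping operation name -> latency in cycles (assumed input)
    edges:   list of (producer, consumer) data dependences
    """
    succs = defaultdict(list)
    preds = defaultdict(list)
    for producer, consumer in edges:
        succs[producer].append(consumer)
        preds[consumer].append(producer)

    # Topological order via Kahn's algorithm.
    indegree = {op: len(preds[op]) for op in latency}
    ready = [op for op in latency if indegree[op] == 0]
    order = []
    while ready:
        op = ready.pop()
        order.append(op)
        for s in succs[op]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    if len(order) != len(latency):
        raise ValueError("dependence graph contains a cycle")

    # ASAP pass: earliest start cycle, ignoring resource limits.
    asap = {}
    for op in order:
        asap[op] = max((asap[p] + latency[p] for p in preds[op]), default=0)
    length = max(asap[op] + latency[op] for op in latency)

    # ALAP pass: latest start cycle that does not lengthen the schedule.
    alap = {}
    for op in reversed(order):
        alap[op] = min((alap[s] for s in succs[op]), default=length) - latency[op]

    # Zero slack means the operation lies on a critical path.
    critical = {op for op in latency if asap[op] == alap[op]}
    return length, critical


# Hypothetical basic block: load -> (multiply, shift) -> add -> store.
latency = {"ld": 2, "mul": 3, "shl": 1, "add": 1, "st": 1}
edges = [("ld", "mul"), ("ld", "shl"), ("mul", "add"), ("shl", "add"), ("add", "st")]
length, critical = critical_operations(latency, edges)
print(length, sorted(critical))  # 7 ['add', 'ld', 'mul', 'st']
```

In this example `shl` has two cycles of slack, so under the assumptions above, collapsing it into a pattern would not shorten the schedule by itself. Re-running the function after replacing a group of operations with a single fused node is one simple way to observe factor (2), since the set of zero-slack operations can change once a pattern is formed.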