Compilers for VLIW DSP Architectures with Distributed and Irregular Designs

Bibliographic Details
Main Authors: Yung-Chia Lin, 林永嘉
Other Authors: Jenq Kuen Lee
Format: Others
Language: en_US
Published: 2007
Online Access: http://ndltd.ncl.edu.tw/handle/57626100213633979560
Description
Summary: Ph.D. === National Tsing Hua University === Department of Computer Science === 95 === VLIW architectures have become a mainstream design for modern high-end processors in recent years because they support more instruction-level parallelism (ILP) and offer higher potential performance than traditional single-issue CISC/RISC machines. Advances in VLSI technology now make it possible to build more powerful and faster chips than ever, but they also raise additional issues that must be considered when designing a new VLIW processor: complexity, die size, and power dissipation. For the embedded-system market, a successful processor design must not only provide ample performance but also feature low power consumption, low cost, and reduced time-to-market. Therefore, some popular and sophisticated design techniques that enhance the performance of a general-purpose VLIW processor may not be feasible for an embedded processor, even one that also demands high performance. In contrast with the traditional architectures implemented by high-performance processors, the wide variety of register file architectures and irregular designs developed for embedded processors in recent years aims at reducing complexity, power dissipation, and die size. There has been considerable interest in developing techniques that effectively support code generation and optimization for architectures with such irregular designs, because the compiler is generally regarded as the most important system-software component for the success of a processor design. Adequate compiler support for VLIW architectures is also essential if programming efficiency is to improve substantially. This dissertation contributes to the design and development of an effective compiler for a novel VLIW DSP with irregular designs. The target DSP architecture, known as the PAC DSP core, is designed with distinctively partitioned register files in which port access is highly restricted.
Moreover, the PAC DSP uses a heterogeneous distributed data-path architecture to attain an efficient design with low complexity, small size, and potentially low power consumption. We believe that the PAC DSP employs a promising architecture model that pragmatically supports the high parallelism demanded by DSP applications while curbing the undesirable growth of chip complexity, die size, and power dissipation. Our experience in designing compiler support for the PAC DSP may also be of interest to those developing compilers for similar architectures with such irregular designs. Our major contributions in this dissertation are as follows:
1. We present our application of the Open64/ORC infrastructure to a novel VLIW DSP and the specific design for handling its register file architecture. As part of an effort to overcome the new challenges of code generation for the PAC DSP, we have developed a new register allocation framework and other retargeting optimization phases that allow the effective generation of high-quality code.
2. We propose a novel heuristic algorithm, named ping-pong aware local favorable (PALF) register allocation, to obtain register allocations that are expected to better utilize irregular register file architectures. We also propose an alternative register allocation scheme using a simulated-annealing (SA) approach, and a hybrid optimization procedure that integrates PALF and SA. Furthermore, an associated global register allocation strategy is presented and discussed.
3. Advanced topics in generating optimized code for PAC DSP architectures are also discussed in this dissertation and preliminarily developed in our compilation infrastructure.
The results of all experiments performed using our optimizing compiler, which is based on the Open Research Compiler (Open64/ORC), show significant performance improvement over the primitive code generation.
Our preliminary experimental results also indicate that our compiler can efficiently exploit the features of the specific register file architectures and irregular designs in the PAC DSP.