VLSI ARCHITECTURE DESIGN FOR BIT-LEVEL INNER PRODUCT

Bibliographic Details
Main Authors: Tian-Sheuan Chang, 張添烜
Other Authors: Prof. Chein-Wei Jen
Format: Others
Language: zh-TW
Published: 1999
Online Access: http://ndltd.ncl.edu.tw/handle/68111050957313296182
Description
Summary: Ph.D. === National Chiao Tung University === Department of Electronics Engineering === 88 === The inner product is an important building block in many DSP applications, such as multimedia, wireless, and communication systems. Because of this wide range of applications, efficient implementation that meets different application requirements has become an important research topic. In this dissertation, we study this topic by exploring the bit-level design space of the inner product for both programmable and non-programmable operands. For the non-programmable inner product, we explore the design space by exploiting the constant value and numerical properties of the fixed operands, so that the resulting multiplication is hardwired with common subexpression sharing. We propose a new distributed arithmetic (DA) technique that expands the fixed input to the bit level, taking advantage of shared partial sums of products and the sparse nonzero bits of the fixed input to reduce the number of computations. The proposed DA has been applied to a 2-D IDCT chip design, a processor core design, and FPGA implementations. The processor core design, which can be used in digital still cameras and real-time H.263 encoding, pushes the sharing properties of the proposed DA to the extreme case: only one word adder and shifter. Furthermore, it can be combined with the fast direct 2-D DCT algorithm to reduce the number of computation cycles. The resulting architecture is simple, regular, and easily scalable to higher-throughput applications. For FPGA implementations, the proposed DA is well suited to the bit-level grain size of the device, and the resulting design can save more than two-thirds of the hardware cost compared with a design using conventional DA. Besides architecture optimization with common subexpression sharing, we also consider algorithm reformulation, which recasts the transform equations in cyclic convolution form to enable better common subexpression sharing.
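The bit-level expansion of a fixed operand described above can be illustrated with a minimal Python sketch. It is not the dissertation's architecture: the function name `da_inner_product`, the `width` parameter, and the restriction to non-negative integer coefficients are assumptions made for illustration. For each bit position, the inputs whose coefficient has that bit set are summed; all-zero positions are skipped (sparsity), and a partial sum is reused whenever the same bit pattern recurs (sharing).

```python
def da_inner_product(coeffs, xs, width=8):
    """Inner product sum(c * x) using only additions and shifts.

    coeffs: fixed non-negative integers below 2**width (assumption).
    xs:     integer inputs, same length as coeffs.
    Returns (result, number of distinct partial sums computed).
    """
    assert len(coeffs) == len(xs)
    assert all(0 <= c < (1 << width) for c in coeffs)

    cache = {}  # bit pattern across coefficients -> shared partial sum
    total = 0
    for j in range(width):
        # Bit j of every fixed coefficient, taken as one pattern.
        pattern = tuple((c >> j) & 1 for c in coeffs)
        if not any(pattern):
            continue  # no coefficient has this bit set: nothing to add
        if pattern not in cache:
            # Add up only the inputs selected by the nonzero bits.
            cache[pattern] = sum(x for x, b in zip(xs, pattern) if b)
        total += cache[pattern] << j  # weight the partial sum by 2**j
    return total, len(cache)


result, distinct = da_inner_product([5, 10], [3, 4])
print(result, distinct)
```

In this example the coefficients 5 (binary 0101) and 10 (binary 1010) produce only two distinct bit patterns over four nonzero bit positions, so each partial sum is computed once and reused once.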
We propose two efficient DFT designs that also exploit the symmetry of the DFT coefficients to increase throughput. The prime-length DFT design saves 80% of the gate area and doubles the throughput for length N=61. The power-of-two-length DFT design achieves area-time complexity competitive with previous designs. For portable applications, we also consider low-power filter realization that uses differential coefficients and inputs instead of the original values, so that fewer bits are required, which reduces the size of the arithmetic units and the power dissipation. We present an improved algorithm for generating differential coefficients so that the differential coefficient method can be applied to filters of any bandwidth, rather than only the narrow-band filters handled by previous approaches. Simulations with fixed-coefficient filters indicate reductions in transition activity ranging from 1% to 53% over the full range of filter bandwidths. Area can be reduced by up to 50% because of the lower coefficient precision. The resulting design is superior to previous approaches in applicability, power consumption, and area. For programmable filters, we present a digit-serial architecture that uses the DA form at the algorithm level for accumulation-free operation and (p, q) compressors instead of Booth encoding for high-speed operation. The resulting design saves up to 17% of the hardware cost compared with the previous approach.
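The differential coefficient idea mentioned above can be sketched as follows. This shows only the basic first-order scheme, under the assumption that each product is rebuilt from the product stored in the previous output cycle plus a small correction; it is not the improved generation algorithm of the dissertation, and the function names `fir_direct` and `fir_differential` are hypothetical. Because neighbouring coefficients are often close in value, the differences need fewer bits than the coefficients themselves.

```python
def fir_direct(coeffs, xs):
    """Reference FIR filter: y[n] = sum_k coeffs[k] * x[n - k]."""
    return [
        sum(coeffs[k] * xs[n - k] for k in range(len(coeffs)) if n - k >= 0)
        for n in range(len(xs))
    ]


def fir_differential(coeffs, xs):
    """FIR filter using first-order coefficient differences.

    Uses coeffs[k] * x[n-k] = coeffs[k-1] * x[n-k] + d[k] * x[n-k],
    where d[k] = coeffs[k] - coeffs[k-1] and the first term was already
    computed at tap k-1 in the previous output cycle.
    """
    diffs = [coeffs[k] - coeffs[k - 1] for k in range(1, len(coeffs))]
    prev = [0] * len(coeffs)  # prev[k] holds coeffs[k] * x[n-1-k]
    out = []
    for n, x in enumerate(xs):
        cur = [coeffs[0] * x]  # only tap 0 needs a full-width multiply
        for k in range(1, len(coeffs)):
            sample = xs[n - k] if n - k >= 0 else 0
            # Small multiply by the difference plus the stored product.
            cur.append(prev[k - 1] + diffs[k - 1] * sample)
        out.append(sum(cur))
        prev = cur
    return out


coeffs = [100, 102, 101, 99]
xs = [1, 2, 3, 4, 5]
print(fir_differential(coeffs, xs) == fir_direct(coeffs, xs))
```

With these example coefficients the differences are 2, -1, and -2, so every tap after the first multiplies by a value of at most two bits in magnitude instead of a seven-bit coefficient.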