A 64-QAM 8x8 MIMO Detector with Constant-Throughput Lattice Reduction and QR Preprocessing

Bibliographic Details
Main Authors: Liao, Chun-Fu, 廖浚甫
Other Authors: Huang, Yuan-Hao
Format: Others
Language: en_US
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/60999550872395529318
id ndltd-TW-101NTHU5442065
record_format oai_dc
collection NDLTD
language en_US
format Others
sources NDLTD
description Doctoral === National Tsing Hua University === Department of Electrical Engineering === 101 === Wireless communication is advancing rapidly, and the dimension of multiple-input multiple-output (MIMO) systems is growing quickly to meet the demand of high-throughput applications. A high-performance, low-complexity MIMO detector has therefore become an important need. The maximum likelihood (ML) detector is known to be optimal; however, it is impractical to realize owing to its high computational complexity. To address this problem, researchers have proposed tree-based search algorithms, such as sphere decoding and K-best decoding, which reduce the complexity while retaining near-optimal performance. In addition, channel matrix preprocessing techniques, such as lattice-reduction-aided (LRA) detection, have been proposed to improve MIMO detection performance with full diversity gain. Although many researchers have addressed the merits of lattice-reduction-aided systems, VLSI implementations of lattice-reduction-aided MIMO detection are still lacking. This thesis focuses on the implementation of a complete lattice-reduction-aided MIMO detection system, and three chips were implemented to accomplish this goal. Each chip is introduced in its own chapter. The goal of the first chip is to implement the first constant-throughput LLL lattice reduction processor. A variant of the LLL lattice reduction algorithm is proposed and implemented for 4 × 4 MIMO systems. Power is saved by using redundant-operation prediction techniques, which are effective at both the algorithm and hardware levels. The chip is implemented in UMC 90 nm 1P9M technology and occupies a 4.29 mm² area, including a 0.8 mm² core area, with a power consumption of 24.8 mW at its maximum frequency of 37 MHz. The average reduction power for Rayleigh-fading MIMO channels is 22.42% of the original power.
The throughput of this processor is determined by the chosen number of stages, which can also be selected to meet different performance requirements. The goal of the second and third chips is to implement a complete lattice-reduction-aided MIMO detection system. Although the literature contains some implementations of the LLL lattice reduction algorithm, they often neglect the QR decomposition that precedes it. The second chip therefore implements joint QR decomposition and an efficient constant-throughput LLL lattice reduction algorithm. This chip uses several different functional blocks to support both QR and lattice reduction operations. More than 80% of the hardware is shared between the two algorithms, which greatly lowers the hardware cost of implementing the whole preprocessing operation, and the utilization rates of these processing elements are all close to 80%, so few circuits sit idle. The joint design of the two algorithms also reduces the word length of the circuit. The proposed processor was designed and fabricated in TSMC 90 nm 1P9M CMOS technology. The chip occupies a 5.211 mm² area, including a 2.505 mm² core area, and consumes 31.2 mW at its maximum frequency of 55 MHz. It is the first 8 × 8 realization of a lattice reduction processor. The third chip deals mainly with the detector part of the lattice-reduction-aided MIMO detection system, and it also incorporates the preprocessing processor of the second work. A simple linear detector cannot deliver satisfactory performance in an 8 × 8 MIMO environment. However, the lattice-reduction-aided K-best detector has a much larger data range, which results in a large hardware cost. The sorting operation of the K-best detector also incurs long latency and high hardware cost. The third work therefore proposes a sorting-reduced (SR) K-best detector that greatly reduces the sorting operations with only a small performance degradation.
A differential value representation is also proposed to reduce the hardware cost of the lattice-reduction-aided K-best detector. The bridge between preprocessing and detection is also implemented on this chip. The proposed design, which includes QR decomposition with full size reduction, the E-CTLLL LR algorithm, shifting and scaling circuits, projection circuits, and the SR K-best detector, was fabricated in the TSMC 90 nm 1P9M CMOS process. The chip occupies a 13.82 mm² area, including a 7.94 mm² core area, and consumes 37.1 mW at a frequency of 65 MHz. The proposed SR K-best detector alone can achieve a throughput of 3.1 Gbps with 64-QAM, outperforming state-of-the-art designs. To estimate the throughput of the whole system, one channel is assumed to be used to detect 72 symbols. The estimated throughput of this chip is therefore 585 Mbps, and the bottleneck is the three-cycle projection operation of the preprocessing part. The energy per bit is 63 pJ/bit, which is also the lowest in the literature. This work is thus believed to make many contributions to the VLSI implementation of lattice-reduction-aided MIMO detection.
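The figures quoted in the description for the third chip can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check, not a calculation taken from the thesis. It assumes that the detector emits one 8-stream 64-QAM symbol vector (8 × 6 bits) per clock cycle at 65 MHz, and that the energy per bit is the chip power divided by the estimated system throughput. The function and variable names are illustrative.

```python
import math

def peak_detector_throughput_bps(n_streams, qam_order, clock_hz):
    """Peak bit rate, ASSUMING one full symbol vector is detected per clock cycle."""
    bits_per_symbol = int(math.log2(qam_order))  # 64-QAM carries 6 bits per symbol
    return n_streams * bits_per_symbol * clock_hz

def energy_per_bit_pj(power_w, throughput_bps):
    """Energy per detected bit in picojoules (power divided by throughput)."""
    return power_w / throughput_bps * 1e12

# Values quoted in the description: 8 x 8 MIMO, 64-QAM, 65 MHz, 37.1 mW, 585 Mbps.
peak_bps = peak_detector_throughput_bps(n_streams=8, qam_order=64, clock_hz=65e6)
epb_pj = energy_per_bit_pj(power_w=37.1e-3, throughput_bps=585e6)

print(f"peak detector throughput: {peak_bps / 1e9:.2f} Gbps")
print(f"energy per bit: {epb_pj:.1f} pJ/bit")
```

Under these assumptions the results are about 3.12 Gbps and 63.4 pJ/bit, consistent with the quoted 3.1 Gbps and 63 pJ/bit.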
author2 Huang, Yuan-Hao
author_facet Huang, Yuan-Hao
Liao, Chun-Fu
廖浚甫
author Liao, Chun-Fu
廖浚甫
author_sort Liao, Chun-Fu
title A 64-QAM 8x8 MIMO Detector with Constant-Throughput Lattice Reduction and QR Preprocessing
publishDate 2013
url http://ndltd.ncl.edu.tw/handle/60999550872395529318
spelling ndltd-TW-101NTHU5442065 2015-10-13T22:30:11Z http://ndltd.ncl.edu.tw/handle/60999550872395529318 A 64-QAM 8x8 MIMO Detector with Constant-Throughput Lattice Reduction and QR Preprocessing 具固定運算量晶格簡化與QR分解前處理之64-QAM 8x8多輸入多輸出偵測器 Liao, Chun-Fu 廖浚甫 博士 國立清華大學 電機工程學系 101 Huang, Yuan-Hao 黃元豪 2013 學位論文 ; thesis 102 en_US