VLSI Architecture Design of Prediction Core and Cache in Super High Definition H.264/AVC Encoder


Bibliographic Details
Main Authors: Wei-Yin Chen, 陳威尹
Other Authors: Liang-Gee Chen
Format: Others
Language: en_US
Online Access: http://ndltd.ncl.edu.tw/handle/fb7kwe
Description
Summary: Master's === National Taiwan University === Graduate Institute of Electronics Engineering === 96 === As video technology progresses, image resolution keeps getting finer, and this directly improves video quality. From VCD to high definition (HD) content, video quality has stayed on a fast-growing track. In the foreseeable future, camcorders and display devices with HD or super HD capability will keep this trend going. Moreover, the vividness and immersive perceptual experience brought by multi-view and stereo video are also irresistible. Among recent video coding standards, multi-view video coding (MVC) extends H.264/AVC and supports inter-view prediction to further reduce the data redundancy between different views. Since the majority of functional blocks in MVC resemble those in H.264/AVC, it is possible to implement a multi-standard video encoder for both super high definition video and multi-view video of similar throughput without much overhead. However, greater video throughput places a greater burden on the encoder, and this burden is difficult to overcome with currently available architectures. To address this challenge, this thesis proposes a VLSI architecture design of a super high definition H.264/AVC encoder with an ultra-high-throughput prediction core and an efficient cache system. In a video encoder, most of the computation and bandwidth requirements come from the prediction core, and integer motion estimation (IME) alone consumes more than half of the resources. For our target specification (Super HD 4k x 2k), the computation and bandwidth are orders of magnitude beyond the acceptable range, and the silicon area needed for on-chip SRAM is far from affordable. A hardware-oriented fast IME algorithm with sophisticated data reuse and refinement center decision is proposed; it saves 96% of the computation at the expense of an average PSNR drop of only 0.013 dB.
With simplified half-pel interpolation, the memory bandwidth of fractional motion estimation (FME) is reduced by 31% while the quality drop is only 0.03 dB. An interleaved double-current-frame scheme exploits thread parallelism in the entropy coder. This removes the throughput bottleneck in CABAC and achieves 1.2G symbols per second. An efficient cache system is proposed as the reference frame buffer; it occupies less on-chip memory and consumes less external bandwidth. The main challenges of cache design for a super high definition H.264 encoder are miss rate, miss penalty, the overhead of data prefetching, and the requirement for high throughput. In the proposed prefetching algorithm, rapid prefetching patterns and a priority-based replacement policy reduce both the miss rate of data reads and the overhead of data prefetching. The proposed 4-way non-blocking cache architecture with concurrent data prefetching further reduces the miss penalty and sustains a throughput of 5 words per cycle with no penalty for cache line splits. The average cache miss rate is reduced by 93%, which raises the average cache hit rate above 99.7%. Compared with the prior art in ISSCC '08, the proposed cache architecture requires 82% less chip area and 39% less external memory bandwidth while supporting a video resolution more than four times higher. The proposed design is implemented in TSMC 90 nm technology and operates at 300 MHz. It is the first H.264/AVC video encoding chip that supports Super HD 4096 x 2160p resolution at 24 fps in real time. Furthermore, it can be reconfigured to support the MVC format with a world-record throughput of three-view 1920 x 1080 at 30 fps on a single chip. This brings video coding technology one step closer to its ultimate goal.
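The throughput figures in the summary can be put in perspective with a rough cycle budget. The sketch below is illustrative arithmetic only, not taken from the thesis: it assumes 16 x 16 macroblocks (standard in H.264/AVC), frame dimensions rounded up to whole macroblocks, and the stated 300 MHz clock. The function names are hypothetical.

```python
import math

MB_SIZE = 16  # an H.264/AVC macroblock covers 16 x 16 luma samples


def macroblocks_per_frame(width, height):
    """Number of macroblocks per frame, rounding each dimension up."""
    return math.ceil(width / MB_SIZE) * math.ceil(height / MB_SIZE)


def cycles_per_macroblock(width, height, fps, views=1, clock_hz=300e6):
    """Average clock cycles available per macroblock for real-time encoding.

    Assumption: every view and every frame is encoded on one pipeline
    running at clock_hz, with no idle cycles.
    """
    mb_rate = macroblocks_per_frame(width, height) * fps * views
    return clock_hz / mb_rate


# Super HD 4096 x 2160 at 24 fps, single view
super_hd_budget = cycles_per_macroblock(4096, 2160, 24)

# MVC mode: three views of 1920 x 1080 at 30 fps
mvc_budget = cycles_per_macroblock(1920, 1080, 30, views=3)

print(f"Super HD: {super_hd_budget:.1f} cycles per macroblock")
print(f"3-view 1080p: {mvc_budget:.1f} cycles per macroblock")
```

Under these assumptions both modes leave a budget of a few hundred cycles per macroblock (roughly 360 and 410), which is consistent with the summary's remark that the two workloads have similar throughput.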