Acceleration and Implementation of JPEG2000 Encoder on TI DSP Platform

Bibliographic Details
Main Author: 劉建志
Other Authors: 杭學鳴
Format: Others
Language: en_US
Published: 2006
Online Access: http://ndltd.ncl.edu.tw/handle/40058944042061936793
Description
Summary: Master's thesis === National Chiao Tung University === College of Electrical and Computer Engineering, IC Design Industry Program === academic year 95 (ROC calendar) === As digital imagery becomes increasingly popular, a new still-image coding standard, JPEG2000, was proposed to improve compression efficiency and add features. It provides excellent subjective quality at low bit rates and offers fine-grained scalability, both in compression efficiency and in the transmission of the compressed bitstream. However, JPEG2000 is also computationally complex.

In this thesis, we implement a JPEG2000 encoder on the TI DSP platform. We propose two speed-up methods and use the TI DSP optimization tools to accelerate the Tier-1 module, the most complex part of the JPEG2000 standard. We start from the OpenJPEG reference software, version 1.0, which already uses the 1-D lifting scheme to accelerate the DWT module. We therefore focus on the Tier-1 module, which takes about 90% of the total computing time. We first study previous methods and examine their effectiveness on our DSP platform. We then propose two improved methods: VGOSS (Variable Group Of Sample Skip) and a modified VGOSS method. We eliminate unnecessary checking cycles by recording the NBC (Need-to-Be-Coded) samples on a list, and we reorder the sample indices to speed up execution. In the DSP implementation of the proposed methods, we apply code acceleration techniques and DSP compiler-level optimization, and we tune the cache allocation to reduce memory access time.

Experimental results show that our best configuration runs up to 32 times faster than the original, unoptimized program on the DSP platform. Even when the original program is compiled with the DSP optimization tools and a proper cache assignment, our fast algorithm still reduces computation by 45%.