LEMPEL-ZIV SLIDING WINDOW UPDATE WITH SUFFIX ARRAYS

Bibliographic Details
Main Authors: Artur Ferreira, Arlindo Oliveira, Mario Figueiredo
Format: Article
Language: English
Published: Instituto Superior de Engenharia de Lisboa (ISEL), 2013-06-01
Series: ISEL Academic Journal of Electronics, Telecommunications and Computers
Online Access: http://journals.isel.pt/index.php/i-ETC/article/view/6
Description
Summary: The sliding window dictionary-based algorithms of the Lempel-Ziv (LZ) 77 family are widely used for universal lossless data compression. The encoding component of these algorithms performs repeated substring search. Data structures such as hash tables, binary search trees, and suffix trees have been used to speed up these searches, at the expense of memory usage. Previous work has shown how suffix arrays (SA) can be used for dictionary representation and LZ77 decomposition. In this paper, we improve over that work by proposing a new efficient algorithm to update the sliding window each time a token is produced at the output. The proposed algorithm toggles between two SAs on consecutive tokens. The resulting SA-based encoder requires less memory than the conventional tree-based encoders. Comparing our SA-based technique against tree-based encoders on a large set of benchmark files, we find that, in some compression settings, our encoder is also faster than tree-based encoders.
ISSN: 2182-4010
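The summary describes LZ77 encoding as repeated longest-match search in a sliding window, with a suffix array (SA) serving as the dictionary. The Python sketch below illustrates only that search, under stated assumptions: it rebuilds a naive SA of the window plus lookahead buffer for every token instead of using the paper's update that toggles between two SAs, and it emits (offset, length, next symbol) triples. The function names and the `window`/`lookahead` parameters are hypothetical illustrations, not the authors' code.

```python
from typing import List, Tuple

# Hypothetical token layout: (offset back into the window, match length, next symbol).
Token = Tuple[int, int, str]


def suffix_array(s: str) -> List[int]:
    # Naive construction: sort suffix start positions by suffix text.
    # Real encoders use much faster SA construction; this is for illustration only.
    return sorted(range(len(s)), key=lambda i: s[i:])


def common_prefix(s: str, i: int, j: int, limit: int) -> int:
    # Length of the longest common prefix of s[i:] and s[j:] (with i < j), capped at limit.
    k = 0
    while k < limit and j + k < len(s) and s[i + k] == s[j + k]:
        k += 1
    return k


def longest_match(data: str, pos: int, window: int, lookahead: int) -> Tuple[int, int]:
    lo = max(0, pos - window)
    # Reserve one symbol after the match for the literal part of the token.
    hi = min(len(data), pos + lookahead + 1)
    seg = data[lo:hi]
    sa = suffix_array(seg)
    p = pos - lo                      # position of the lookahead buffer inside seg
    rank = sa.index(p)
    limit = hi - pos - 1
    best_len, best_off = 0, 0
    # In SA order, the longest common prefix with suffix p can only shrink as we move
    # away from its rank, so the nearest window suffix (start < p) above and below
    # are the only candidates that need to be checked.
    for step in (-1, 1):
        r = rank + step
        while 0 <= r < len(sa) and sa[r] >= p:
            r += step
        if 0 <= r < len(sa):
            q = sa[r]
            length = common_prefix(seg, q, p, limit)
            if length > best_len:
                best_len, best_off = length, p - q
    return best_off, best_len


def lz77_encode(data: str, window: int = 16, lookahead: int = 8) -> List[Token]:
    tokens: List[Token] = []
    pos = 0
    while pos < len(data):
        off, length = longest_match(data, pos, window, lookahead)
        tokens.append((off, length, data[pos + length]))
        pos += length + 1             # slide the window past the match and the literal
    return tokens


def lz77_decode(tokens: List[Token]) -> str:
    out: List[str] = []
    for off, length, sym in tokens:
        start = len(out) - off
        for k in range(length):
            out.append(out[start + k])  # symbol-by-symbol copy lets matches overlap
        out.append(sym)
    return "".join(out)
```

For example, `lz77_encode("abababab")` yields `[(0, 0, "a"), (0, 0, "b"), (2, 5, "b")]`: the third token copies five symbols from two positions back, overlapping the region being produced. Checking only the nearest valid neighbours in SA order is what keeps the per-token search cheap once an SA exists; how that SA is maintained as the window slides is the part the paper addresses and this sketch does not.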