LEMPEL-ZIV SLIDING WINDOW UPDATE WITH SUFFIX ARRAYS

The sliding window dictionary-based algorithms of the Lempel-Ziv (LZ) 77 family are widely used for universal lossless data compression. The encoding component of these algorithms performs repeated substring search. Data structures, such as hash tables, binary search trees, and suffix trees have bee...

Full description

Bibliographic Details
Main Authors: Artur Ferreira, Arlindo Oliveira, Mario Figueiredo
Format: Article
Language:English
Published: Instituto Superior de Engenharia de Lisboa (ISEL) 2013-06-01
Series:ISEL Academic Journal of Electronics, Telecommunications and Computers
Subjects:
Online Access:http://journals.isel.pt/index.php/i-ETC/article/view/6
id doaj-9ca6c1cce13e4a318486b5931d64bac2
record_format Article
spelling doaj-9ca6c1cce13e4a318486b5931d64bac22020-11-25T00:59:06ZengInstituto Superior de Engenharia de Lisboa (ISEL)ISEL Academic Journal of Electronics, Telecommunications and Computers2182-40102013-06-01216LEMPEL-ZIV SLIDING WINDOW UPDATE WITH SUFFIX ARRAYSArtur FerreiraArlindo OliveiraMario FigueiredoThe sliding window dictionary-based algorithms of the Lempel-Ziv (LZ) 77 family are widely used for universal lossless data compression. The encoding component of these algorithms performs repeated substring search. Data structures, such as hash tables, binary search trees, and suffix trees have been used to speedup these searches, at the expense of memory usage. Previous work has shown how suffix arrays (SA) can be used for dictionary representation and LZ77 decomposition. In this paper, we improve over that work by proposing a new efficient algorithm to update the sliding window each time a token is produced at the output. The proposed algorithm toggles between two SA on consecutive tokens. The resulting SA-based encoder requires less memory than the conventional tree-based encoders. In comparing our SA-based technique against tree-based encoders, on a large set of benchmark files, we find that, in some compression settings, our encoder is also faster than tree-based encoders.http://journals.isel.pt/index.php/i-ETC/article/view/6Lempel-Ziv compressionsuffix arrayssliding window updatesubstring search
collection DOAJ
language English
format Article
sources DOAJ
author Artur Ferreira
Arlindo Oliveira
Mario Figueiredo
spellingShingle Artur Ferreira
Arlindo Oliveira
Mario Figueiredo
LEMPEL-ZIV SLIDING WINDOW UPDATE WITH SUFFIX ARRAYS
ISEL Academic Journal of Electronics, Telecommunications and Computers
Lempel-Ziv compression
suffix arrays
sliding window update
substring search
author_facet Artur Ferreira
Arlindo Oliveira
Mario Figueiredo
author_sort Artur Ferreira
title LEMPEL-ZIV SLIDING WINDOW UPDATE WITH SUFFIX ARRAYS
title_short LEMPEL-ZIV SLIDING WINDOW UPDATE WITH SUFFIX ARRAYS
title_full LEMPEL-ZIV SLIDING WINDOW UPDATE WITH SUFFIX ARRAYS
title_fullStr LEMPEL-ZIV SLIDING WINDOW UPDATE WITH SUFFIX ARRAYS
title_full_unstemmed LEMPEL-ZIV SLIDING WINDOW UPDATE WITH SUFFIX ARRAYS
title_sort lempel-ziv sliding window update with suffix arrays
publisher Instituto Superior de Engenharia de Lisboa (ISEL)
series ISEL Academic Journal of Electronics, Telecommunications and Computers
issn 2182-4010
publishDate 2013-06-01
description The sliding window dictionary-based algorithms of the Lempel-Ziv (LZ) 77 family are widely used for universal lossless data compression. The encoding component of these algorithms performs repeated substring search. Data structures, such as hash tables, binary search trees, and suffix trees have been used to speedup these searches, at the expense of memory usage. Previous work has shown how suffix arrays (SA) can be used for dictionary representation and LZ77 decomposition. In this paper, we improve over that work by proposing a new efficient algorithm to update the sliding window each time a token is produced at the output. The proposed algorithm toggles between two SA on consecutive tokens. The resulting SA-based encoder requires less memory than the conventional tree-based encoders. In comparing our SA-based technique against tree-based encoders, on a large set of benchmark files, we find that, in some compression settings, our encoder is also faster than tree-based encoders.
topic Lempel-Ziv compression
suffix arrays
sliding window update
substring search
url http://journals.isel.pt/index.php/i-ETC/article/view/6
work_keys_str_mv AT arturferreira lempelzivslidingwindowupdatewithsuffixarrays
AT arlindooliveira lempelzivslidingwindowupdatewithsuffixarrays
AT mariofigueiredo lempelzivslidingwindowupdatewithsuffixarrays
_version_ 1725218888451358720