Linear-Time Text Compression by Longest-First Substitution

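We consider grammar-based text compression with longest first substitution (LFS), where non-overlapping occurrences of a longest repeating factor of the input text are replaced by a new non-terminal symbol. We present the first linear-time algorithm for LFS. Our algorithm employs a new data structure called sparse lazy suffix trees. We also deal with a more sophisticated version of LFS, called LFS2, that allows better compression. The first linear-time algorithm for LFS2 is also presented.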
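The substitution step described in the abstract can be illustrated with a small sketch. This is a naive, polynomial-time illustration only, not the linear-time algorithm of the article, and it does not use sparse lazy suffix trees. The function names, the `<k>` naming of non-terminals, and the minimum factor length of 2 are assumptions made for this example.

```python
def longest_repeat(seq):
    """Return a longest factor (as a tuple) that has at least two
    non-overlapping occurrences in seq, or None if there is none.
    Factors shorter than 2 symbols are ignored (an assumption)."""
    n = len(seq)
    for length in range(n // 2, 1, -1):          # try longest first
        first = {}                               # factor -> leftmost start
        for i in range(n - length + 1):
            key = tuple(seq[i:i + length])
            if key not in first:
                first[key] = i
            elif i - first[key] >= length:       # no overlap with leftmost
                return key
    return None


def replace_all(seq, factor, symbol):
    """Replace non-overlapping occurrences of factor, scanning left to right."""
    out, i, k = [], 0, len(factor)
    while i < len(seq):
        if tuple(seq[i:i + k]) == factor:
            out.append(symbol)
            i += k
        else:
            out.append(seq[i])
            i += 1
    return out


def lfs(text):
    """Naive longest-first substitution: returns (start sequence, rules)."""
    seq, rules = list(text), {}
    while True:
        factor = longest_repeat(seq)
        if factor is None:
            break
        name = f"<{len(rules) + 1}>"             # hypothetical non-terminal name
        rules[name] = list(factor)
        seq = replace_all(seq, factor, name)
    return seq, rules


def expand(seq, rules):
    """Expand a sequence back to the original text."""
    return "".join(expand(rules[s], rules) if s in rules else s for s in seq)


if __name__ == "__main__":
    start, rules = lfs("abcabcabc")
    print(start, rules)
    print(expand(start, rules))
```

For the input `abcabcabc`, the longest factor with two non-overlapping occurrences is `abc`, so the sketch introduces a single rule and the start sequence consists of three copies of the new non-terminal. Each round rescans the whole sequence, which is what the article's linear-time approach avoids.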


Bibliographic Details
Main Authors: Ayumi Shinohara, Masayuki Takeda, Hideo Bannai, Takashi Funamoto, Ryosuke Nakamura, Shunsuke Inenaga
Format: Article
Language: English
Published: MDPI AG 2009-11-01
Series: Algorithms
Subjects: grammar-based text compression; suffix trees; linear-time algorithms
Online Access: http://www.mdpi.com/1999-4893/2/4/1429/
id doaj-b534caaf9f6e40579b516b611fadc54b
record_format Article
spelling doaj-b534caaf9f6e40579b516b611fadc54b; 2020-11-24T23:18:31Z; eng; MDPI AG; Algorithms; 1999-4893; 2009-11-01; vol. 2, no. 4, pp. 1429-1448; 10.3390/a2041429; Linear-Time Text Compression by Longest-First Substitution; Ayumi Shinohara; Masayuki Takeda; Hideo Bannai; Takashi Funamoto; Ryosuke Nakamura; Shunsuke Inenaga; We consider grammar-based text compression with longest first substitution (LFS), where non-overlapping occurrences of a longest repeating factor of the input text are replaced by a new non-terminal symbol. We present the first linear-time algorithm for LFS. Our algorithm employs a new data structure called sparse lazy suffix trees. We also deal with a more sophisticated version of LFS, called LFS2, that allows better compression. The first linear-time algorithm for LFS2 is also presented.; http://www.mdpi.com/1999-4893/2/4/1429/; grammar-based text compression; suffix trees; linear-time algorithms
collection DOAJ
issn 1999-4893