Mining, compressing and classifying with extensible motifs

Bibliographic Details
Main Authors: Comin Matteo, Apostolico Alberto, Parida Laxmi
Format: Article
Language: English
Published: BMC 2006-03-01
Series: Algorithms for Molecular Biology
Online Access: http://www.almob.org/content/1/1/4
id doaj-912693d7f82d4980af64422d28f87823
record_format Article
spelling doaj-912693d7f82d4980af64422d28f87823 2020-11-24T21:09:26Z eng BMC Algorithms for Molecular Biology 1748-7188 2006-03-01 1 1 4 10.1186/1748-7188-1-4 Mining, compressing and classifying with extensible motifs Comin Matteo Apostolico Alberto Parida Laxmi http://www.almob.org/content/1/1/4
collection DOAJ
language English
format Article
sources DOAJ
author Comin Matteo
Apostolico Alberto
Parida Laxmi
spellingShingle Comin Matteo
Apostolico Alberto
Parida Laxmi
Mining, compressing and classifying with extensible motifs
Algorithms for Molecular Biology
author_facet Comin Matteo
Apostolico Alberto
Parida Laxmi
author_sort Comin Matteo
title Mining, compressing and classifying with extensible motifs
title_short Mining, compressing and classifying with extensible motifs
title_full Mining, compressing and classifying with extensible motifs
title_fullStr Mining, compressing and classifying with extensible motifs
title_full_unstemmed Mining, compressing and classifying with extensible motifs
title_sort mining, compressing and classifying with extensible motifs
publisher BMC
series Algorithms for Molecular Biology
issn 1748-7188
publishDate 2006-03-01
description Abstract. Background: Motif patterns of maximal saturation emerged originally in the context of pattern discovery in biomolecular sequences and have recently also proven valuable in the design of data compression schemes. Informally, a motif is a string of intermittently solid and wild characters that recurs more or less frequently in an input sequence or family of sequences. Motif discovery techniques and tools tend to be computationally imposing; however, special classes of "rigid" motifs have been identified whose discovery is affordable in low polynomial time. Results: In the present work, "extensible" motifs are considered, in which each sequence of gaps comes endowed with some elasticity, whereby the same pattern may be stretched to fit segments of the source that match all the solid characters but are otherwise of different lengths. A few applications of this notion are then described. In data compression by textual substitution, extensible motifs are seen to bring savings on the size of the codebook, and hence to improve compression. In germane contexts, in which compressibility is used in its dual role as a basis for structural inference and classification, extensible motifs are seen to support unsupervised classification and phylogeny reconstruction. Conclusion: Off-line compression based on extensible motifs can be used advantageously to compress and classify biological sequences.
url http://www.almob.org/content/1/1/4
work_keys_str_mv AT cominmatteo miningcompressingandclassifyingwithextensiblemotifs
AT apostolicoalberto miningcompressingandclassifyingwithextensiblemotifs
AT paridalaxmi miningcompressingandclassifyingwithextensiblemotifs
_version_ 1716758359723474944