An Algorithm to Compute the Character Access Count Distribution for Pattern Matching Algorithms

We propose a framework for the exact probabilistic analysis of window-based pattern matching algorithms, such as Boyer–Moore, Horspool, Backward DAWG Matching, Backward Oracle Matching, and more. In particular, we develop an algorithm that efficiently computes the distribution of a pattern matching...


Bibliographic Details
Main Authors: Sven Rahmann, Tobias Marschall
Format: Article
Language: English
Published: MDPI AG 2011-10-01
Series: Algorithms
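Subjects: pattern matching; analysis of algorithms; finite automaton; minimization; deterministic arithmetic automaton; probabilistic arithmetic automaton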
Online Access: http://www.mdpi.com/1999-4893/4/4/285/
id doaj-bc37dcc268604f73989f74ddb34ad9cc
record_format Article
spelling doaj-bc37dcc268604f73989f74ddb34ad9cc | 2020-11-24T21:18:23Z | eng | MDPI AG | Algorithms | ISSN 1999-4893 | 2011-10-01 | vol. 4, no. 4, pp. 285-306 | doi:10.3390/a4040285 | http://www.mdpi.com/1999-4893/4/4/285/
collection DOAJ
language English
format Article
sources DOAJ
author Sven Rahmann
Tobias Marschall
title An Algorithm to Compute the Character Access Count Distribution for Pattern Matching Algorithms
publisher MDPI AG
series Algorithms
issn 1999-4893
publishDate 2011-10-01
description We propose a framework for the exact probabilistic analysis of window-based pattern matching algorithms, such as Boyer–Moore, Horspool, Backward DAWG Matching, Backward Oracle Matching, and more. In particular, we develop an algorithm that efficiently computes the distribution of a pattern matching algorithm’s running time cost (such as the number of text character accesses) for any given pattern in a random text model. Text models range from simple uniform models to higher-order Markov models or hidden Markov models (HMMs). Furthermore, we provide an algorithm to compute the exact distribution of differences in running time cost of two pattern matching algorithms. Methodologically, we use extensions of finite automata which we call deterministic arithmetic automata (DAAs) and probabilistic arithmetic automata (PAAs) [1]. Given an algorithm, a pattern, and a text model, a PAA is constructed from which the sought distributions can be derived using dynamic programming. To our knowledge, this is the first time that substring- or suffix-based pattern matching algorithms are analyzed exactly by computing the whole distribution of running time cost. Experimentally, we compare Horspool’s algorithm, Backward DAWG Matching, and Backward Oracle Matching on prototypical patterns of short length and provide statistics on the size of minimal DAAs for these computations.
topic pattern matching
analysis of algorithms
finite automaton
minimization
deterministic arithmetic automaton
probabilistic arithmetic automaton
url http://www.mdpi.com/1999-4893/4/4/285/