A Simple Derivation of the Distribution of Pairwise Local Protein Sequence Alignment Scores
Main Author: | Olivier Bastien |
---|---|
Format: | Article |
Language: | English |
Published: | SAGE Publishing, 2008-01-01 |
Series: | Evolutionary Bioinformatics |
Subjects: | conservation function; reliability theory; Karlin-Altschul theorem |
Online Access: | http://la-press.com/article.php?article_id=563 |
id | doaj-8c9d0687a9c6438080574559f24a27af |
---|---|
collection | DOAJ |
author | Olivier Bastien |
issn | 1176-9343 |
description | Confidence in pairwise alignments of biological sequences, obtained by methods such as BLAST or Smith-Waterman, is critical for automatic analyses of genomic data. In the asymptotic limit of long sequences, the Karlin-Altschul model computes a P-value by assuming that the number of high-scoring matching regions above a threshold is Poisson distributed. Using a simple approach combined with recent results in reliability theory, we demonstrate here that the Karlin-Altschul model can be derived without reference to extreme value theory. Sequences were considered as systems whose components are amino acids and which have a high redundancy of information, reflected by their alignment scores. The evolution of the information shared between aligned components determined the Shared Amount of Information (S.A.I.) between sequences, i.e. the score. The parameters of the Gumbel distribution of aligned-sequence scores thus find some theoretical rationale here. The first is the hazard rate of the distribution of scores between residues, and the second is the probability that two aligned residues do not lose bits of information (i.e. conserve an initial pairing score) when a mutation occurs. |
topic | conservation function; reliability theory; Karlin-Altschul theorem |
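The description above refers to the Karlin-Altschul model. In that model, the number of high-scoring segment pairs (HSPs) scoring at least S is treated as Poisson distributed with mean E = K m n e^(-λS), so the P-value is 1 - e^(-E).

The Python below is a minimal sketch of these standard textbook formulas. It does not reproduce the reliability-theory derivation proposed in the article. It finds λ as the positive root of Σ p_ij e^(λ s_ij) = 1 by bisection, then computes E and P.

Several pieces are illustrative assumptions:
- the function names;
- the toy nucleotide scoring scheme (+1 match, -1 mismatch, uniform base frequencies), chosen because its λ is exactly ln 3;
- the value K = 0.1. In practice K is estimated, not chosen.

```python
import math


def solve_lambda(pairs, tol=1e-12):
    """Positive root of sum(p * exp(lam * s)) = 1 over (probability, score) pairs.

    Assumes the expected score is negative and at least one score is positive,
    the usual Karlin-Altschul conditions for a unique positive lambda.
    """
    if sum(p * s for p, s in pairs) >= 0 or max(s for _, s in pairs) <= 0:
        raise ValueError("need a negative expected score and a positive score")

    def f(lam):
        return sum(p * math.exp(lam * s) for p, s in pairs) - 1.0

    # f(0) = 0 and f is convex with negative slope at 0, so f < 0 on (0, lambda)
    # and f > 0 beyond lambda. Grow the upper bound until it passes the root.
    lo, hi = 0.0, 1.0
    while f(hi) < 0:
        hi *= 2.0
    # Standard bisection on the bracket [lo, hi].
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def evalue(score, m, n, lam, K):
    """Expected number of HSPs with score >= `score` for sequence lengths m and n."""
    return K * m * n * math.exp(-lam * score)


def pvalue(e):
    """Poisson probability of at least one HSP when the mean count is e: 1 - exp(-e)."""
    return -math.expm1(-e)


if __name__ == "__main__":
    # Hypothetical toy scheme: 16 equiprobable base pairs, +1 on the 4 matches, -1 otherwise.
    pairs = [(1 / 16, 1)] * 4 + [(1 / 16, -1)] * 12
    lam = solve_lambda(pairs)  # analytically ln 3 for this scheme
    K = 0.1  # illustrative placeholder, not an estimated value
    e = evalue(20, 1000, 1000, lam, K)
    print(f"lambda = {lam:.6f}, E = {e:.3e}, P = {pvalue(e):.3e}")
```

Bisection is enough here for two reasons. First, the left-hand side Σ p_ij e^(λ s_ij) is convex in λ and equals 1 at λ = 0. Second, its slope at λ = 0 is negative whenever the expected score is negative. Together these guarantee exactly one positive root, so simple sign checks locate it.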