A Simple Derivation of the Distribution of Pairwise Local Protein Sequence Alignment Scores

Confidence in pairwise alignments of biological sequences, obtained by various methods such as Blast or Smith-Waterman, is critical for automatic analyses of genomic data. In the asymptotic limit of long sequences, the Karlin-Altschul model computes a P-value assuming that the number of high scoring...

Bibliographic Details
Main Author: Olivier Bastien
Format: Article
Language: English
Published: SAGE Publishing 2008-01-01
Series: Evolutionary Bioinformatics
Online Access: https://doi.org/10.1177/117693430800400001
id doaj-3bf83f47bd824b7195e489b7068a8e74
record_format Article
spelling doaj-3bf83f47bd824b7195e489b7068a8e74 | 2020-11-25T03:17:51Z | eng | SAGE Publishing | Evolutionary Bioinformatics | 1176-9343 | 2008-01-01 | 4 | 10.1177/117693430800400001 | A Simple Derivation of the Distribution of Pairwise Local Protein Sequence Alignment Scores | Olivier Bastien | CNRS (UMR 5168) - INRA (UMR 1200) - CEA - Université Joseph Fourier, Laboratoire de Physiologie Cellulaire Végétale; CEA Grenoble, 17 rue des Martyrs, F-38054, Grenoble cedex 09, France. | https://doi.org/10.1177/117693430800400001
collection DOAJ
language English
format Article
sources DOAJ
author Olivier Bastien
title A Simple Derivation of the Distribution of Pairwise Local Protein Sequence Alignment Scores
publisher SAGE Publishing
series Evolutionary Bioinformatics
issn 1176-9343
publishDate 2008-01-01
description Confidence in pairwise alignments of biological sequences, obtained by various methods such as Blast or Smith-Waterman, is critical for automatic analyses of genomic data. In the asymptotic limit of long sequences, the Karlin-Altschul model computes a P-value assuming that the number of high scoring matching regions above a threshold is Poisson distributed. Using a simple approach combined with recent results in reliability theory, we demonstrate here that the Karlin-Altschul model can be derived with no reference to the extreme events theory. Sequences were considered as systems in which components are amino acids and having a high redundancy of Information reflected by their alignment scores. Evolution of the information shared between aligned components determined the Shared Amount of Information (S.A.I.) between sequences, i.e. the score. The Gumbel distribution parameters of aligned sequences scores find here some theoretical rationale. The first is the Hazard Rate of the distribution of scores between residues and the second is the probability that two aligned residues do not lose bits of information (i.e. conserve an initial pairing score) when a mutation occurs.
url https://doi.org/10.1177/117693430800400001