Covariance of maximum likelihood evolutionary distances between sequences aligned pairwise

Abstract. Background: The estimation of a distance between two biological sequences is a fundamental process in molecular evolution. It is usually performed by maximum likelihood (ML) on characters aligned either pairwise or jointly in a multiple sequence...

Bibliographic Details
Main Authors: Dessimoz Christophe, Gil Manuel
Format: Article
Language: English
Published: BMC 2008-06-01
Series: BMC Evolutionary Biology
Online Access: http://www.biomedcentral.com/1471-2148/8/179
id doaj-94e89f56536e427386dab08c501eb9c7
record_format Article
spelling doaj-94e89f56536e427386dab08c501eb9c7 | 2021-09-02T07:40:36Z | eng | BMC | BMC Evolutionary Biology | 1471-2148 | 2008-06-01 | 8 | 1 | 179 | 10.1186/1471-2148-8-179 | Covariance of maximum likelihood evolutionary distances between sequences aligned pairwise | Dessimoz Christophe | Gil Manuel | http://www.biomedcentral.com/1471-2148/8/179
collection DOAJ
language English
format Article
sources DOAJ
author Dessimoz Christophe
Gil Manuel
title Covariance of maximum likelihood evolutionary distances between sequences aligned pairwise
publisher BMC
series BMC Evolutionary Biology
issn 1471-2148
publishDate 2008-06-01
description Abstract
Background: The estimation of a distance between two biological sequences is a fundamental process in molecular evolution. It is usually performed by maximum likelihood (ML) on characters aligned either pairwise or jointly in a multiple sequence alignment (MSA). Estimators for the covariance of pairs from an MSA are known, but we are not aware of any solution for cases of pairs aligned independently. In large-scale analyses, it may be too costly to compute MSAs every time distances must be compared, and therefore a covariance estimator for distances estimated from pairs aligned independently is desirable. Knowledge of covariances improves any process that compares or combines distances, such as in generalized least-squares phylogenetic tree building, orthology inference, or lateral gene transfer detection.
Results: In this paper, we introduce an estimator for the covariance of distances from sequences aligned pairwise. Its performance is analyzed through extensive Monte Carlo simulations, and compared to the well-known variance estimator of ML distances. Our covariance estimator can be used together with the ML variance estimator to form covariance matrices.
Conclusion: The estimator performs similarly to the ML variance estimator. In particular, it shows no sign of bias when sequence divergence is below 150 PAM units (i.e. above ~29% expected sequence identity). Above that distance, the covariances tend to be underestimated, but then ML variances are also underestimated.
url http://www.biomedcentral.com/1471-2148/8/179