Reconstructing phylogenies from noisy quartets in polynomial time with a high success probability

Bibliographic Details
Main Authors: Wu Gang, Kao Ming-Yang, Lin Guohui, You Jia-Huai
Format: Article
Language: English
Published: BMC 2008-01-01
Series: Algorithms for Molecular Biology
Online Access: http://www.almob.org/content/3/1/1
DOI: 10.1186/1748-7188-3-1
ISSN: 1748-7188

Abstract

Background
In recent years, quartet-based phylogeny reconstruction methods have received considerable attention in the computational biology community. Traditionally, the accuracy of a phylogeny reconstruction method is measured by simulations on synthetic datasets with known "true" phylogenies, while little theoretical analysis has been done. In this paper, we present a new model-based approach to measuring the accuracy of a quartet-based phylogeny reconstruction method. Under this model, we propose three efficient algorithms to reconstruct the "true" phylogeny with a high success probability.

Results
The first algorithm can reconstruct the "true" phylogeny from an input quartet topology set without quartet errors in $O(n^2)$ time by querying at most $(n-4)\log(n-1)$ quartet topologies, where $n$ is the number of taxa. When the input quartet topology set contains errors, the second algorithm can reconstruct the "true" phylogeny with probability approximately $1-p$ in $O(n^4 \log n)$ time, where $p$ is the probability that a quartet topology is erroneous. The third algorithm improves this probability to approximately $\frac{1}{1 + q^2 + \frac{1}{2}q^4 + \frac{1}{16}q^5}$, where $q = \frac{p}{1-p}$, with a running time of $O(n^5)$; this probability is at least 0.984 when $p < 0.05$.

Conclusion
The three proposed algorithms are mathematically guaranteed to reconstruct the "true" phylogeny with a high success probability. The experimental results showed that the third algorithm produced phylogenies with a higher probability than its theoretical lower bound and outperformed some existing phylogeny reconstruction methods in both speed and accuracy.
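The approximate success probabilities quoted in the Results can be evaluated directly. The Python sketch below only transcribes the two expressions from the abstract: about $1-p$ for the second algorithm, and $\frac{1}{1 + q^2 + \frac{1}{2}q^4 + \frac{1}{16}q^5}$ with $q = \frac{p}{1-p}$ for the third. It does not implement any of the reconstruction algorithms, and the function names (`error_odds`, `approx_success_second`, `approx_success_third`) are hypothetical.

```python
"""Evaluate the approximate success probabilities quoted in the abstract.

Hypothetical helper names; this transcribes formulas only and does not
implement any of the paper's reconstruction algorithms.
"""


def error_odds(p: float) -> float:
    """Return q = p / (1 - p), the odds that a single quartet topology is an error."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return p / (1.0 - p)


def approx_success_second(p: float) -> float:
    """Approximate success probability stated for the second algorithm: 1 - p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return 1.0 - p


def approx_success_third(p: float) -> float:
    """Approximate success probability stated for the third algorithm:
    1 / (1 + q^2 + q^4/2 + q^5/16) with q = p / (1 - p)."""
    q = error_odds(p)
    return 1.0 / (1.0 + q ** 2 + q ** 4 / 2.0 + q ** 5 / 16.0)


if __name__ == "__main__":
    # Compare the two approximations for a few illustrative error rates.
    for p in (0.01, 0.02, 0.05, 0.10):
        print(
            f"p = {p:.2f}: second ~ {approx_success_second(p):.4f}, "
            f"third ~ {approx_success_third(p):.4f}"
        )
```

Since $q$ grows with $p$ and every term in the denominator is nonnegative and nondecreasing in $q$, the expression decreases as $p$ increases. Evaluating it at $p = 0.05$ (about 0.997) is therefore enough to confirm the stated floor of 0.984 for all $p < 0.05$.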