PhySIC_IST: cleaning source trees to infer more informative supertrees

Bibliographic Details
Main Authors: Douzery Emmanuel JP, Lefort Vincent, Berry Vincent, Scornavacca Celine, Ranwez Vincent
Format: Article
Language: English
Published: BMC 2008-10-01
Series: BMC Bioinformatics
Online Access:http://www.biomedcentral.com/1471-2105/9/413
Collection: DOAJ
Record ID: doaj-fa1ed70ac4e8427b822c2349ffe87847
ISSN: 1471-2105
Volume: 9
Issue: 1
Article number: 413
DOI: 10.1186/1471-2105-9-413

Abstract

Background

Supertree methods combine phylogenies with overlapping sets of taxa into a larger one. Topological conflicts frequently arise among source trees for methodological or biological reasons, such as long branch attraction, lateral gene transfers, gene duplication/loss or deep gene coalescence. When topological conflicts occur among source trees, liberal methods infer supertrees containing the most frequent alternative, while veto methods infer supertrees that contradict no source tree, i.e. they discard all conflicting resolutions. When the source trees contain a significant number of topological conflicts or have a small taxon overlap, supertree methods of both kinds can produce poorly resolved, hence uninformative, supertrees.

Results

To overcome this problem, we propose to infer non-plenary supertrees, i.e. supertrees that do not necessarily contain all the taxa present in the source trees, discarding those taxa whose position differs greatly among source trees or for which insufficient information is provided. We detail a variant of the PhySIC veto method, called PhySIC_IST, that can infer non-plenary supertrees. PhySIC_IST aims at inferring supertrees that satisfy the same appealing theoretical properties as those of PhySIC, while being as informative as possible under this constraint. The informativeness of a supertree is estimated using a variation of the CIC (Cladistic Information Content) criterion, which takes into account both the presence of multifurcations and the absence of some taxa.
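To make the informativeness criterion more concrete, here is a minimal Python sketch of a CIC-style score. It follows the classical idea of CIC, comparing the number of fully resolved rooted trees on all n taxa, (2n-3)!!, with the number of those trees compatible with the tree being scored. As an assumption on our part, a non-plenary supertree on a taxon subset Y is handled by counting the binary trees on the full taxon set whose restriction to Y displays it. This brute-force enumeration is not the formula or code used by PhySIC_IST; the nested-tuple tree encoding and the names `clusters`, `all_binary_trees`, `displays` and `cic` are hypothetical, and the approach is only practical for a handful of taxa.

```python
import math
from typing import FrozenSet, Iterator, List, Set, Union

# Hypothetical encoding: a leaf is a taxon name (str); an internal node is a
# tuple of children (a tuple with more than two children is a multifurcation).
Tree = Union[str, tuple]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Return every cluster (set of leaves below a node), including leaves and root."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, tuple):
            leaves = frozenset().union(*(walk(child) for child in node))
        else:
            leaves = frozenset([node])
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def insert_everywhere(tree: Tree, leaf: str) -> Iterator[Tree]:
    """Yield every tree obtained by grafting `leaf` above one node of a binary tree."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_everywhere(left, leaf):
            yield (new_left, right)
        for new_right in insert_everywhere(right, leaf):
            yield (left, new_right)


def all_binary_trees(taxa: List[str]) -> List[Tree]:
    """Enumerate the (2n-3)!! rooted binary trees on `taxa` by stepwise addition."""
    trees: List[Tree] = [taxa[0]]
    for leaf in taxa[1:]:
        trees = [t for tree in trees for t in insert_everywhere(tree, leaf)]
    return trees


def displays(binary: Tree, tree: Tree) -> bool:
    """True if `binary`, restricted to the taxa of `tree`, contains every cluster of `tree`."""
    wanted = clusters(tree)
    taxa = max(wanted, key=len)  # the root cluster holds all taxa of `tree`
    restricted = {c & taxa for c in clusters(binary)} - {frozenset()}
    return wanted <= restricted


def cic(tree: Tree, all_taxa: List[str]) -> float:
    """CIC-style score: log(#binary trees on all_taxa / #those displaying `tree`)."""
    candidates = all_binary_trees(all_taxa)
    compatible = sum(1 for b in candidates if displays(b, tree))
    return math.log(len(candidates) / compatible)


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d"]
    print(cic(("a", "b", "c", "d"), taxa))        # star tree: log(15/15) = 0.0
    print(cic((("a", "b"), "c", "d"), taxa))      # multifurcating: log(15/3) ≈ 1.609
    print(cic((("a", "b"), "c"), taxa))           # non-plenary (d absent): log(15/5) ≈ 1.099
    print(cic(((("a", "b"), "c"), "d"), taxa))    # fully resolved: log(15) ≈ 2.708
```

In this toy setting, both a multifurcation and a missing taxon lower the score relative to a fully resolved plenary tree, which is the behaviour the abstract asks of the criterion; how PhySIC_IST actually weighs the two is defined in the paper, not here.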
Additionally, we propose a statistical preprocessing step called STC (Source Trees Correction) to correct the source trees prior to supertree inference. STC is a liberal step that removes the parts of each source tree that significantly conflict with other source trees. Combining STC with a veto method allows an explicit trade-off between veto and liberal approaches, tuned by a single parameter.

In large-scale simulations, we observe that STC+PhySIC_IST infers much more informative supertrees than PhySIC, while keeping a low type I error compared with the well-known MRP method. Two biological case studies on animals confirm that the STC preprocessing step successfully detects anomalies in the source trees, while STC+PhySIC_IST provides well-resolved supertrees that agree with current knowledge in systematics.

Conclusion

The paper introduces and tests two new methodologies, PhySIC_IST and STC, that demonstrate the value of inferring non-plenary supertrees as well as of preprocessing the source trees. An implementation of the methods is available at: http://www.atgc-montpellier.fr/physic_ist/
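The veto/liberal trade-off controlled by a single parameter can be illustrated on rooted triplets (ab|c means a and b are closer to each other than to c). The Python sketch below is hypothetical and deliberately simplified: instead of STC's statistical test on whole source trees, it drops from each source tree any triplet resolution supported by less than a fraction `min_support` of the source trees that resolve those three taxa, then applies a veto rule that keeps a triplet only when no remaining source resolves it differently. The names `triplets`, `correct_sources`, `veto_triplets` and `min_support` are assumptions for illustration, not part of the PhySIC_IST software, and assembling the surviving triplets into an actual supertree is left out.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Union

Tree = Union[str, tuple]   # hypothetical encoding: leaf = taxon name, node = tuple
Trio = FrozenSet[str]      # three taxa
Pair = FrozenSet[str]      # the two taxa grouped apart from the third


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Return every cluster (set of leaves below a node), including leaves and root."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, tuple):
            leaves = frozenset().union(*(walk(child) for child in node))
        else:
            leaves = frozenset([node])
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def triplets(tree: Tree) -> Dict[Trio, Pair]:
    """Map each resolved trio to the pair the tree groups together (ab|c -> {a, b})."""
    cl = clusters(tree)
    leaves = max(cl, key=len)
    resolved: Dict[Trio, Pair] = {}
    for trio in map(frozenset, combinations(sorted(leaves), 3)):
        for c in cl:
            if len(trio & c) == 2:  # a cluster separating two taxa from the third
                resolved[trio] = trio & c
                break
    return resolved


def correct_sources(sources: List[Tree], min_support: float) -> List[Dict[Trio, Pair]]:
    """Drop triplets backed by < min_support of the source trees resolving that trio."""
    per_tree = [triplets(t) for t in sources]
    votes: Dict[Trio, Counter] = defaultdict(Counter)
    for trips in per_tree:
        for trio, pair in trips.items():
            votes[trio][pair] += 1
    corrected = []
    for trips in per_tree:
        kept = {}
        for trio, pair in trips.items():
            # Unresolved trios in other trees neither support nor contradict.
            support = votes[trio][pair] / sum(votes[trio].values())
            if support >= min_support:
                kept[trio] = pair
        corrected.append(kept)
    return corrected


def veto_triplets(corrected: List[Dict[Trio, Pair]]) -> Dict[Trio, Pair]:
    """Keep a triplet only if no (corrected) source resolves the same trio differently."""
    seen: Dict[Trio, Set[Pair]] = defaultdict(set)
    for trips in corrected:
        for trio, pair in trips.items():
            seen[trio].add(pair)
    return {trio: next(iter(pairs)) for trio, pairs in seen.items() if len(pairs) == 1}


if __name__ == "__main__":
    sources = [(("a", "b"), "c"), (("a", "b"), ("c", "d")), (("a", "c"), "b")]
    for threshold in (0.0, 0.5):
        kept = veto_triplets(correct_sources(sources, threshold))
        print(threshold, sorted("".join(sorted(t)) for t in kept))
```

With `min_support = 0.0` nothing is removed and the result is a pure veto: the conflicting trio {a, b, c} is discarded and 3 triplets remain. With `min_support = 0.5` the minority resolution ac|b is removed first, so ab|c survives the veto and 4 triplets remain, a majority-rule (liberal) outcome; exact ties would still be vetoed.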