Evolutionary triplet models of structured RNA.

Bibliographic Details
Main Authors: Robert K Bradley, Ian Holmes
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2009-08-01
Series: PLoS Computational Biology
Online Access: http://europepmc.org/articles/PMC2725318?pdf=render
DOI: 10.1371/journal.pcbi.1000483
Citation: PLoS Computational Biology 5(8): e1000483
Collection: DOAJ
ISSN: 1553-734X, 1553-7358

Description: The reconstruction and synthesis of ancestral RNAs is a feasible goal for paleogenetics. This will require new bioinformatics methods, including a robust statistical framework for reconstructing histories of substitutions, indels and structural changes. We describe a "transducer composition" algorithm for extending pairwise probabilistic models of RNA structural evolution to models of multiple sequences related by a phylogenetic tree. This algorithm draws on formal models of computational linguistics as well as the 1985 protosequence algorithm of David Sankoff. The output of the composition algorithm is a multiple-sequence stochastic context-free grammar. We describe dynamic programming algorithms for parsing this grammar that are robust to null cycles and empty bifurcations. Example applications include structural alignment of non-coding RNAs, propagation of structural information from an experimentally characterized sequence to its homologs, and inference of the ancestral structure of a set of diverged RNAs. We implemented the above algorithms for a simple model of pairwise RNA structural evolution; in particular, we implemented maximum likelihood (ML) alignment of three known RNA structures given a known phylogeny, and inference of their common ancestral structure. We compared this ML algorithm to a variety of related but simpler techniques, including ML alignment algorithms for simpler models that omitted various aspects of the full model, and a posterior-decoding alignment algorithm for one of the simpler models. In our tests, incorporation of basepair structure was the most important factor for accurate alignment inference; appropriate use of posterior decoding was next; and fine details of the model were least important. Because posterior-decoding heuristics can be substantially faster than exact phylogenetic inference, these results motivate the use of sum-over-pairs heuristics where possible (and approximate sum-over-pairs).
For more exact probabilistic inference, we discuss the use of transducer composition for ML (or MCMC) inference on phylogenies, including possible ways to make the core operations tractable.
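
The idea of transducer composition can be illustrated in its simplest special case. The sketch below is not the authors' method: it assumes substitution-only, single-state transducers (no indels and no basepair structure) and a Jukes-Cantor substitution model, and the names `jc69_matrix`, `compose` and `seq_likelihood` are hypothetical. Composing two such transducers along consecutive branches sums out the intermediate (ancestral) residue, so the composite reduces to the product of the two conditional substitution matrices.

```python
import math

# Assumption: RNA alphabet with a Jukes-Cantor (JC69) substitution model.
ALPHABET = "ACGU"


def jc69_matrix(t):
    """Hypothetical helper: P[i][j] = P(child = ALPHABET[j] | parent = ALPHABET[i])
    for a branch of length t, measured in expected substitutions per site."""
    decay = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * decay
    diff = 0.25 - 0.25 * decay
    n = len(ALPHABET)
    return [[same if i == j else diff for j in range(n)] for i in range(n)]


def compose(p1, p2):
    """Compose two single-state substitution transducers (no indels, no structure).

    The intermediate residue b is summed out:
    C[a][c] = sum_b p1[a][b] * p2[b][c], i.e. an ordinary matrix product.
    """
    n = len(p1)
    return [
        [sum(p1[a][b] * p2[b][c] for b in range(n)) for c in range(n)]
        for a in range(n)
    ]


def seq_likelihood(p, parent, child):
    """P(child | parent) for equal-length, ungapped sequences under matrix p.

    Sites are treated as independent, which holds only for this
    substitution-only sketch (basepaired sites are not independent).
    """
    if len(parent) != len(child):
        raise ValueError("this sketch handles ungapped sequences only")
    index = {ch: k for k, ch in enumerate(ALPHABET)}
    prob = 1.0
    for x, y in zip(parent, child):
        prob *= p[index[x]][index[y]]
    return prob


if __name__ == "__main__":
    two_branches = compose(jc69_matrix(0.1), jc69_matrix(0.2))
    one_branch = jc69_matrix(0.3)
    max_diff = max(
        abs(two_branches[i][j] - one_branch[i][j])
        for i in range(len(ALPHABET))
        for j in range(len(ALPHABET))
    )
    print(f"max |composed - direct| = {max_diff:.2e}")
    print(f"P(GGCU | GGCA) = {seq_likelihood(two_branches, 'GGCA', 'GGCU'):.6f}")
```

Because the Jukes-Cantor model satisfies the Chapman-Kolmogorov equation, `compose(jc69_matrix(0.1), jc69_matrix(0.2))` should match `jc69_matrix(0.3)` up to floating-point rounding. Once indels and RNA structure are included, the composed object is no longer a single matrix; in the setting described above it is a multiple-sequence stochastic context-free grammar.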