Markov Chain-Based Sampling for Exploring RNA Secondary Structure under the Nearest Neighbor Thermodynamic Model and Extended Applications

Bibliographic Details
Main Authors: Anna Kirkpatrick, Kalen Patton, Prasad Tetali, Cassie Mitchell
Format: Article
Language: English
Published: MDPI AG 2020-10-01
Series: Mathematical and Computational Applications
Online Access:https://www.mdpi.com/2297-8747/25/4/67
Record ID: doaj-79a62d9d9315400a871f5aaa06d7c347
Affiliations: Anna Kirkpatrick, Kalen Patton, and Prasad Tetali, School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332, USA; Cassie Mitchell, Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
Citation: Mathematical and Computational Applications, vol. 25, no. 4, article 67
DOI: 10.3390/mca25040067
Source: DOAJ (Directory of Open Access Journals)
ISSN: 1300-686X, 2297-8747
Abstract: Ribonucleic acid (RNA) secondary structures and branching properties are important for determining functional ramifications in biology. While energy minimization of the Nearest Neighbor Thermodynamic Model (NNTM) is commonly used to identify such properties (number of hairpins, maximum ladder distance, etc.), it is difficult to know whether the resultant values fall within expected dispersion thresholds for a given energy function. The goal of this study was to construct a Markov chain capable of examining the dispersion of RNA secondary structures and branching properties obtained from NNTM energy function minimization independent of a specific nucleotide sequence. Plane trees are studied as a model for RNA secondary structure, with energy assigned to each tree based on the NNTM, and a corresponding Gibbs distribution is defined on the trees. Through a bijection between plane trees and 2-Motzkin paths, a Markov chain converging to the Gibbs distribution is constructed, and fast mixing time is established by estimating the spectral gap of the chain. The spectral gap estimate is obtained through a series of decompositions of the chain and also by building on known mixing time results for other chains on Dyck paths. The resulting algorithm can be used as a tool for exploring the branching structure of RNA, especially for long sequences, and to examine branching structure dependence on energy model parameters. Full exposition is provided for the mathematical techniques used with the expectation that these techniques will prove useful in bioinformatics, computational biology, and additional extended applications.
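The abstract names three ingredients: a bijection between plane trees and 2-Motzkin paths, a Gibbs distribution over trees, and a Markov chain that converges to it. The Python sketch below shows how these pieces can fit together. Its assumptions do not come from the paper.

- **Bijection.** It uses one standard bijection, which may differ from the one the authors use. A depth-first contour walk turns a plane tree with n + 1 edges into a Dyck path. Dropping that path's first and last steps and reading the rest two steps at a time gives a 2-Motzkin path of length n.
- **Energy.** `toy_energy` is a hypothetical per-hairpin cost, not the NNTM parameters.
- **Chain.** The chain is a generic Metropolis chain with a hypothetical local proposal, not the authors' chain. The proposal resamples two adjacent steps and rejects invalid paths.

All function names are illustrative.

```python
import math
import random
from itertools import product

# 2-Motzkin steps: up, down, and a level step in two colours (L1, L2).
STEPS = ("U", "D", "L1", "L2")
HEIGHT_CHANGE = {"U": 1, "D": -1, "L1": 0, "L2": 0}
# Pairs of Dyck steps <-> single 2-Motzkin steps (one standard choice, assumed here).
TO_MOTZKIN = {("U", "U"): "U", ("D", "D"): "D", ("U", "D"): "L1", ("D", "U"): "L2"}
TO_PAIR = {step: pair for pair, step in TO_MOTZKIN.items()}


def tree_to_dyck(tree):
    """Contour walk of a plane tree, given as a list of child subtrees:
    'U' when descending an edge, 'D' when climbing back up it."""
    word = []
    for child in tree:
        word += ["U"] + tree_to_dyck(child) + ["D"]
    return word


def dyck_to_motzkin(dyck):
    """Dyck path of semilength n + 1 -> 2-Motzkin path of length n:
    drop the first and last steps, then read the rest two steps at a time."""
    inner = dyck[1:-1]
    return tuple(TO_MOTZKIN[(inner[i], inner[i + 1])] for i in range(0, len(inner), 2))


def motzkin_to_dyck(path):
    """Inverse of dyck_to_motzkin."""
    return ["U"] + [s for step in path for s in TO_PAIR[step]] + ["D"]


def is_valid(path):
    """True if the path never drops below height 0 and ends at height 0."""
    height = 0
    for step in path:
        height += HEIGHT_CHANGE[step]
        if height < 0:
            return False
    return height == 0


def num_leaves(path):
    """Leaves (hairpins) of the tree = peaks, i.e. 'U' followed by 'D', in its Dyck word."""
    dyck = motzkin_to_dyck(path)
    return sum(1 for a, b in zip(dyck, dyck[1:]) if (a, b) == ("U", "D"))


def toy_energy(path, hairpin_cost=1.0):
    """HYPOTHETICAL energy: a fixed cost per hairpin, not the NNTM parameters."""
    return hairpin_cost * num_leaves(path)


def gibbs(n, beta, energy=toy_energy):
    """Exact Gibbs distribution pi(x) proportional to exp(-beta * E(x)) on paths of length n."""
    weights = {p: math.exp(-beta * energy(p))
               for p in product(STEPS, repeat=n) if is_valid(p)}
    z = sum(weights.values())
    return {p: w / z for p, w in weights.items()}


def transition_probs(path, beta, energy=toy_energy):
    """Exact one-step distribution of the assumed Metropolis chain (needs len(path) >= 2):
    choose i uniformly, propose a uniform replacement (a, b) for steps i and i + 1,
    reject invalid paths, otherwise accept with probability min(1, exp(-beta * dE))."""
    weight = 1.0 / ((len(path) - 1) * len(STEPS) ** 2)
    probs = {}
    for i in range(len(path) - 1):
        for a, b in product(STEPS, repeat=2):
            new = path[:i] + (a, b) + path[i + 2:]
            if new == path or not is_valid(new):
                continue
            accept = min(1.0, math.exp(-beta * (energy(new) - energy(path))))
            probs[new] = probs.get(new, 0.0) + weight * accept
    probs[path] = 1.0 - sum(probs.values())  # all remaining mass stays put
    return probs


def metropolis_step(path, beta, rng, energy=toy_energy):
    """Sample one step of the same chain."""
    i = rng.randrange(len(path) - 1)
    new = path[:i] + (rng.choice(STEPS), rng.choice(STEPS)) + path[i + 2:]
    if is_valid(new) and rng.random() < math.exp(-beta * (energy(new) - energy(path))):
        return new
    return path


if __name__ == "__main__":
    tree = [[[], []], []]  # root with two children; the first child has two leaves
    path = dyck_to_motzkin(tree_to_dyck(tree))
    print(path)  # ('L1', 'L1', 'L2')
    rng = random.Random(0)
    for _ in range(10_000):
        path = metropolis_step(path, beta=0.5, rng=rng)
    print(path, num_leaves(path))
```

The proposal is symmetric, so the Metropolis acceptance rule makes this toy chain reversible with respect to π(x) ∝ exp(−β E(x)). For short paths this can be checked exactly with `gibbs` and `transition_probs`, by comparing π(x)P(x, y) with π(y)P(y, x). Exhaustive enumeration of this kind only scales to small n, and this sketch makes no mixing-time claim. The abstract states that fast mixing of the authors' chain is established by bounding its spectral gap.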
Keywords: Markov chain Monte Carlo; RNA secondary structure; nearest neighbor thermodynamic model; Markov chain convergence