Modeling Splice Sites with Dependency Graphs and Their Approximation by Bayesian Networks



Bibliographic Details
Main Authors: Te-Ming Chen, 陳德銘
Other Authors: Chung-Chin Lu
Format: Others
Language: en_US
Published: 2002
Online Access: http://ndltd.ncl.edu.tw/handle/39618993442260969276
Description
Summary: Master's thesis, National Tsing Hua University (國立清華大學), Department of Electrical Engineering, academic year 90 (ROC calendar).

Owing to progress in biochemical technologies and the completion of the Human Genome Project (HGP), a large number of DNA and protein sequences have been produced. An important problem in bioinformatics is to locate the precise exon-intron boundaries of genes in human genomic DNA, a task usually called gene identification. A gene contains many signals; this thesis focuses on the most important of them, the splice sites. A recent approach to detecting splice signals models them with Bayesian networks. A Bayesian network is a directed acyclic graph in which each node represents a random variable and each edge expresses a direct influence of a parent node on a child node. Because the graph must be acyclic, however, cyclic dependencies among sequence positions cannot be described in such a network, which limits its ability to model splice signals. In this thesis, we first develop a dependency graph as the basic model of splice signals and then approximate this graph by a Bayesian network in which a position may appear more than once, capturing the inter-dependencies between positions while avoiding overfitting. The dependency graph is constructed using chi-square statistics to test the hypothesis of inter-dependency between positions. This method improves the performance of splice-site prediction and of the gene identification system.
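The abstract states only that the dependency graph is built from chi-square statistics testing inter-dependency between positions. The following is a minimal sketch of one way such a graph could be built, not the author's implementation. It assumes equal-length, aligned splice-site sequences over A/C/G/T only, a pairwise Pearson chi-square test of independence on each 4x4 contingency table, and a 5% significance level. The names `chi_square_statistic`, `dependency_graph`, and `CHI2_CRIT_05` are hypothetical. Note that the resulting undirected graph may contain cycles, which is exactly what a Bayesian network cannot represent directly.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"

# Assumed significance level: upper 5% critical values of the chi-square
# distribution, keyed by degrees of freedom. For tables of at most 4x4 the
# possible df values are 1, 2, 3, 4, 6 and 9.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 6: 12.592, 9: 16.919}


def chi_square_statistic(seqs, i, j):
    """Return (chi2, df) for the independence test between positions i and j.

    Assumes every sequence has the same length and contains only A, C, G, T.
    """
    counts = {(a, b): 0 for a in NUCLEOTIDES for b in NUCLEOTIDES}
    for s in seqs:
        counts[(s[i], s[j])] += 1
    n = len(seqs)
    row = {a: sum(counts[(a, b)] for b in NUCLEOTIDES) for a in NUCLEOTIDES}
    col = {b: sum(counts[(a, b)] for a in NUCLEOTIDES) for b in NUCLEOTIDES}
    # Drop nucleotides never observed at a position so no expected count is 0.
    rows = [a for a in NUCLEOTIDES if row[a] > 0]
    cols = [b for b in NUCLEOTIDES if col[b] > 0]
    chi2 = 0.0
    for a in rows:
        for b in cols:
            expected = row[a] * col[b] / n
            chi2 += (counts[(a, b)] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return chi2, df


def dependency_graph(seqs, crit=CHI2_CRIT_05):
    """Return undirected edges (i, j), i < j, where independence is rejected."""
    length = len(seqs[0])
    edges = set()
    for i, j in combinations(range(length), 2):
        chi2, df = chi_square_statistic(seqs, i, j)
        # df == 0 means one position is constant, so no dependency is testable.
        if df > 0 and chi2 > crit[df]:
            edges.add((i, j))
    return edges


if __name__ == "__main__":
    # Toy data: positions 0 and 1 always agree, position 2 varies independently.
    toy = ["AAC", "AAG", "CCC", "CCG"] * 5
    print(sorted(dependency_graph(toy)))  # [(0, 1)]
```

In the toy data, positions 0 and 1 give a statistic of 20.0 with one degree of freedom, well above the 3.841 threshold, while position 2 is exactly independent of both. A real splice-site model would also have to decide how to turn such an undirected, possibly cyclic graph into a directed acyclic one; the thesis does this by letting positions appear more than once, which this sketch does not attempt.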