Master's thesis, Department of Electrical Engineering, National Tsing Hua University, academic year 90 (ROC calendar).
Owing to progress in biochemical technologies and the completion of the Human Genome Project (HGP), a large number of DNA and protein sequences have been produced. An important problem in bioinformatics is locating the precise exon-intron boundaries of genes in human genomic DNA, a task usually called gene identification. Many signals are distributed along a gene. This thesis focuses on the most important of them, the splice sites.
A recent approach to detecting splice signals models them with Bayesian networks. A Bayesian network is a directed acyclic graph in which each node represents a random variable and each edge expresses the direct influence of a parent node on a child node. Because the graph must be acyclic, however, cyclic dependencies among positions cannot be described in such a network. This limits the ability of Bayesian networks to model splice signals.
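The following Python sketch illustrates the acyclicity constraint. It is not the thesis's code. Each position of an aligned splice-site window is treated as a node, and the structure is stored in a hypothetical `parents` dictionary that maps each position to its parent positions. `is_acyclic` checks the structure with Kahn's topological sort. `log_likelihood` scores a sequence with the usual Bayesian-network factorization, log P(x) = sum over i of log P(x_i | x_pa(i)). Estimating the conditional probability tables by counting with a pseudocount of 1 is an assumption made only for this example.

```python
from collections import Counter, deque
from itertools import product
from math import log

BASES = "ACGT"


def is_acyclic(parents):
    """Kahn's algorithm: True iff the parent structure is a DAG.

    parents maps each position to the list of its parent positions.
    """
    indegree = {v: len(ps) for v, ps in parents.items()}
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)
    queue = deque(v for v, d in indegree.items() if d == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for c in children[u]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    # Nodes on a cycle never reach indegree 0, so they are never visited.
    return visited == len(parents)


def fit_cpts(seqs, parents, pseudocount=1.0):
    """Estimate P(x_i | x_pa(i)) from aligned training sequences by counting."""
    cpts = {}
    for i, ps in parents.items():
        counts = Counter((tuple(s[p] for p in ps), s[i]) for s in seqs)
        table = {}
        for ctx in product(BASES, repeat=len(ps)):
            total = sum(counts[(ctx, b)] for b in BASES) + len(BASES) * pseudocount
            for b in BASES:
                table[(ctx, b)] = (counts[(ctx, b)] + pseudocount) / total
        cpts[i] = table
    return cpts


def log_likelihood(seq, parents, cpts):
    """log P(seq) = sum over positions i of log P(x_i | parents of i)."""
    return sum(
        log(cpts[i][(tuple(seq[p] for p in ps), seq[i])])
        for i, ps in parents.items()
    )


# Usage: the chain 0 -> 1 -> 2 is a valid structure; the cycle 0 -> 1 -> 2 -> 0 is not.
dag = {0: [], 1: [0], 2: [1]}
cyclic = {0: [2], 1: [0], 2: [1]}
print(is_acyclic(dag), is_acyclic(cyclic))  # True False

train = ["GTA", "GTG", "GTA", "GCA"]
cpts = fit_cpts(train, dag)
print(log_likelihood("GTA", dag, cpts))
```

In the cyclic structure, positions 0, 1 and 2 each depend on another in a loop. The check rejects it, and a plain Bayesian network cannot encode that kind of dependency.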
In this thesis, we first develop a dependency graph as the basic model of splice signals. We then expand the graph into a Bayesian network in which a position may appear more than once, so that the network captures the inter-dependencies among positions while avoiding overfitting.
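The abstract does not describe exactly how the graph is expanded. The following is therefore only a hypothetical sketch of the idea of letting a position appear more than once; it is not the thesis's algorithm. It assumes the dependency graph is given as directed edges (u, v), meaning position v depends on position u, and that the graph may contain cycles. A breadth-first traversal from an assumed root keeps the first occurrence of each position as a regular node. Any edge that points to a position already placed becomes a new leaf copy of that position, whose only parent is u. Every node then has at most one parent, which keeps each conditional table small. This is one possible reading of how duplication could capture cyclic dependencies while limiting overfitting. The names `expand`, `roots`, and `primary` are illustrative.

```python
from collections import deque


def expand(n_positions, edges, roots=(0,)):
    """Unroll a possibly cyclic directed dependency graph into a forest
    in which a position may appear more than once (hypothetical scheme).

    edges: (u, v) pairs meaning "position v depends on position u".
    Returns nodes as (position, parent_node_index or None); every parent
    index is smaller than its child's index, so the result is acyclic.
    """
    out = {p: [] for p in range(n_positions)}
    for u, v in edges:
        out[u].append(v)
    nodes = []    # list of (position, parent node index)
    primary = {}  # position -> index of the first node created for it
    order = list(roots) + [p for p in range(n_positions) if p not in roots]
    for root in order:
        if root in primary:
            continue
        primary[root] = len(nodes)
        nodes.append((root, None))
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in out[u]:
                if v not in primary:
                    # First visit: v becomes a regular child of u and is expanded later.
                    primary[v] = len(nodes)
                    nodes.append((v, primary[u]))
                    queue.append(v)
                else:
                    # v already has a node; linking to it could close a cycle or
                    # add a second parent, so add a leaf copy of v conditioned on u.
                    nodes.append((v, primary[u]))
    return nodes


# Positions 1 and 2 depend on each other, a 2-cycle that no DAG can express directly.
nodes = expand(4, [(0, 1), (1, 2), (2, 1), (2, 3)])
print(nodes)  # [(0, None), (1, 0), (2, 1), (1, 2), (3, 2)]
```

Here position 1 appears twice. It appears once conditioned on position 0 and once as a copy conditioned on position 2. Both directions of the 1 ↔ 2 dependency are therefore represented without a directed cycle.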
The dependency graph is constructed using chi-square statistics, which test whether pairs of positions are inter-dependent. This method improves the performance of both splice site prediction and the gene identification system.
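A minimal sketch of the chi-square test follows. It assumes aligned training sequences of equal length. For positions i and j, it builds the contingency table of base pairs and computes Pearson's statistic over the bases actually observed. It then converts the statistic to a p-value with the closed-form chi-square survival function for integer degrees of freedom. When all four bases are observed at both positions, the table is 4 × 4 and has (4 - 1)(4 - 1) = 9 degrees of freedom. The 5% critical value for 9 degrees of freedom is about 16.92. The significance level `alpha = 0.05`, the absence of any multiple-testing correction, and the helper names are all assumptions, because the abstract does not give the thesis's settings.

```python
from collections import Counter
from math import erfc, exp, gamma, sqrt


def chi_square_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df.

    Uses Q(x; nu + 2) = Q(x; nu) + (x/2)^(nu/2) e^(-x/2) / Gamma(nu/2 + 1),
    starting from Q(x; 1) = erfc(sqrt(x/2)) or Q(x; 2) = e^(-x/2).
    """
    if df % 2 == 0:
        q, nu = exp(-x / 2), 2
    else:
        q, nu = erfc(sqrt(x / 2)), 1
    while nu < df:
        q += (x / 2) ** (nu / 2) * exp(-x / 2) / gamma(nu / 2 + 1)
        nu += 2
    return q


def chi_square_test(seqs, i, j):
    """Pearson chi-square statistic, degrees of freedom and p-value for
    independence of the bases at positions i and j."""
    pairs = Counter((s[i], s[j]) for s in seqs)
    row = Counter(s[i] for s in seqs)  # bases observed at position i
    col = Counter(s[j] for s in seqs)  # bases observed at position j
    n = len(seqs)
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n  # > 0 because a and b were observed
            stat += (pairs[(a, b)] - expected) ** 2 / expected
    df = (len(row) - 1) * (len(col) - 1)
    p = chi_square_sf(stat, df) if df > 0 else 1.0
    return stat, df, p


def dependent(seqs, i, j, alpha=0.05):
    """Reject the independence of positions i and j at significance level alpha."""
    return chi_square_test(seqs, i, j)[2] < alpha


def dependency_edges(seqs, alpha=0.05):
    """All position pairs (i, j), i < j, whose independence is rejected."""
    length = len(seqs[0])
    return [(i, j) for i in range(length) for j in range(i + 1, length)
            if dependent(seqs, i, j, alpha)]


# Usage: in `linked`, the base at position 1 is determined by position 0.
linked = ["AC", "AC", "GT", "GT"] * 5
unlinked = ["AC", "AT", "GC", "GT"] * 5
print(chi_square_test(linked, 0, 1)[:2])  # (20.0, 1)
print(dependent(linked, 0, 1), dependent(unlinked, 0, 1))  # True False
print(dependency_edges(linked))  # [(0, 1)]
```

Restricting the table to observed bases keeps every expected count positive. Under this choice, the degrees of freedom shrink when a base never occurs at a position.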