Modeling Transcription Start Site and Promoter Elements with Dependency Graphs and Their Expanded Bayesian Networks



Bibliographic Details
Main Authors: Chen-Wei Hsu, 許承偉
Other Authors: Chung-Chin Lu
Format: Others
Language: en_US
Published: 2004
Online Access: http://ndltd.ncl.edu.tw/handle/85582533837369657887
Description
Summary: Master's thesis, National Tsing Hua University, Department of Electrical Engineering, academic year 92 (ROC calendar). With the completion of the Human Genome Project (HGP), a large amount of raw genomic DNA sequence data is now available, and hundreds of programs have been developed to analyze it. The promoter is a region usually located at the 5' flanking end of a gene that encompasses the transcription start site. It plays an important role in gene regulation, and detecting promoter regions can help improve the accuracy of gene finding. Several in silico approaches exist for predicting promoter regions or transcription start sites, but their performance is usually unsatisfactory because they produce too many false positives. In this thesis, we first use the chi-square test to build a dependency graph as the basic model of the transcription start site. We then expand this graph into a Bayesian network in which the nucleotide at each position is allowed to appear more than once, so that inter-position dependencies are captured while overfitting is avoided. Because the promoter region contains more than one signal, we also construct a dependency graph and its expanded Bayesian network to model the TATA box, and we integrate TATA box prediction into transcription start site prediction. The results show that our method performs best in comparison with four of the best-known prediction programs available on the Internet.
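The abstract states only that the dependency graph is built "by chi-square test". A minimal sketch of one plausible reading follows: a Pearson chi-square test of independence between every pair of positions in a set of aligned sequences, with an edge added whenever the test is significant. The significance level (alpha = 0.05), the restriction to equal-length sequences over A/C/G/T, and the names `chi_square` and `dependency_edges` are illustrative assumptions, not the thesis's actual procedure.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"

# Chi-square critical values at alpha = 0.05 (assumed level), indexed by
# degrees of freedom. A 4x4 nucleotide table has at most 9 df.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919}


def chi_square(seqs, i, j):
    """Pearson chi-square statistic and df for positions i and j.

    Assumes aligned, equal-length sequences containing only A/C/G/T
    (hypothetical helper; an 'N' would raise a KeyError).
    """
    counts = {(a, b): 0 for a in NUCLEOTIDES for b in NUCLEOTIDES}
    for s in seqs:
        counts[(s[i], s[j])] += 1
    n = len(seqs)
    row = {a: sum(counts[(a, b)] for b in NUCLEOTIDES) for a in NUCLEOTIDES}
    col = {b: sum(counts[(a, b)] for a in NUCLEOTIDES) for b in NUCLEOTIDES}
    stat = 0.0
    for a in NUCLEOTIDES:
        for b in NUCLEOTIDES:
            expected = row[a] * col[b] / n
            if expected > 0:  # skip nucleotides never observed at i or j
                stat += (counts[(a, b)] - expected) ** 2 / expected
    # Degrees of freedom use only the nucleotides actually observed.
    rows_seen = sum(1 for v in row.values() if v)
    cols_seen = sum(1 for v in col.values() if v)
    return stat, (rows_seen - 1) * (cols_seen - 1)


def dependency_edges(seqs):
    """Return (i, j, statistic) for position pairs judged dependent."""
    edges = []
    for i, j in combinations(range(len(seqs[0])), 2):
        stat, df = chi_square(seqs, i, j)
        if df > 0 and stat > CHI2_CRIT_05[df]:
            edges.append((i, j, round(stat, 2)))
    return edges


# Toy data: positions 0 and 1 are perfectly coupled, position 2 is independent.
seqs = ["ATG", "ATC", "CGG", "CGC"] * 5
print(dependency_edges(seqs))  # [(0, 1, 20.0)]
```

The resulting edge list gives only the undirected skeleton of a dependency graph. How the thesis orients edges and duplicates positions when expanding the graph into a Bayesian network is not specified in the abstract, so that step is not sketched here.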