Hidden Markov Model Based DNA-binding Proteins Prediction by Mining on Sequence and Structure Information


Bibliographic Details
Main Authors: Wei-Jhih Chen, 陳韋志
Other Authors: Hung-Yu Kao
Format: Others
Language: en_US
Published: 2008
Online Access: http://ndltd.ncl.edu.tw/handle/20716134364431964671
Description
Summary: Master's === National Cheng Kung University === Institute of Medical Informatics === 96 === In the post-genomic era, protein domain structures have been published rapidly. To understand cell function, the mechanism of protein-DNA interaction is an important subject in recent bioinformatics research, yet it has not been comprehensively studied. Several machine-learning-based methods have attempted to address this problem. Until recently, few studies have succeeded in translating the tertiary structure characteristics of proteins into features suitable for a learning mechanism that predicts DNA-binding proteins. This work presents a novel machine learning approach that uses HMMs (hidden Markov models) to express the characteristics of DNA-binding proteins in terms of both amino acid sequence and tertiary structure. The proposed method also uses several helpful features of DNA-binding proteins, such as residue composition, structure pattern composition, and the accessible surface area of residues. We also develop an SVM (support vector machine) based classifier to predict general DNA-binding proteins, which obtains an accuracy of 88.45% under 5-fold cross-validation. Furthermore, a response-element-specific classifier is constructed for predicting response-element-specific DNA-binding proteins; it achieves a precision of 96.57% with a recall of 88.83% on average. Finally, this high-accuracy classifier is employed to predict the DNA-binding proteins from MCF-7 that are likely to bind to estrogen response elements.
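
The summary names residue composition as one of the features but does not define it. A minimal sketch, assuming it means the fraction of each of the 20 standard amino acids in a protein sequence; the function name, the alphabet ordering, and the handling of non-standard symbols are illustrative assumptions and are not taken from the thesis:

```python
from collections import Counter

# Assumption: the 20 standard amino acids in one-letter code, alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_composition(sequence):
    """Return a 20-element list with the fraction of each standard residue.

    Hypothetical helper for illustration; symbols outside the standard
    alphabet (e.g. 'X') are ignored rather than counted.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # Empty or fully non-standard input: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

A vector of this kind could be supplied to a classifier alongside the other features the summary lists; how the thesis actually encodes and combines them is not stated in this record.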