Summary: | Master's === National Cheng Kung University === Institute of Medical Informatics === 96 === In the post-genome era, protein domain structures are being published rapidly. Understanding the mechanism of protein-DNA interaction is essential for working out cell function; it is an important subject in recent bioinformatics research, yet it has not been comprehensively studied. Several machine-learning-based methods have attempted to address this problem. To date, however, few studies have succeeded in translating the tertiary-structure characteristics of proteins into features suitable for training a learner to predict DNA-binding proteins. This work presents a novel machine learning approach that uses HMMs (hidden Markov models) to express the characteristics of DNA-binding proteins in terms of both amino acid sequence and tertiary structure. The proposed method also uses several other informative features of DNA-binding proteins, such as residue composition, structure pattern composition, and the accessible surface area of residues. We develop an SVM (support vector machine) based classifier to predict general DNA-binding proteins and obtain an accuracy of 88.45% under 5-fold cross-validation. Furthermore, we construct a response-element-specific classifier for predicting response-element-specific DNA-binding proteins, which achieves a precision of 96.57% and a recall of 88.83% on average. Finally, this high-accuracy classifier is applied to predict which DNA-binding proteins from MCF-7 cells are likely to bind to estrogen response elements.
|
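The evaluation figures quoted in the summary (accuracy under 5-fold cross-validation, and precision and recall reported on average) can be illustrated with a minimal sketch. The summary does not say how the folds were drawn or how the averages were taken, so the contiguous fold split and the per-fold averaging below are assumptions, and the function names `k_fold_indices`, `binary_metrics`, and `mean_metrics` are hypothetical helpers, not part of the thesis.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k (train, test) pairs.

    Assumption: contiguous, unshuffled folds; the source does not
    describe how its folds were constructed.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall for labels in {0, 1} (1 = DNA-binding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Report 0.0 when there are no positive predictions / no positives.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def mean_metrics(per_fold: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Average each metric over folds (assumed averaging scheme)."""
    return {key: sum(m[key] for m in per_fold) / len(per_fold) for key in per_fold[0]}
```

In use, a classifier would be trained on each `train` index list and scored on the matching `test` list with `binary_metrics`, and the five resulting dictionaries would be passed to `mean_metrics`.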