Prediction of DNA-Binding Sites in Proteins
Master's === National Chiao Tung University === Institute of Bioinformatics === 94 === In our study, we investigate the design of accurate predictors for DNA-binding sites in proteins from amino acid sequences. Two classification methods, support vector machine (SVM) and fuzzy k-nearest neighbors (fuzzy k-NN), are used to predict DNA-binding sit...
Main Authors: | Fu-Chieh Yu, 游富傑 |
---|---|
Other Authors: | Shinn-Ying Ho |
Format: | Others |
Language: | en_US |
Published: | 2006 |
Online Access: | http://ndltd.ncl.edu.tw/handle/62025727549856024644 |
id |
ndltd-TW-094NCTU5112011 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-094NCTU5112011 2016-05-27T04:18:37Z http://ndltd.ncl.edu.tw/handle/62025727549856024644 Prediction of DNA-Binding Sites in Proteins 預測蛋白質上去氧核醣核酸鍵結位置 Fu-Chieh Yu 游富傑 Master's National Chiao Tung University Institute of Bioinformatics 94 Shinn-Ying Ho 何信瑩 2006 Degree thesis ; thesis 33 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === National Chiao Tung University === Institute of Bioinformatics === 94 === In this study, we investigate the design of accurate predictors of DNA-binding sites in proteins from amino acid sequences. Two classification methods, support vector machine (SVM) and fuzzy k-nearest neighbors (fuzzy k-NN), are used to predict DNA-binding sites in proteins. As a result, we propose a hybrid method, SVM-PSSM, which achieves the best performance by combining SVM with the evolutionary information of amino acid sequences, expressed as position-specific scoring matrices (PSSMs). Because the numbers of binding and non-binding residues in proteins differ greatly, two additional weights are analyzed and adopted together with the SVM parameters to maximize net prediction accuracy (NP, the average of sensitivity and specificity). To evaluate the generalization ability of SVM-PSSM, an additional DNA-binding dataset, PDC-59, is established; it consists of 59 protein chains with low sequence identity to one another. Using the same six-fold cross-validation procedure and PSSM features, the SVM-based method achieves NP=80.15% on the training dataset PDNA-62 and NP=69.54% on the independent test dataset PDC-59. These results are much better than those of the existing neural-network-based method, increasing the NP values for training and test accuracy by up to 13.45% and 16.53%, respectively. Besides the PSSM feature, other features based on physico-chemical properties of amino acids that are related to protein-DNA interactions, such as solvent-accessible surface area, electric charge, and hydropathy index, are also adopted and analyzed. Simulation results reveal that SVM-PSSM performs well in predicting DNA-binding sites of novel proteins from amino acid sequences.
|
author2 |
Shinn-Ying Ho |
author |
Fu-Chieh Yu 游富傑 |
title |
Prediction of DNA-Binding Sites in Proteins |
publishDate |
2006 |
url |
http://ndltd.ncl.edu.tw/handle/62025727549856024644 |
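
The abstract defines net prediction (NP) as the average of sensitivity and specificity and motivates it by the imbalance between binding and non-binding residues. The following is a minimal sketch of that metric only, not the thesis's code: the function name `net_prediction`, the label convention (1 = binding, 0 = non-binding), and the toy labels are assumptions made for illustration.

```python
def net_prediction(y_true, y_pred):
    """Return (sensitivity, specificity, NP) for binary residue labels.

    Assumed convention (not from the thesis): 1 = binding, 0 = non-binding.
    NP is the average of sensitivity and specificity, as in the abstract.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # binding, predicted binding
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # binding, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-binding, predicted non-binding
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-binding, predicted binding

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Toy, made-up labels: 2 binding residues among 8.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 1, 0]
sens, spec, np_score = net_prediction(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} NP={np_score:.3f}")

# A predictor that labels every residue non-binding is right on 6 of 8
# residues here, yet its NP is only 0.5.
_, _, np_all_negative = net_prediction(y_true, [0] * len(y_true))
print(f"NP (all non-binding) = {np_all_negative:.3f}")
```

On these toy labels, predicting every residue as non-binding is correct for 6 of 8 residues but yields NP = 0.5, which illustrates why a class-balanced measure is a natural target when the two classes are unequal in size.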