PredPSD: A Gradient Tree Boosting Approach for Single-Stranded and Double-Stranded DNA Binding Protein Prediction
Interactions between proteins and DNA play essential roles in many biological processes. DNA-binding proteins can be classified into two categories. Double-stranded DNA-binding proteins (DSBs) bind to double-stranded DNA and are involved in a range of cell functions such as gene expression and reg...
Main Authors: | Changgeng Tan, Tong Wang, Wenyi Yang, Lei Deng |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2019-12-01 |
Series: | Molecules |
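Subjects: | SSBs (single-stranded DNA-binding proteins); DSBs (double-stranded DNA-binding proteins); protein sequence; gradient tree boosting; binding specificity |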
Online Access: | https://www.mdpi.com/1420-3049/25/1/98 |
id |
doaj-d6fff9eff6544fb194fd44385007ff64 |
---|---|
record_format |
Article |
spelling |
doaj-d6fff9eff6544fb194fd44385007ff64; 2020-11-25T01:40:14Z; eng; MDPI AG; Molecules; ISSN 1420-3049; 2019-12-01; vol. 25, no. 1, art. 98; DOI 10.3390/molecules25010098; molecules25010098; PredPSD: A Gradient Tree Boosting Approach for Single-Stranded and Double-Stranded DNA Binding Protein Prediction; Changgeng Tan (0), Tong Wang (1), Wenyi Yang (2), Lei Deng (3); School of Computer Science and Engineering, Central South University, Changsha 410075, China (all four authors); https://www.mdpi.com/1420-3049/25/1/98; SSBs (single-stranded DNA-binding proteins); DSBs (double-stranded DNA-binding proteins); protein sequence; gradient tree boosting; binding specificity |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Changgeng Tan, Tong Wang, Wenyi Yang, Lei Deng |
author_sort |
Changgeng Tan |
title |
PredPSD: A Gradient Tree Boosting Approach for Single-Stranded and Double-Stranded DNA Binding Protein Prediction |
title_sort |
predpsd: a gradient tree boosting approach for single-stranded and double-stranded dna binding protein prediction |
publisher |
MDPI AG |
series |
Molecules |
issn |
1420-3049 |
publishDate |
2019-12-01 |
description |
Interactions between proteins and DNA play essential roles in many biological processes. DNA-binding proteins can be classified into two categories. Double-stranded DNA-binding proteins (DSBs) bind to double-stranded DNA and are involved in a range of cell functions such as gene expression and regulation. Single-stranded DNA-binding proteins (SSBs) bind to single-stranded DNA and are necessary for DNA replication, recombination, and repair. Therefore, effective classification of DNA-binding proteins is helpful for the functional annotation of proteins. In this work, we propose PredPSD, a computational method based on sequence information that accurately predicts SSBs and DSBs. It introduces three novel feature extraction algorithms. In particular, we use the autocross-covariance (ACC) transformation to convert feature matrices into fixed-length vectors. Then, we feed the optimal feature subset obtained by the minimal-redundancy-maximal-relevance (mRMR) feature selection algorithm into a gradient tree boosting (GTB) classifier. In 10-fold cross-validation on a benchmark dataset, PredPSD achieves promising performance, with an AUC of 0.956 and an accuracy of 0.912, both better than those of existing methods. Moreover, our method significantly improves prediction accuracy in independent testing. The experimental results show that PredPSD can effectively recognize binding specificity and differentiate DSBs from SSBs. |
topic |
SSBs (single-stranded DNA-binding proteins); DSBs (double-stranded DNA-binding proteins); protein sequence; gradient tree boosting; binding specificity |
url |
https://www.mdpi.com/1420-3049/25/1/98 |
_version_ |
1725046266428129280 |
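
The description above mentions an autocross-covariance (ACC) transformation that turns per-residue feature matrices of varying length into fixed-length vectors. This record does not give the exact variant used in PredPSD, so the following is only a minimal sketch of the commonly used ACC definition: for every lag and every ordered pair of feature columns, average the product of mean-centered values that are `lag` positions apart. The function name `acc_transform` and the parameter `max_lag` are hypothetical names chosen for illustration.

```python
from typing import List, Sequence


def acc_transform(matrix: Sequence[Sequence[float]], max_lag: int) -> List[float]:
    """Illustrative ACC transform (an assumption, not the paper's exact code).

    matrix: L rows (one per residue) by n columns (per-residue features).
    Returns a vector of length n * n * max_lag, independent of L.
    """
    length = len(matrix)
    if length == 0:
        raise ValueError("matrix must contain at least one row")
    n_features = len(matrix[0])
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must satisfy 1 <= max_lag < number of rows")

    # Column means over the whole sequence, used to center each feature.
    means = [sum(row[k] for row in matrix) / length for k in range(n_features)]

    features: List[float] = []
    for lag in range(1, max_lag + 1):
        span = length - lag  # number of residue pairs separated by `lag`
        for a in range(n_features):
            for b in range(n_features):
                # a == b gives the auto-covariance term,
                # a != b gives the cross-covariance term.
                total = sum(
                    (matrix[j][a] - means[a]) * (matrix[j + lag][b] - means[b])
                    for j in range(span)
                )
                features.append(total / span)
    return features
```

Because the output length depends only on the number of feature columns and `max_lag`, sequences of different lengths map to vectors of the same size, which is what allows a later feature selection step and a classifier to operate on them. Those later steps (mRMR and gradient tree boosting) are not shown here.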