PredPSD: A Gradient Tree Boosting Approach for Single-Stranded and Double-Stranded DNA Binding Protein Prediction

Interactions between proteins and DNAs play essential roles in many biological processes. DNA binding proteins can be classified into two categories. Double-stranded DNA-binding proteins (DSBs) bind to double-stranded DNA and are involved in a series of cell functions such as gene expression and reg...


Bibliographic Details
Main Authors: Changgeng Tan, Tong Wang, Wenyi Yang, Lei Deng
Format: Article
Language: English
Published: MDPI AG 2019-12-01
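Series: Molecules
Subjects: SSBs (single-stranded DNA-binding proteins); DSBs (double-stranded DNA-binding proteins); protein sequence; gradient tree boosting; binding specificity
Online Access: https://www.mdpi.com/1420-3049/25/1/98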
id doaj-d6fff9eff6544fb194fd44385007ff64
record_format Article
spelling doaj-d6fff9eff6544fb194fd44385007ff64; 2020-11-25T01:40:14Z; eng; MDPI AG; Molecules; ISSN 1420-3049; 2019-12-01; vol. 25, no. 1, art. 98; DOI 10.3390/molecules25010098
affiliation Changgeng Tan, Tong Wang, Wenyi Yang, Lei Deng: School of Computer Science and Engineering, Central South University, Changsha 410075, China
collection DOAJ
language English
format Article
sources DOAJ
author Changgeng Tan
Tong Wang
Wenyi Yang
Lei Deng
title PredPSD: A Gradient Tree Boosting Approach for Single-Stranded and Double-Stranded DNA Binding Protein Prediction
publisher MDPI AG
series Molecules
issn 1420-3049
publishDate 2019-12-01
description Interactions between proteins and DNAs play essential roles in many biological processes. DNA-binding proteins can be classified into two categories. Double-stranded DNA-binding proteins (DSBs) bind to double-stranded DNA and are involved in a series of cell functions such as gene expression and regulation. Single-stranded DNA-binding proteins (SSBs) bind to single-stranded DNA and are necessary for DNA replication, recombination, and repair. Therefore, the effective classification of DNA-binding proteins is helpful for the functional annotation of proteins. In this work, we propose PredPSD, a computational method based on sequence information that accurately predicts SSBs and DSBs. It introduces three novel feature extraction algorithms. In particular, we use the auto-cross covariance (ACC) transformation to transform feature matrices into fixed-length vectors. Then, we feed the optimal feature subset obtained by the minimal-redundancy-maximal-relevance (mRMR) feature selection algorithm into a gradient tree boosting (GTB) classifier. In 10-fold cross-validation on a benchmark dataset, PredPSD achieves promising performance with an AUC score of 0.956 and an accuracy of 0.912, which are better than those of existing methods. Moreover, our method has significantly improved the prediction accuracy in independent testing. The experimental results show that PredPSD can effectively recognize binding specificity and differentiate DSBs from SSBs.
topic SSBs (single-stranded DNA-binding proteins)
DSBs (double-stranded DNA-binding proteins)
protein sequence
gradient tree boosting
binding specificity
url https://www.mdpi.com/1420-3049/25/1/98
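
The description above mentions using the auto-cross covariance (ACC) transformation to turn feature matrices into fixed-length vectors. The sketch below illustrates the standard ACC definition in plain Python; it is not the authors' implementation. The function name `acc_transform`, the `max_lag` parameter, and the toy three-feature matrices are assumptions made for illustration, since this record does not say which per-residue features or which lag PredPSD uses.

```python
from typing import List, Sequence


def acc_transform(matrix: Sequence[Sequence[float]], max_lag: int) -> List[float]:
    """Auto-cross covariance of an L x N per-residue feature matrix.

    Returns N * N * max_lag values regardless of L (requires L > max_lag).
    """
    length = len(matrix)
    n_features = len(matrix[0])
    if length <= max_lag:
        raise ValueError("sequence must be longer than max_lag")

    # Column means over the whole sequence.
    means = [sum(row[j] for row in matrix) / length for j in range(n_features)]
    # Centre each column once so the covariance sums stay simple.
    centred = [[row[j] - means[j] for j in range(n_features)] for row in matrix]

    features: List[float] = []
    for lag in range(1, max_lag + 1):
        span = length - lag
        for j1 in range(n_features):
            for j2 in range(n_features):
                # j1 == j2 gives the auto-covariance (AC) term,
                # j1 != j2 gives the cross-covariance (CC) term.
                total = sum(
                    centred[i][j1] * centred[i + lag][j2] for i in range(span)
                )
                features.append(total / span)
    return features


# Two toy "proteins" of different length, 3 hypothetical features per residue.
short_protein = [[0.1 * i, 1.0, (-1.0) ** i] for i in range(8)]
long_protein = [[0.1 * i, 1.0, (-1.0) ** i] for i in range(30)]

vec_short = acc_transform(short_protein, max_lag=2)
vec_long = acc_transform(long_protein, max_lag=2)
print(len(vec_short), len(vec_long))  # both 3 * 3 * 2 = 18
```

Because the output length depends only on the number of features and the maximum lag, proteins of any length map to vectors of the same size, which is what a downstream feature selector and classifier require.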