PSFM-DBT: Identifying DNA-Binding Proteins by Combing Position Specific Frequency Matrix and Distance-Bigram Transformation

DNA-binding proteins play crucial roles in various biological processes, such as DNA replication and repair, transcriptional regulation, and many other biological activities associated with DNA. Experimental techniques for identifying DNA-binding proteins are both time-consuming and ex...


Bibliographic Details
Main Authors: Jun Zhang, Bin Liu
Format: Article
Language: English
Published: MDPI AG 2017-08-01
Series: International Journal of Molecular Sciences
Subjects: PSFM-DBT; DNA binding protein; distance bigram transformation; PSFM
Online Access: https://www.mdpi.com/1422-0067/18/9/1856
id doaj-6d327a0f727b4b51b1fca6e9d14a6031
record_format Article
spelling doaj-6d327a0f727b4b51b1fca6e9d14a6031; 2020-11-24T20:48:25Z; eng; MDPI AG; International Journal of Molecular Sciences; ISSN 1422-0067; 2017-08-01; volume 18, issue 9, article 1856; DOI 10.3390/ijms18091856; Jun Zhang and Bin Liu, both of School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China
collection DOAJ
language English
format Article
sources DOAJ
author Jun Zhang
Bin Liu
title PSFM-DBT: Identifying DNA-Binding Proteins by Combing Position Specific Frequency Matrix and Distance-Bigram Transformation
publisher MDPI AG
series International Journal of Molecular Sciences
issn 1422-0067
publishDate 2017-08-01
description DNA-binding proteins play crucial roles in various biological processes, such as DNA replication and repair, transcriptional regulation, and many other biological activities associated with DNA. Experimental techniques for identifying DNA-binding proteins are both time-consuming and expensive, so effective methods that identify these proteins from protein sequences alone are much needed. The key to sequence-based methods is an effective representation of the protein sequence, and various previous studies have reported that evolutionary information is crucial for DNA-binding protein identification. In this study, we employed four methods to extract evolutionary information from the Position Specific Frequency Matrix (PSFM): Residue Probing Transformation (RPT), Evolutionary Difference Transformation (EDT), Distance-Bigram Transformation (DBT), and Trigram Transformation (TT). Each method converts a PSFM into a fixed-length feature vector; combining each with Support Vector Machines (SVMs) yielded four predictors: PSFM-RPT, PSFM-EDT, PSFM-DBT, and PSFM-TT. Experimental results on the widely used benchmark dataset PDB1075 and the independent dataset PDB186 showed that these four methods achieved state-of-the-art performance and that PSFM-DBT outperformed other existing methods in this field. For practical applications, a user-friendly webserver for PSFM-DBT was established, available at http://bioinformatics.hitsz.edu.cn/PSFM-DBT/.
topic PSFM-DBT
DNA binding protein
distance bigram transformation
PSFM
url https://www.mdpi.com/1422-0067/18/9/1856
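
Illustrative sketch (not part of the catalog record): the description says that each PSFM is converted into a fixed-length feature vector before being combined with an SVM. The Python snippet below shows one plausible way a distance-bigram style transformation could achieve that. The record does not give the formula, so the averaging over positions, the function name distance_bigram_features, and the max_distance parameter are all assumptions made for illustration, not the authors' definition.

```python
import numpy as np


def distance_bigram_features(psfm, max_distance=3):
    """Turn an (L x A) frequency matrix into a fixed-length vector.

    Hypothetical formulation: for each distance d in 1..max_distance and
    each ordered residue pair (i, j), average psfm[k, i] * psfm[k + d, j]
    over all positions k. The output length is max_distance * A * A and
    does not depend on the sequence length L.
    """
    psfm = np.asarray(psfm, dtype=float)
    if psfm.ndim != 2:
        raise ValueError("psfm must be a 2-D array (positions x residues)")
    length, _ = psfm.shape
    if max_distance < 1 or max_distance >= length:
        raise ValueError("max_distance must satisfy 1 <= max_distance < L")

    blocks = []
    for d in range(1, max_distance + 1):
        left = psfm[: length - d]   # rows k
        right = psfm[d:]            # rows k + d
        # (A x A) matrix of pair products averaged over the L - d positions
        pair_means = left.T @ right / (length - d)
        blocks.append(pair_means.ravel())
    return np.concatenate(blocks)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic "proteins" of different lengths; each row sums to 1,
    # as a frequency profile over 20 amino acids would.
    short_psfm = rng.dirichlet(np.ones(20), size=50)
    long_psfm = rng.dirichlet(np.ones(20), size=300)
    print(distance_bigram_features(short_psfm).shape)
    print(distance_bigram_features(long_psfm).shape)
```

Under these assumptions, proteins of any length map to vectors of the same size (here 3 x 20 x 20 values), which is what lets them be fed to a fixed-input classifier such as an SVM.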