Protein subcellular localization prediction for Gram-negative bacteria using amino acid subalphabets and a combination of multiple support vector machines

Abstract. Background: Predicting the subcellular localization of proteins is important for determining the function of proteins. Previous works focused on predicting protein localization in Gram-negative bacteria obtained good results. However, these meth...

Bibliographic Details
Main Authors: Krishnan Arun, Sung Wing-Kin, Wang Jiren, Li Kuo-Bin
Format: Article
Language: English
Published: BMC 2005-07-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/6/174
id doaj-d241878bc7d44e3a98dfe21ba8844084
record_format Article
spelling doaj-d241878bc7d44e3a98dfe21ba8844084 2020-11-25T01:17:54Z eng BMC BMC Bioinformatics 1471-2105 2005-07-01 6 1 174 10.1186/1471-2105-6-174 Protein subcellular localization prediction for Gram-negative bacteria using amino acid subalphabets and a combination of multiple support vector machines Krishnan Arun; Sung Wing-Kin; Wang Jiren; Li Kuo-Bin http://www.biomedcentral.com/1471-2105/6/174
collection DOAJ
language English
format Article
sources DOAJ
author Krishnan Arun
Sung Wing-Kin
Wang Jiren
Li Kuo-Bin
title Protein subcellular localization prediction for Gram-negative bacteria using amino acid subalphabets and a combination of multiple support vector machines
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2005-07-01
description Background: Predicting the subcellular localization of proteins is important for determining their function. Previous work on predicting protein localization in Gram-negative bacteria obtained good results, but these methods had relatively low accuracy for extracellular proteins. This paper studies ways to improve the accuracy of predicting extracellular localization in Gram-negative bacteria.
Results: We have developed a system for predicting the subcellular localization of proteins in Gram-negative bacteria based on amino acid subalphabets and a combination of multiple support vector machines. The recall for the extracellular site and the overall recall of our predictor reach 86.0% and 89.8%, respectively, in 5-fold cross-validation. To the best of our knowledge, these are the most accurate results for predicting subcellular localization in Gram-negative bacteria.
Conclusion: Clustering the 20 amino acids into a few groups with the proposed greedy algorithm provides a new way to extract features from protein sequences that covers more adjacent amino acids while reducing the dimensionality of the input vector of protein features. A good amino acid grouping was observed to increase prediction performance. Furthermore, a proper choice of a subset of complementary support vector machines, each constructed from different protein features, maximizes the prediction accuracy.
url http://www.biomedcentral.com/1471-2105/6/174
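
The abstract's point about subalphabets and dimensionality can be illustrated with a small sketch. It is not the authors' implementation: the six-group mapping in `GROUPS`, the function names, and the choice of k = 3 are all assumptions made for illustration, and the grouping is a simple physicochemical one rather than one produced by the paper's greedy algorithm. The sketch maps a sequence onto the reduced alphabet and counts overlapping k-mers, so the feature vector has 6^k entries instead of 20^k.

```python
from itertools import product

# Hypothetical grouping of the 20 amino acids into 6 classes. This is an
# assumption for illustration, NOT the grouping found by the paper's
# greedy algorithm.
GROUPS = {
    "a": "AVLIMC",  # aliphatic / hydrophobic
    "r": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "k": "KR",      # positively charged
    "d": "DE",      # negatively charged
    "g": "GP",      # glycine and proline
}
AA_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet.

    Residues outside the 20 standard amino acids (e.g. 'X') are dropped,
    which joins their neighbours; a real pipeline may handle them differently.
    """
    return "".join(AA_TO_GROUP[aa] for aa in seq.upper() if aa in AA_TO_GROUP)


def kmer_features(seq, k=3):
    """Return normalized counts of overlapping k-mers over the reduced alphabet."""
    reduced = reduce_sequence(seq)
    alphabet = sorted(GROUPS)
    # One vector position per possible k-mer: len(alphabet) ** k in total.
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0.0] * len(index)
    total = max(len(reduced) - k + 1, 0)
    for i in range(total):
        counts[index[reduced[i:i + k]]] += 1.0
    if total:
        counts = [c / total for c in counts]
    return counts


if __name__ == "__main__":
    features = kmer_features("MKKTAIAIAV", k=3)
    # 6**3 = 216 features with the reduced alphabet, versus 20**3 = 8000
    # for 3-mers over the full amino acid alphabet.
    print(len(features), 20 ** 3)
```

Vectors like these could then be passed to any support vector machine implementation. How the paper builds and combines its individual classifiers is not specified in this record, so that step is left out here.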