Using Evolutionary Information and Multi-Label Linear Discriminant Analysis to Predict the Subcellular Location of Multi-Site Bacterial Proteins via Chou’s 5-Steps Rule

Bibliographic Details
Main Authors: Lei Du, Qingfang Meng, Hui Jiang, Yang Li
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9043532/
Description
Summary: The function of a protein is closely tied to its subcellular location, so identifying that location is a crucial step toward understanding protein function. However, determining the subcellular location of proteins experimentally is time-consuming and costly. Developing effective computational methods to predict subcellular location is therefore an active research topic in bioinformatics. Although many models have been proposed to improve the prediction accuracy of protein subcellular localization, two shortcomings remain: (1) many methods ignore multi-site proteins; (2) high-dimensional features burden the construction of the prediction model. In this work, we propose a method to predict the subcellular location of bacterial proteins with either single or multiple locations. Two features based on evolutionary information are extracted to address the multi-site prediction problem: a 190-dimensional feature vector obtained from absolute entropy correlation analysis (AECA-PSSM) and a 480-dimensional feature vector extracted using the discrete wavelet transform (PSSM-DWT). After the two features are combined, multi-label linear discriminant analysis (MLDA) is employed to project the high-dimensional feature space into a lower-dimensional space. The multi-label k-nearest neighbors algorithm (ML-KNN) is then used to predict the subcellular location of both single-site and multi-site proteins. Experimental results on a Gram-positive dataset and a Gram-negative dataset show the effectiveness of the proposed method.
ISSN: 2169-3536
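
The pipeline outlined in the summary (combine the two feature vectors, then classify with ML-KNN) can be illustrated with a minimal sketch. This is not the authors' implementation. The AECA-PSSM and PSSM-DWT extractors are replaced by random placeholder arrays of the stated sizes (190 and 480), the MLDA projection step is omitted, and the names `make_placeholder_features` and `MLKNN` and the settings `k=3`, `s=1.0` are assumptions made for illustration. The classifier follows the standard ML-KNN idea: a maximum a posteriori decision per label, based on how many of the k nearest neighbors carry that label, with Laplace smoothing.

```python
import numpy as np

def sq_dists(A, B):
    """Squared Euclidean distances between rows of A and rows of B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)

def make_placeholder_features(shifts, rng):
    """Stand-in for the real extractors: random 190-dim and 480-dim blocks."""
    aeca = rng.normal(size=(len(shifts), 190)) + shifts[:, None]  # placeholder AECA-PSSM
    dwt = rng.normal(size=(len(shifts), 480)) + shifts[:, None]   # placeholder PSSM-DWT
    return np.hstack([aeca, dwt])                                 # 190 + 480 = 670 columns

class MLKNN:
    """Minimal ML-KNN sketch (hypothetical class, not a library API)."""

    def __init__(self, k=3, s=1.0):
        self.k, self.s = k, s

    def fit(self, X, Y):
        n, q = Y.shape
        k, s = self.k, self.s
        self.X, self.Y = X, Y
        # Smoothed prior probability that each label is present.
        self.prior1 = (s + Y.sum(axis=0)) / (2 * s + n)
        # k nearest neighbors of every training sample, excluding itself.
        d = sq_dists(X, X)
        np.fill_diagonal(d, np.inf)
        nn = np.argsort(d, axis=1)[:, :k]
        counts = Y[nn].sum(axis=1)  # (n, q): neighbors carrying each label
        # c1[l, j]: samples WITH label l that have j neighbors with label l.
        # c0[l, j]: samples WITHOUT label l that have j such neighbors.
        c1 = np.zeros((q, k + 1))
        c0 = np.zeros((q, k + 1))
        for l in range(q):
            for j in range(k + 1):
                hit = counts[:, l] == j
                c1[l, j] = np.sum(hit & (Y[:, l] == 1))
                c0[l, j] = np.sum(hit & (Y[:, l] == 0))
        self.like1 = (s + c1) / (s * (k + 1) + c1.sum(axis=1, keepdims=True))
        self.like0 = (s + c0) / (s * (k + 1) + c0.sum(axis=1, keepdims=True))
        return self

    def predict(self, X_new):
        nn = np.argsort(sq_dists(X_new, self.X), axis=1)[:, :self.k]
        counts = self.Y[nn].sum(axis=1)      # (m, q) neighbor label counts
        labels = np.arange(self.Y.shape[1])
        p1 = self.prior1 * self.like1[labels, counts]
        p0 = (1 - self.prior1) * self.like0[labels, counts]
        return (p1 > p0).astype(int)         # 1 where the label is predicted

# Synthetic toy data: three groups with label sets {0}, {0, 1}, {1}.
rng = np.random.default_rng(0)
train_shifts = np.repeat([0.0, 5.0, 10.0], 10)
X_train = make_placeholder_features(train_shifts, rng)
Y_train = np.repeat(np.array([[1, 0], [1, 1], [0, 1]]), 10, axis=0)
X_test = make_placeholder_features(np.array([0.0, 5.0, 10.0]), rng)

pred = MLKNN(k=3).fit(X_train, Y_train).predict(X_test)
print(X_train.shape)
print(pred)
```

In this toy data the middle group stands in for a multi-site protein, since it carries both labels at once. In a fuller pipeline, a dimensionality-reduction step such as MLDA would sit between the concatenation and the call to `fit`.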