Summary: | SNARE proteins, also known as membrane fusion proteins, play a primary role in mediating vesicle fusion. Loss of SNARE protein function can lead to a variety of diseases, so a method that accurately identifies SNARE proteins is needed. In this paper, we test combinations of sampling methods (resampling, SMOTE, and no sampling), feature extraction approaches (188D, K-skip-2-gram, and CKSAAP), and distance measures (Chebyshev, Euclidean, Manhattan, and Minkowski) to find a suitable model for identifying SNARE proteins. Based on extensive experiments, we construct a Manhattan-distance-based KNN model that uses CKSAAP feature extraction with no sampling, which achieves the best identification performance among all combinations. Finally, we compare our KNN-based model with a deep-learning-based model, SNARE-CNN, on four metrics: sensitivity (SN), specificity (SP), accuracy (ACC), and Matthews correlation coefficient (MCC). The experimental results show that our model outperforms SNARE-CNN.
|
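The pipeline named in the summary (CKSAAP features, a Manhattan-distance KNN classifier, and the SN/SP/ACC/MCC metrics) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the gap range `k_max=1`, and the neighbor count `k=3` are assumptions chosen only for illustration, and labels are assumed to be 1 for SNARE and 0 for non-SNARE.

```python
from collections import Counter
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order that defines the feature layout.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(seq, k_max=1):
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    k_max is an illustrative assumption, not a value taken from the paper.
    Pairs containing non-standard residues are simply not counted as features.
    """
    features = []
    for k in range(k_max + 1):
        n_pairs = len(seq) - k - 1
        counts = Counter(seq[i] + seq[i + k + 1] for i in range(max(n_pairs, 0)))
        total = max(n_pairs, 1)  # avoid division by zero on very short sequences
        features.extend(counts[p] / total for p in PAIRS)
    return features


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points nearest to the query."""
    ranked = sorted(zip(train_x, train_y), key=lambda item: manhattan(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def evaluate(y_true, y_pred):
    """Return SN, SP, ACC and MCC computed from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "SN": tp / (tp + fn) if tp + fn else 0.0,
        "SP": tn / (tn + fp) if tn + fp else 0.0,
        "ACC": (tp + tn) / len(y_true) if y_true else 0.0,
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }
```

In use, each protein sequence would be converted with `cksaap`, the resulting vectors passed to `knn_predict` as training data and queries, and the predictions scored with `evaluate`. Swapping `manhattan` for another distance function is the only change needed to compare distance measures.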