Protein Complexes Classification Study Based on Machine Learning Approaches

Master's === National Formosa University === Institute of Computer Science and Information Engineering === 99 === Abstract Protein complexes play important roles in many cellular processes. Several approaches have been developed for protein complex prediction, such as (1) using graph theory to study dense protein-protein interaction regions, (2) based on experi...

Bibliographic Details
Main Authors: Kun-Ting Chao, 趙?廷
Other Authors: Chien-Hung Huang
Format: Others
Language: zh-TW
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/8p6y29
id ndltd-TW-099NYPI5392008
record_format oai_dc
spelling ndltd-TW-099NYPI5392008 2019-09-22T03:40:59Z http://ndltd.ncl.edu.tw/handle/8p6y29 Protein Complexes Classification Study Based on Machine Learning Approaches 基於機器學習方法之蛋白質複合體分類研究 Kun-Ting Chao 趙?廷 Master's National Formosa University Institute of Computer Science and Information Engineering 99
Chien-Hung Huang 黃建宏 2011 thesis 57 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Formosa University === Institute of Computer Science and Information Engineering === 99 === Abstract Protein complexes play important roles in many cellular processes. Several approaches have been developed for protein complex prediction, such as (1) using graph theory to study dense protein-protein interaction regions, (2) using experimental data, such as tandem mass spectrometry, (3) the core-attachment approach, and (4) heterogeneous data integration. All of these approaches are limited in that they consider only the static, non-biochemical properties of a protein complex. In this thesis, we propose integrating various aspects of protein complex properties, i.e. static as well as physicochemical properties, to describe protein complexes. Our method consists of three main steps: (i) estimation of parameter values, (ii) selection of major parameters, and (iii) validation of classification accuracy. In the parameter estimation step, 27 parameters are considered. Principal component analysis (PCA) and logistic regression (LR) are used to determine the major features. In the validation step, the major features extracted in the previous step are used to construct feature vectors, which are then used to train two machine learning methods, i.e. support vector machines (SVM) and neural networks (NN). A 6-fold cross-validation test is performed to investigate the classification accuracy of all major feature subsets. The results indicate that combining isoelectric point (pI) with GO annotation and sequence similarity features achieves slightly better classification accuracy. By taking physicochemical properties into consideration, the present study could possibly improve the accuracy of protein complex prediction tools.
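The validation step in the abstract (6-fold cross-validation over candidate feature subsets) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the synthetic two-column data, and in particular the nearest-centroid classifier, which is a simple stand-in and not the SVM or neural network used in the thesis.

```python
import random
from statistics import mean


def six_fold_indices(n_samples, k=6, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nearest_centroid_predict(train_x, train_y, test_x):
    """Stand-in classifier (NOT the thesis's SVM/NN): pick the class
    whose mean training vector is closest to each test vector."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda c: sq_dist(x, centroids[c]))
            for x in test_x]


def cross_validated_accuracy(features, labels, columns, k=6, seed=0):
    """Mean accuracy over k folds, using only the given feature columns.
    Assumes at least k samples so that no fold is empty."""
    data = [[row[c] for c in columns] for row in features]
    scores = []
    for held_out in six_fold_indices(len(data), k, seed):
        held = set(held_out)
        train = [i for i in range(len(data)) if i not in held]
        predictions = nearest_centroid_predict(
            [data[i] for i in train],
            [labels[i] for i in train],
            [data[i] for i in held_out],
        )
        correct = sum(p == labels[i] for p, i in zip(predictions, held_out))
        scores.append(correct / len(held_out))
    return mean(scores)


# Synthetic toy data (hypothetical, not from the thesis):
# column 0 separates the two classes, column 1 is pure noise.
rng = random.Random(1)
features = [[label * 10 + rng.random(), rng.random()]
            for label in (0, 1) for _ in range(12)]
labels = [label for label in (0, 1) for _ in range(12)]

# Compare candidate feature subsets by their cross-validated accuracy.
for subset in ([0], [1], [0, 1]):
    print(subset, cross_validated_accuracy(features, labels, subset))
```

Comparing subsets this way mirrors the idea of checking whether adding a feature changes classification accuracy; the real study would substitute its 27 estimated parameters and its trained SVM and NN models.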
author2 Chien-Hung Huang
author_facet Chien-Hung Huang
Kun-Ting Chao
趙?廷
author Kun-Ting Chao
趙?廷
author_sort Kun-Ting Chao
title Protein Complexes Classification Study Based on Machine Learning Approaches
publishDate 2011
url http://ndltd.ncl.edu.tw/handle/8p6y29
_version_ 1719254627481812992