Prediction of protein quaternary structural attributes through hybrid feature encoding method by using machine learning approach

Master's === National Chung Hsing University === Graduate Institute of Biotechnology === 106 === Predicting protein quaternary structural attributes is an essential task in computational biology for the advancement of proteomics. However, existing methods did not consider the integration of heterogeneous encodings or the accuracy for subunit categories with few data....

Bibliographic Details
Main Authors: Yu-Nan Liu, 劉猷楠
Other Authors: Yen-Wei Chu
Format: Others
Language: en_US
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/6b665e
id ndltd-TW-106NCHU5111024
record_format oai_dc
spelling ndltd-TW-106NCHU5111024 2019-08-15T03:37:47Z http://ndltd.ncl.edu.tw/handle/6b665e Prediction of protein quaternary structural attributes through hybrid feature encoding method by using machine learning approach 利用機器學習方法通過混合特徵編碼方式預測蛋白質四級結構特徵 Yu-Nan Liu 劉猷楠 Master's National Chung Hsing University Graduate Institute of Biotechnology 106 Yen-Wei Chu 朱彥煒 2018 thesis 41 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Chung Hsing University === Graduate Institute of Biotechnology === 106 === Predicting protein quaternary structural attributes is an essential task in computational biology for the advancement of proteomics. However, existing methods did not consider the integration of heterogeneous encodings or the accuracy for subunit categories with few data. To this end, we propose QUATgo, a predictive tool that can predict protein oligomers with more than 12 subunits. Three kinds of sequence encoding were used: dipeptide composition, used here for the first time to predict protein quaternary structural attributes; protein half-life characteristics; and a modified version of the Functional Domain Composition encoding proposed by earlier researchers, which we changed to solve the problem of large feature vectors. QUATgo addresses the shortage of data for individual subunit categories with a two-stage architecture and uses 10-fold cross-validation to test the predictive accuracy of the classifiers. The first-stage prediction model uses a random forest algorithm to classify sequences into sixteen categories of homo-oligomers, hetero-oligomers, and monomers. The accuracy of the first-stage classifier is 63.4%. However, the training data for the hetero-10mer are insufficient, so the training data for the hetero-10mer and the hetero-more than 12mer are treated as a single category, X. If the first-stage classifier predicts class X, the sequence is sent to the second-stage classifier, which was built with support vector machines and distinguishes the hetero-10mer from the hetero-more than 12mer with an accuracy of 97.5%. Overall, QUATgo achieves 61.4% cross-validation accuracy and 63.4% independent test accuracy. In a case study, QUATgo accurately predicts the variable complex structure of the MERS-CoV ectodomains.
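The dipeptide composition encoding named in the description can be illustrated with a minimal sketch. It assumes the common definition (the relative frequency of each of the 400 ordered pairs of the 20 standard amino acids). The record does not give the thesis's exact formula, so the function name, the normalization, and the handling of non-standard residues here are assumptions rather than the author's implementation.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides, in a fixed order that defines the vector layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-dimensional list of dipeptide frequencies.

    Assumption: overlapping adjacent pairs are counted, and pairs that
    contain a non-standard residue (e.g. 'X') are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short or made only of non-standard residues.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]
```

The vector length is fixed at 400 regardless of sequence length, which is what lets sequences of different sizes share one classifier input format.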
author2 Yen-Wei Chu
author_facet Yen-Wei Chu
Yu-Nan Liu
劉猷楠
author Yu-Nan Liu
劉猷楠
spellingShingle Yu-Nan Liu
劉猷楠
Prediction of protein quaternary structural attributes through hybrid feature encoding method by using machine learning approach
author_sort Yu-Nan Liu
title Prediction of protein quaternary structural attributes through hybrid feature encoding method by using machine learning approach
publishDate 2018
url http://ndltd.ncl.edu.tw/handle/6b665e
_version_ 1719234743367630848
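The two-stage routing described in the abstract (a first-stage classifier whose merged class X is passed on to a second-stage classifier) can be sketched as follows. This shows only the control flow. The thesis uses a random forest and a support vector machine, whereas the classifiers below are hypothetical toy rules, and every function and variable name is an assumption made for illustration.

```python
from typing import Callable, Sequence

Features = Sequence[float]
Classifier = Callable[[Features], str]

# Label for the merged category (hetero-10mer plus hetero-more than 12mer).
MERGED_CLASS = "X"


def two_stage_predict(
    features: Features,
    first_stage: Classifier,
    second_stage: Classifier,
    merged_label: str = MERGED_CLASS,
) -> str:
    """Return the first-stage label unless it is the merged class.

    When the first stage predicts the merged class, the second stage
    decides which of the merged categories the sample belongs to.
    """
    label = first_stage(features)
    if label == merged_label:
        return second_stage(features)
    return label


# Toy stand-ins (hypothetical thresholds, not trained models).
def toy_first_stage(features: Features) -> str:
    return MERGED_CLASS if features[0] > 0.5 else "monomer"


def toy_second_stage(features: Features) -> str:
    return "hetero-10mer" if features[1] > 0.5 else "hetero-more than 12mer"
```

In practice the two callables would wrap trained models, for example by mapping a feature vector to the label returned by a fitted estimator's prediction method.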