Sequence-Based Prediction of Mutation-Induced Stability Changes with different temperature models

Bibliographic Details
Main Authors: Guan-Lin Huang, 黃冠霖
Other Authors: Yen-Wei Chu
Format: Others
Language: zh-TW
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/dayct4
Description
Summary: Master's === National Chung Hsing University === Institute of Genomics and Bioinformatics === 107 === Protein structure is highly correlated with protein function. A single point mutation at one amino acid residue can seriously affect the entire protein structure, leading to a change or loss of function. Predicting protein stability has broad potential applications, such as increasing protein activity, studying the structural properties of protein interaction sites, and drug development. However, previous prediction tools often relied on structure-derived features, whereas most proteins currently have only primary sequence information available. A single-point amino acid mutation can change the stability of the protein structure by producing a small change in the folding free energy (ΔG, dG), and the difference in folding free energy between the wild-type protein and the mutant protein (ΔΔG, ddG) is often used as a key measure of protein stability. This study proposes a sequence-based prediction tool that is more accurate than previous tools. It builds three models according to temperature: a low-temperature model, a general-temperature model, and a high-temperature model. The models combine basic, sequence, structural, and functional features of the protein with the XGBoost (Extreme Gradient Boosting) machine learning method. The prediction accuracies for 10-fold cross-validation and independent testing were 0.739, 0.808, and 0.979, respectively. The tool performs better than other sequence-based tools, and even better than most structure-based tools.
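
As a rough illustration of the quantities named in the summary, the following Python sketch computes ddG from two dG values, assigns a record to one of the three temperature models, and generates 10-fold cross-validation splits. It is not the thesis implementation. The temperature cut-offs (`LOW_MAX_C`, `HIGH_MIN_C`), the sign convention for ddG, and all function names are assumptions made here for illustration; the summary does not state them.

```python
from typing import List, Tuple

# Hypothetical cut-offs in degrees Celsius. The summary names three
# temperature models but does not say where the boundaries lie.
LOW_MAX_C = 20.0
HIGH_MIN_C = 40.0


def ddg(dg_wild: float, dg_mutant: float) -> float:
    """Return ddG = dG(mutant) - dG(wild type), in the units of the inputs.

    The sign convention is an assumption; some datasets use the opposite one.
    """
    return dg_mutant - dg_wild


def temperature_model(temp_c: float) -> str:
    """Name the model ("low", "general", "high") for a measurement temperature."""
    if temp_c < LOW_MAX_C:
        return "low"
    if temp_c > HIGH_MIN_C:
        return "high"
    return "general"


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k folds and return (train, test) index pairs."""
    # Fold i takes every k-th index starting at i, so fold sizes differ by at most one.
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    print(ddg(-5.0, -3.5))            # difference in folding free energy
    print(temperature_model(25.0))    # model chosen under the assumed cut-offs
    print(len(k_fold_indices(25)))    # number of cross-validation splits
```

In a setup like this, each temperature group would be trained and cross-validated separately, with a gradient-boosting classifier or regressor fitted on the training indices of each split.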