Summary: | Doctoral === National Chiao Tung University === Institute of Statistics === 96 === This thesis consists of three studies that apply statistical approaches: emotion detection, gene expression clustering, and biological pathway reconstruction. In the first study, we develop an emotion recognition system based on supervised learning. Because communication across the human-machine interface is important, a tool that can recognize emotion would be valuable. We propose an approach that handles the day-to-day dependence and subject-to-subject dependence in data from multiple subjects and samples. Thirty features were extracted from the subjects' physiological signals for three emotional states. The physiological signals measured were electrocardiogram (ECG), skin temperature (SKT), and galvanic skin response (GSR). After removing the day and subject dependence with the statistical technique MANOVA, six machine learning methods (Bayesian network learning, naive Bayesian classification, SVM, the C4.5 decision tree, the logistic model, and K-nearest-neighbor (KNN)) were implemented to differentiate the emotional states. The results show that the logistic model gives the best classification accuracy and that MANOVA significantly improves the performance of all six machine learning methods in the emotion recognition system.
In the second part of this thesis, we explore the expression patterns of yeast genes during the diauxic shift in the BY and RM strains using microarray studies. In particular, we investigate the genes that are differentially expressed between these two strains. After applying gene filtering, cluster analysis, and a regression model to detect the differential expression patterns, we find a group of genes whose expression is negatively correlated between the two strains. In addition, the estimated time shifts of the expression time profiles in this group are mainly 1 hour before the time at which glucose consumption drops. In the future, further analysis such as network analysis could build on the current results to investigate the causal relationships among these genes.
Inference of genetic regulatory networks and biological pathways from gene expression patterns is a critical problem in bioinformatics. In the third part of this thesis, we propose using the structure of Time Delay Boolean networks as a tool for exploring biological pathways. We assume that the indegree of each gene (i.e., the number of input genes to each gene) is bounded by a constant K, and we take K = 2 for the inference. In addition, we consider two kinds of relations between the output gene and the Boolean function of the input genes: similarity and prerequisite. In our inference strategy, we compare every output gene and every pair of input genes against the eight basic relations and calculate the corresponding p-scores. Since a smaller p-score is expected to indicate a more likely relation, we combine the consistent relations and identify the most likely relation between the output gene and the pair of input genes. We illustrate the method using a simulated example and a published microarray expression dataset of the yeast Saccharomyces cerevisiae from experiments on the regulation of gluconeogenesis by Cat8 and Sip4. The results show that our proposed algorithm is extensible to more realistic network models.
|