A STUDY OF EMOTION RECOGNITION ON MANDARIN SPEECH AND ITS PERFORMANCE EVALUATION

Bibliographic Details
Main Authors: Yu-te Chen, 陳育得
Other Authors: Tsang-long Pao
Format: Others
Language: en_US
Published: 2008
Online Access: http://ndltd.ncl.edu.tw/handle/73727768030935807856
Description
Summary: Doctoral === Tatung University === Department of Computer Science and Engineering === 96 (ROC academic year) === It is said that technology comes from humanity. What is humanity? The very definition of humanity is emotion. Emotion is the basis for all human expression and the underlying theme behind everything that is done, said, thought or imagined. If computers can perceive and respond to human emotion, human-computer interaction will become more natural. In the past, several classifiers were adopted independently and tested on emotional speech corpora that differed in language, size, number of emotional states and recording method. This makes it difficult to compare and evaluate the performance of those classifiers. In this thesis, we propose a weighted discrete k-nearest neighborhood (WD-KNN) classification algorithm and compare it with several classification methods by applying all of them to the same Mandarin emotional speech corpus. We first implemented a baseline system to determine the parameter k in KNN-based classifiers and to select the best feature set. The results for different values of k in the KNN classifier show that the best performance, 70.7%, is achieved when k is set to 10. To keep the comparison fair, k is set to 10 in all KNN-based classifiers throughout this thesis. The best feature set includes LPC, LPCC, and MFCC. Compared to the performance before feature selection, accuracy improves by 2.1% as the number of feature types is reduced from 13 to 3. Next, we focused on comparing different weighting schemes for KNN-based classifiers, including traditional K-Nearest Neighborhood (KNN), weighted KNN (WKNN), KNN classification using Categorical Average Patterns (WCAP), and WD-KNN. Compared to the baseline performance, the largest accuracy improvements achieved by these classifiers are 4.9%, 2.8% and 12.3%. The highest recognition rate is 81.4%, obtained with the WD-KNN classifier weighted by the Fibonacci sequence.
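The abstract names WD-KNN with Fibonacci weighting but does not define it. The following is a minimal sketch under stated assumptions, not the thesis implementation: it assumes that WD-KNN takes the k nearest training samples within each class, multiplies their sorted Euclidean distances by Fibonacci weights in decreasing order (largest weight on the nearest sample), and predicts the class with the smallest weighted distance sum. The function names `fibonacci_weights` and `wd_knn_predict` are hypothetical.

```python
import math
from collections import defaultdict


def fibonacci_weights(k):
    """Return k Fibonacci numbers in decreasing order, e.g. k=5 -> [5, 3, 2, 1, 1]."""
    seq = [1, 1]
    while len(seq) < k:
        seq.append(seq[-1] + seq[-2])
    return seq[:k][::-1]


def wd_knn_predict(train_x, train_y, query, k=10):
    """Assumed WD-KNN rule: per-class weighted sum of the k smallest distances.

    Returns (predicted_label, scores), where scores maps each class label
    to its weighted distance sum; the smallest sum wins.
    """
    # Collect the distance from the query to every training sample, per class.
    dists_by_class = defaultdict(list)
    for x, y in zip(train_x, train_y):
        dists_by_class[y].append(math.dist(x, query))

    weights = fibonacci_weights(k)
    scores = {}
    for label, dists in dists_by_class.items():
        if len(dists) < k:
            # Fewer than k samples would make the sums incomparable.
            raise ValueError(f"class {label!r} has fewer than k={k} samples")
        nearest = sorted(dists)[:k]  # k nearest neighbors within this class
        scores[label] = sum(w * d for w, d in zip(weights, nearest))

    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy one-dimensional data; real inputs would be LPC/LPCC/MFCC feature vectors.
    xs = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
    ys = ["a", "a", "a", "b", "b", "b"]
    print(wd_knn_predict(xs, ys, (1.5,), k=3))
```

Because every class contributes exactly k neighbors, the per-class sums are directly comparable, which is why the sketch rejects classes with fewer than k samples instead of silently truncating.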
Then we evaluated the performance of several classifiers, including KNN, MKNN, WKNN, LDA, QDA, GMM, HMM, SVM, BPNN, and the proposed WD-KNN, for detecting emotion from Mandarin speech. The experimental results and McNemar's test show that the proposed WD-KNN classifier achieves the best accuracy for 5-class emotion recognition and outperforms the other classification techniques. To verify the advantage of the proposed method, we then compared these classifiers on another Mandarin expressive speech corpus consisting of two emotions and 2000 utterances. The experimental results again show that the proposed WD-KNN outperforms the others. Finally, we implemented an emotion radar chart, based on WD-KNN, that presents the intensity of each emotion component of the speech in our emotion recognition system. Such a system can be further used in speech training, especially to help hearing-impaired people learn to express emotions in speech more naturally.
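The abstract does not say which form of McNemar's test was used. As an illustration only, the sketch below applies the exact (binomial) two-sided version to the paired predictions of two classifiers on the same test utterances. The function name `mcnemar_exact` and the toy labels are hypothetical.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired classifier outputs.

    Returns (b, c, p_value), where b counts samples only classifier A got
    right and c counts samples only classifier B got right.
    """
    b = c = 0
    for truth, a, other in zip(y_true, pred_a, pred_b):
        if a == truth and other != truth:
            b += 1
        elif a != truth and other == truth:
            c += 1

    n = b + c  # discordant pairs; concordant pairs carry no information
    if n == 0:
        return b, c, 1.0

    # Under the null hypothesis each discordant pair favors A or B with
    # probability 1/2, so min(b, c) follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


if __name__ == "__main__":
    truth = [0] * 10
    a = [0] * 10            # classifier A: all correct
    b_ = [1] * 8 + [0] * 2  # classifier B: wrong on the first eight
    print(mcnemar_exact(truth, a, b_))
```

A small p-value indicates that the two classifiers disagree on the discordant samples more often than chance would explain, which supports a claim that one outperforms the other on that corpus.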