A Study on Data Fusion Strategy for Audio-Visual Emotion Recognition

Bibliographic Details
Main Authors: Jen-Chun Lin, 林仁俊
Other Authors: Chung-Hsien Wu
Format: Others
Language: en_US
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/39485230414627899693
id ndltd-TW-102NCKU5392010
record_format oai_dc
spelling ndltd-TW-102NCKU5392010 2016-03-07T04:10:55Z http://ndltd.ncl.edu.tw/handle/39485230414627899693 A Study on Data Fusion Strategy for Audio-Visual Emotion Recognition 應用資料融合策略於語音視覺情緒辨識之研究 Jen-Chun Lin 林仁俊 Ph.D. National Cheng Kung University Department of Computer Science and Information Engineering 102 Recent years have seen increasing attention given to the research topic of automatic audio-visual emotion recognition. To increase recognition accuracy, the data fusion strategy, that is, how to effectively integrate audio and visual cues, has become a major research issue. The fusion operations reported for audio-visual emotion recognition can be classified into three major categories: feature-level fusion, decision-level fusion, and model-level fusion. These strategies have different characteristics, each with distinct advantages and disadvantages. Based on an analysis of the characteristics of current data fusion strategies, this dissertation first presents a hybrid fusion method, the Error Weighted Semi-Coupled Hidden Markov Model (EWSC-HMM), which integrates the advantages of the model-level fusion method Semi-Coupled Hidden Markov Model (SC-HMM) and the decision-level fusion method Error Weighted Classifier Combination (EWC) to obtain the optimal emotion recognition result from audio-visual bimodal fusion. A state-based bimodal alignment strategy in the SC-HMM is proposed to align the temporal relationship between the audio and visual streams. The Bayesian classifier weighting scheme, EWC, is then adopted to weight the contributions of the SC-HMM-based classifiers for different audio-visual feature pairs in making the final emotion recognition decision. For performance evaluation, two databases are considered: the posed MHMC database and the spontaneous SEMAINE database. Experimental results show that the proposed method not only outperforms other fusion-based bimodal emotion recognition methods for posed expressions but also provides acceptable results for spontaneous expressions. A complete emotional expression typically follows a complex temporal course in face-to-face natural conversation. This dissertation therefore further explores the temporal evolution of an emotional expression for audio-visual emotion recognition. Previous psychological research has shown that, considering the manner and intensity of expression, a complete emotional expression can be characterized by three sequential temporal phases: onset (application), apex (release), and offset (relaxation). However, in natural conversation a complete emotional expression may span more than one utterance, and each utterance may contain several temporal phases of emotional expression. Accordingly, this dissertation further presents a novel data fusion method with a temporal course modeling scheme, the Two-Level Hierarchical Alignment-Based Semi-Coupled Hidden Markov Model (2H-SC-HMM), which models the complex temporal structure of an emotional expression while considering the temporal relationship between the audio and visual streams, to improve audio-visual emotion recognition in a conversational utterance. Finally, experimental results demonstrate that the proposed 2H-SC-HMM substantially improves the performance of audio-visual emotion recognition.
Chung-Hsien Wu 吳宗憲 2014 Dissertation ; thesis 94 en_US
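Since the abstract contrasts feature-, decision-, and model-level fusion and singles out the error weighted classifier combination (EWC) as a Bayesian weighting scheme, a minimal sketch of the general idea behind error-weighted, decision-level fusion may help orient readers. This is not the dissertation's EWSC-HMM implementation; the function name, array shapes, and numbers below are hypothetical, and a held-out error rate is used only as a simple stand-in for classifier reliability.

```python
import numpy as np

def error_weighted_fusion(posteriors, error_rates):
    """Fuse per-classifier emotion posteriors at the decision level,
    down-weighting classifiers with higher held-out error (a generic
    sketch of error-weighted combination, not the dissertation's EWC)."""
    reliability = 1.0 - np.asarray(error_rates, dtype=float)  # crude reliability proxy
    weights = reliability / reliability.sum()                 # normalize weights to sum to 1
    fused = weights @ np.asarray(posteriors, dtype=float)     # weighted sum of posterior rows
    return fused / fused.sum()                                # renormalize to a distribution

# Hypothetical example: two classifiers (e.g., trained on different
# audio-visual feature pairs) scoring three emotion classes.
posteriors = [[0.6, 0.3, 0.1],   # classifier for feature pair 1
              [0.2, 0.5, 0.3]]   # classifier for feature pair 2
error_rates = [0.10, 0.30]       # pair 1 was more reliable on validation data

fused = error_weighted_fusion(posteriors, error_rates)
print(fused, "-> predicted class:", int(np.argmax(fused)))
```

By contrast, feature-level fusion would concatenate the audio and visual feature vectors before a single classifier, while model-level fusion (like the SC-HMM described above) couples the two modalities inside the model itself so their temporal relationship can be aligned rather than decided only at the score level.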
collection NDLTD
language en_US
format Others
sources NDLTD
author2 Chung-Hsien Wu
author_facet Chung-Hsien Wu
Jen-Chun Lin
林仁俊
author Jen-Chun Lin
林仁俊
spellingShingle Jen-Chun Lin
林仁俊
A Study on Data Fusion Strategy for Audio-Visual Emotion Recognition
author_sort Jen-Chun Lin
title A Study on Data Fusion Strategy for Audio-Visual Emotion Recognition
title_short A Study on Data Fusion Strategy for Audio-Visual Emotion Recognition
title_full A Study on Data Fusion Strategy for Audio-Visual Emotion Recognition
title_fullStr A Study on Data Fusion Strategy for Audio-Visual Emotion Recognition
title_full_unstemmed A Study on Data Fusion Strategy for Audio-Visual Emotion Recognition
title_sort study on data fusion strategy for audio-visual emotion recognition
publishDate 2014
url http://ndltd.ncl.edu.tw/handle/39485230414627899693
work_keys_str_mv AT jenchunlin astudyondatafusionstrategyforaudiovisualemotionrecognition
AT línrénjùn astudyondatafusionstrategyforaudiovisualemotionrecognition
AT jenchunlin yīngyòngzīliàorónghécèlüèyúyǔyīnshìjuéqíngxùbiànshízhīyánjiū
AT línrénjùn yīngyòngzīliàorónghécèlüèyúyǔyīnshìjuéqíngxùbiànshízhīyánjiū
AT jenchunlin studyondatafusionstrategyforaudiovisualemotionrecognition
AT línrénjùn studyondatafusionstrategyforaudiovisualemotionrecognition
_version_ 1718199059310706688