A Study on Data Fusion Strategy for Audio-Visual Emotion Recognition

Bibliographic Details
Main Authors: Jen-Chun Lin, 林仁俊
Other Authors: Chung-Hsien Wu
Format: Others
Language: en_US
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/39485230414627899693
id ndltd-TW-102NCKU5392010
record_format oai_dc
spelling ndltd-TW-102NCKU5392010 2016-03-07T04:10:55Z http://ndltd.ncl.edu.tw/handle/39485230414627899693 A Study on Data Fusion Strategy for Audio-Visual Emotion Recognition 應用資料融合策略於語音視覺情緒辨識之研究 Jen-Chun Lin 林仁俊 Ph.D. National Cheng Kung University Department of Computer Science and Information Engineering 102 Recent years have seen increasing attention given to the research topic of automatic audio-visual emotion recognition. To increase recognition accuracy, the data fusion strategy, that is, how to effectively integrate audio and visual cues, has become a major research issue. The fusion operations reported for audio-visual emotion recognition can be classified into three major categories: feature-level fusion, decision-level fusion, and model-level fusion. These strategies have different characteristics, each with distinct advantages and disadvantages. Based on an analysis of the characteristics of current data fusion strategies, this dissertation first presents a hybrid fusion method, the Error Weighted Semi-Coupled Hidden Markov Model (EWSC-HMM), which integrates the advantages of the model-level fusion method Semi-Coupled Hidden Markov Model (SC-HMM) and the decision-level fusion method Error Weighted Classifier Combination (EWC) to obtain the optimal emotion recognition result from audio-visual bimodal fusion. A state-based bimodal alignment strategy in the SC-HMM is proposed to align the temporal relationship between the audio and visual streams. The Bayesian classifier weighting scheme, EWC, is then adopted to weight the contributions of the SC-HMM-based classifiers for different audio-visual feature pairs in making the final emotion recognition decision. For performance evaluation, two databases are considered: the posed MHMC database and the spontaneous SEMAINE database. Experimental results show that the proposed method not only outperforms other fusion-based bimodal emotion recognition methods for posed expressions but also provides acceptable results for spontaneous expressions. A complete emotional expression typically follows a complex temporal course in face-to-face natural conversation. This dissertation therefore further explores the temporal evolution of an emotional expression for audio-visual emotion recognition. Previous psychological research has shown that, considering the manner and intensity of expression, a complete emotional expression can be characterized by three sequential temporal phases: onset (application), apex (release), and offset (relaxation). However, in natural conversation a complete emotional expression may span more than one utterance, and each utterance may contain several temporal phases of emotional expression. Accordingly, this dissertation further presents a novel data fusion method with a temporal course modeling scheme, the Two-Level Hierarchical Alignment-Based Semi-Coupled Hidden Markov Model (2H-SC-HMM), which models the complex temporal structure of an emotional expression while considering the temporal relationship between the audio and visual streams, to improve audio-visual emotion recognition in a conversational utterance. Finally, experimental results demonstrate that the proposed 2H-SC-HMM substantially improves the performance of audio-visual emotion recognition.
Chung-Hsien Wu 吳宗憲 2014 Dissertation ; thesis 94 en_US
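Since the abstract contrasts feature-, decision-, and model-level fusion and singles out the error weighted classifier combination (EWC) as a Bayesian weighting scheme, a minimal sketch of the general idea behind error-weighted, decision-level fusion may help orient readers. This is not the dissertation's EWSC-HMM implementation; the function name, array shapes, and numbers below are hypothetical, and a held-out error rate is used only as a simple stand-in for classifier reliability.

```python
import numpy as np

def error_weighted_fusion(posteriors, error_rates):
    """Fuse per-classifier emotion posteriors at the decision level,
    down-weighting classifiers with higher held-out error (a generic
    sketch of error-weighted combination, not the dissertation's EWC)."""
    reliability = 1.0 - np.asarray(error_rates, dtype=float)  # crude reliability proxy
    weights = reliability / reliability.sum()                 # normalize weights to sum to 1
    fused = weights @ np.asarray(posteriors, dtype=float)     # weighted sum of posterior rows
    return fused / fused.sum()                                # renormalize to a distribution

# Hypothetical example: two classifiers (e.g., trained on different
# audio-visual feature pairs) scoring three emotion classes.
posteriors = [[0.6, 0.3, 0.1],   # classifier for feature pair 1
              [0.2, 0.5, 0.3]]   # classifier for feature pair 2
error_rates = [0.10, 0.30]       # pair 1 was more reliable on validation data

fused = error_weighted_fusion(posteriors, error_rates)
print(fused, "-> predicted class:", int(np.argmax(fused)))
```

By contrast, feature-level fusion would concatenate the audio and visual feature vectors before a single classifier, while model-level fusion (like the SC-HMM described above) couples the two modalities inside the model itself so their temporal relationship can be aligned rather than decided only at the score level.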
collection NDLTD
language en_US
format Others
sources NDLTD
author2 Chung-Hsien Wu
author_facet Chung-Hsien Wu
Jen-Chun Lin
林仁俊
author Jen-Chun Lin
林仁俊
spellingShingle Jen-Chun Lin
林仁俊
A Study on Data Fusion Strategy for Audio-Visual Emotion Recognition
author_sort Jen-Chun Lin
title A Study on Data Fusion Strategy for Audio-Visual Emotion Recognition
title_short A Study on Data Fusion Strategy for Audio-Visual Emotion Recognition
title_full A Study on Data Fusion Strategy for Audio-Visual Emotion Recognition
title_fullStr A Study on Data Fusion Strategy for Audio-Visual Emotion Recognition
title_full_unstemmed A Study on Data Fusion Strategy for Audio-Visual Emotion Recognition
title_sort study on data fusion strategy for audio-visual emotion recognition
publishDate 2014
url http://ndltd.ncl.edu.tw/handle/39485230414627899693
work_keys_str_mv AT jenchunlin astudyondatafusionstrategyforaudiovisualemotionrecognition
AT línrénjùn astudyondatafusionstrategyforaudiovisualemotionrecognition
AT jenchunlin yīngyòngzīliàorónghécèlüèyúyǔyīnshìjuéqíngxùbiànshízhīyánjiū
AT línrénjùn yīngyòngzīliàorónghécèlüèyúyǔyīnshìjuéqíngxùbiànshízhīyánjiū
AT jenchunlin studyondatafusionstrategyforaudiovisualemotionrecognition
AT línrénjùn studyondatafusionstrategyforaudiovisualemotionrecognition
_version_ 1718199059310706688