A combined cepstral distance method for emotional speech recognition

Affective computing is both a direction of reform in artificial intelligence and a hallmark of advanced intelligent machines. Emotion is one of the biggest differences between humans and machines; a machine that behaves with emotion will be accepted by more people. Voice is the most natural means of daily communication and is easily understood and accepted, so recognizing emotion in speech is an important field of artificial intelligence. In emotion recognition, however, certain pairs of emotions are particularly prone to confusion. This article presents a combined cepstral distance method for two-group multi-class emotion classification in emotional speech recognition. Cepstral distance combined with speech energy is widely used for endpoint detection in speech recognition; in this work, cepstral distance is instead used to measure the similarity between frames of emotional signals and frames of neutral signals. These features are input to a directed acyclic graph support vector machine (DAG-SVM) classifier, and a two-group classification strategy is adopted to resolve confusion in multi-emotion recognition. In the experiments, a Chinese Mandarin emotion database is used, and a large training set (1134 + 378 utterances) provides strong modelling capability for predicting emotion. The experimental results show that cepstral distance increases the recognition rate for the emotion sad and balances the recognition results by eliminating overfitting. On the German Berlin emotional speech database, the recognition rate between sad and boredom, two emotions that are very difficult to distinguish, reaches 95.45%.
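
The abstract positions cepstral distance as a frame-level similarity measure between emotional and neutral speech. The sketch below is only an illustration of that idea, not the authors' implementation: the real-cepstrum front end, the nearest-frame aggregation, and the frame length, hop size, and number of coefficients kept are all assumptions chosen for demonstration.

    # Minimal sketch of a cepstral-distance feature between an emotional utterance and a
    # neutral reference. Framing, real-cepstrum front end, and nearest-frame aggregation
    # are illustrative assumptions, not the paper's exact formulation.
    import numpy as np

    def frame_signal(x, frame_len=400, hop=160):
        """Split a 1-D signal into overlapping, Hamming-windowed frames (one per row)."""
        n_frames = 1 + max(0, (len(x) - frame_len) // hop)
        idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
        return x[idx] * np.hamming(frame_len)

    def real_cepstrum(frames, n_ceps=13, eps=1e-10):
        """Real cepstrum of each frame, truncated to the first n_ceps coefficients."""
        spec = np.abs(np.fft.rfft(frames, axis=1))
        return np.fft.irfft(np.log(spec + eps), axis=1)[:, :n_ceps]

    def mean_cepstral_distance(emo_ceps, neu_ceps):
        """Average Euclidean cepstral distance from each emotional frame to its
        nearest neutral frame, giving one scalar feature per utterance."""
        d = np.linalg.norm(emo_ceps[:, None, :] - neu_ceps[None, :, :], axis=2)
        return d.min(axis=1).mean()

    # Dummy signals standing in for one emotional and one neutral utterance (1 s at 16 kHz).
    emotional = np.random.randn(16000)
    neutral = np.random.randn(16000)
    feature = mean_cepstral_distance(real_cepstrum(frame_signal(emotional)),
                                     real_cepstrum(frame_signal(neutral)))
    print(feature)

Per-utterance distance features of this kind would then serve as inputs to the directed acyclic graph support vector machine classifier described in the abstract.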

Bibliographic Details
Main Authors: Changqin Quan, Bin Zhang, Xiao Sun, Fuji Ren
Format: Article
Language: English
Published: SAGE Publishing, 2017-07-01
Series: International Journal of Advanced Robotic Systems
ISSN: 1729-8814
Online Access: https://doi.org/10.1177/1729881417719836
Author Affiliations:
Changqin Quan: Graduate School of System Informatics, Kobe University, Kobe, Japan
Bin Zhang: Department of Computer and Information Science, Hefei University of Technology, Hefei, China
Xiao Sun: Department of Computer and Information Science, Hefei University of Technology, Hefei, China
Fuji Ren: Faculty of Engineering, University of Tokushima, Japan