Excitation Features of Speech for Speaker-Specific Emotion Detection

In this article, we study emotion detection from speech in a speaker-specific scenario. By parameterizing the excitation component of voiced speech, the study explores deviations between emotional speech (e.g., speech produced in anger, happiness, sadness, etc.) and neutral speech (i.e., non-emotional) to develop an automatic emotion detection system. The excitation features used in this study are the instantaneous fundamental frequency, the strength of excitation, and the energy of excitation. The Kullback-Leibler (KL) distance is computed to measure the similarity between feature distributions of emotional and neutral speech. Based on the KL distance value between a test utterance and an utterance produced in a neutral state by the same speaker, a detection decision is made by the system. In the training of the proposed system, only three neutral utterances produced by the speaker were used, unlike in most existing emotion recognition and detection systems that call for large amounts of training data (both emotional and neutral) from several speakers. In addition, the proposed system is independent of language and lexical content. The system is evaluated using two databases of emotional speech. The performance of the proposed detection method is shown to be better than that of reference methods.

Main Authors: Sudarsana Reddy Kadiri, Paavo Alku
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access
Subjects: Speech analysis; paralinguistics; emotion detection; excitation source; zero frequency filtering (ZFF); linear prediction (LP) analysis
Online Access: https://ieeexplore.ieee.org/document/9046041/
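The abstract describes detection by comparing the distribution of an excitation feature (e.g., instantaneous fundamental frequency) in a test utterance against the same speaker's neutral speech via the KL distance, then thresholding. The sketch below is an illustration of that general idea only, not the authors' implementation: the histogram bin count, smoothing constant, symmetric-KL formulation, and decision threshold are all assumed values, and the function names are hypothetical.

```python
import numpy as np

def kl_distance(p, q, eps=1e-10):
    """Symmetric Kullback-Leibler distance between two discrete distributions.

    A small eps is added before normalization so empty histogram bins
    do not produce log(0)."""
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def detect_emotion(test_feature, neutral_feature, bins=50, threshold=0.5):
    """Flag a test utterance as emotional when its feature distribution
    deviates from the speaker's neutral distribution by more than threshold.

    Both inputs are 1-D arrays of a scalar excitation feature (e.g., F0 in Hz)
    sampled over the utterance; shared bin edges make the histograms comparable."""
    lo = min(test_feature.min(), neutral_feature.min())
    hi = max(test_feature.max(), neutral_feature.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(test_feature, bins=edges)
    q, _ = np.histogram(neutral_feature, bins=edges)
    d = kl_distance(p.astype(float), q.astype(float))
    return d > threshold, d
```

A test utterance whose F0 distribution is shifted and widened relative to the speaker's neutral baseline (as is typical of high-arousal emotions such as anger) yields a large KL distance and is flagged, while a distribution matching the baseline yields a small one.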
id |
doaj-bab65234fb174c14bb51b49f788381a6 |
spelling |
Sudarsana Reddy Kadiri (ORCID: https://orcid.org/0000-0001-5806-3053) and Paavo Alku, Department of Signal Processing and Acoustics, Aalto University, Espoo, Finland. "Excitation Features of Speech for Speaker-Specific Emotion Detection," IEEE Access, vol. 8, pp. 60382-60391, 2020. ISSN: 2169-3536. DOI: 10.1109/ACCESS.2020.2982954 (IEEE article no. 9046041). Record updated 2021-03-30T01:30:48Z. |
collection |
DOAJ |