On the Acoustics of Emotion in Audio: What Speech, Music and Sound have in Common

Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie, or in a radio play. In the field of affective computing, there is currently some loosely connected research concerning each of these phenomena, but a holistic computational model of affect in sound is still lacking. In turn, for tomorrow's pervasive technical systems, including affective companions and robots, it is expected to be highly beneficial to understand the affective dimensions of 'the sound that something makes', in order to evaluate the system's auditory environment and its own audio output. This article aims at a first step towards a holistic computational model: starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, we interpret the worth of individual features across these three domains, considering four audio databases with observer annotations in the arousal and valence dimensions. In the results, we find that by selection of appropriate descriptors, cross-domain arousal and valence regression is feasible, achieving significant correlations with the observer annotations of up to .78 for arousal (training on sound and testing on enacted speech) and .60 for valence (training on enacted speech and testing on music). The high degree of cross-domain consistency in encoding the two main dimensions of affect may be attributable to the co-evolution of speech and music from multimodal affect bursts, including the integration of nature sounds for expressive effects.
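To make the cross-domain evaluation described in the abstract concrete, the following is a minimal sketch in Python. It is not the authors' pipeline (the article relies on large standard acoustic feature sets and its own learners); it only assumes you already have per-clip acoustic feature matrices and observer-rated arousal scores for two domains, e.g. environmental sound and enacted speech. All names below (cross_domain_correlation, X_sound, arousal_sound, ...) are illustrative.

# Minimal sketch of cross-domain arousal regression (illustrative only;
# not the feature set or learner reported in the article).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from scipy.stats import pearsonr

def cross_domain_correlation(X_train, y_train, X_test, y_test, k=50):
    """Train on one audio domain (e.g. environmental sound), evaluate on
    another (e.g. enacted speech); return the Pearson correlation between
    predicted and observer-annotated arousal on the test domain."""
    model = make_pipeline(
        StandardScaler(),                # acoustic descriptors differ widely in scale
        SelectKBest(f_regression, k=k),  # keep the k most informative descriptors
        Ridge(alpha=1.0),                # simple linear regressor
    )
    model.fit(X_train, y_train)
    r, _ = pearsonr(model.predict(X_test), y_test)
    return r

# Hypothetical usage with pre-extracted per-clip feature matrices:
# r = cross_domain_correlation(X_sound, arousal_sound, X_speech, arousal_speech)
# print(f"cross-domain arousal correlation: {r:.2f}")

Placing feature selection before the regressor mirrors the article's central point: cross-domain regression becomes feasible once appropriate descriptors are selected, rather than feeding the full feature set to the learner.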

Bibliographic Details
Main Authors: Felix Weninger, Florian Eyben, Björn W. Schuller, Marcello Mortillaro, Klaus R. Scherer
Format: Article
Language: English
Published: Frontiers Media S.A., 2013-05-01
Series: Frontiers in Psychology
Subjects: Speech Perception; emotion recognition; audio signal processing; music perception; Sound perception; Feature Selection
Online Access: http://journal.frontiersin.org/Journal/10.3389/fpsyg.2013.00292/full
id doaj-c6ccdad8288b4acd966c6eb2f64cbd26
record_format Article
spelling doaj-c6ccdad8288b4acd966c6eb2f64cbd26 (record updated 2020-11-24T22:57:31Z)
language eng
publisher Frontiers Media S.A.
series Frontiers in Psychology, volume 4
issn 1664-1078
publishDate 2013-05-01
doi 10.3389/fpsyg.2013.00292 (article id 51547)
title On the Acoustics of Emotion in Audio: What Speech, Music and Sound have in Common
authors Felix Weninger (Technische Universität München); Florian Eyben (Technische Universität München); Björn W. Schuller (Technische Universität München; Université de Genève); Marcello Mortillaro (Université de Genève); Klaus R. Scherer (Université de Genève)
url http://journal.frontiersin.org/Journal/10.3389/fpsyg.2013.00292/full
topics Speech Perception; emotion recognition; audio signal processing; music perception; Sound perception; Feature Selection
collection DOAJ
language English
format Article
sources DOAJ
author Felix Weninger
Florian Eyben
Björn W. Schuller
Marcello Mortillaro
Klaus R. Scherer
spellingShingle Felix Weninger
Florian Eyben
Björn W. Schuller
Marcello Mortillaro
Klaus R. Scherer
On the Acoustics of Emotion in Audio: What Speech, Music and Sound have in Common
Frontiers in Psychology
Speech Perception
emotion recognition
audio signal processing
music perception
Sound perception
Feature Selection
author_facet Felix Weninger
Florian Eyben
Björn W. Schuller
Marcello Mortillaro
Klaus R. Scherer
author_sort Felix Weninger
title On the Acoustics of Emotion in Audio: What Speech, Music and Sound have in Common
title_short On the Acoustics of Emotion in Audio: What Speech, Music and Sound have in Common
title_full On the Acoustics of Emotion in Audio: What Speech, Music and Sound have in Common
title_fullStr On the Acoustics of Emotion in Audio: What Speech, Music and Sound have in Common
title_full_unstemmed On the Acoustics of Emotion in Audio: What Speech, Music and Sound have in Common
title_sort on the acoustics of emotion in audio: what speech, music and sound have in common
publisher Frontiers Media S.A.
series Frontiers in Psychology
issn 1664-1078
publishDate 2013-05-01
description Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie, or in a radio play. In the field of affective computing, there is currently some loosely connected research concerning each of these phenomena, but a holistic computational model of affect in sound is still lacking. In turn, for tomorrow's pervasive technical systems, including affective companions and robots, it is expected to be highly beneficial to understand the affective dimensions of 'the sound that something makes', in order to evaluate the system's auditory environment and its own audio output. This article aims at a first step towards a holistic computational model: starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, we interpret the worth of individual features across these three domains, considering four audio databases with observer annotations in the arousal and valence dimensions. In the results, we find that by selection of appropriate descriptors, cross-domain arousal and valence regression is feasible, achieving significant correlations with the observer annotations of up to .78 for arousal (training on sound and testing on enacted speech) and .60 for valence (training on enacted speech and testing on music). The high degree of cross-domain consistency in encoding the two main dimensions of affect may be attributable to the co-evolution of speech and music from multimodal affect bursts, including the integration of nature sounds for expressive effects.
topic Speech Perception
emotion recognition
audio signal processing
music perception
Sound perception
Feature Selection
url http://journal.frontiersin.org/Journal/10.3389/fpsyg.2013.00292/full
work_keys_str_mv AT felixeweninger ontheacousticsofemotioninaudiowhatspeechmusicandsoundhaveincommon
AT florianeeyben ontheacousticsofemotioninaudiowhatspeechmusicandsoundhaveincommon
AT bjornwschuller ontheacousticsofemotioninaudiowhatspeechmusicandsoundhaveincommon
AT marcelloemortillaro ontheacousticsofemotioninaudiowhatspeechmusicandsoundhaveincommon
AT klausrscherer ontheacousticsofemotioninaudiowhatspeechmusicandsoundhaveincommon
_version_ 1725650460012969984