Modeling Timbre Similarity of Short Music Clips
A number of recent studies provide evidence that most listeners can extract information related to song identity, emotion, or genre from music excerpts lasting only tenths of a second. Because of these very short durations, timbre, as a multifaceted auditory attribute, appears to be a plausible candidate for the type of feature that listeners draw on when processing short music excerpts.
Main Authors: Kai Siedenburg, Daniel Müllensiefen
Format: Article
Language: English
Published: Frontiers Media S.A., 2017-04-01
Series: Frontiers in Psychology
Subjects: short audio clips; music similarity; timbre; audio features; genre
Online Access: http://journal.frontiersin.org/article/10.3389/fpsyg.2017.00639/full
id: doaj-f72f9c48d8bd4c499e116f06fbdf9a9e
record_format: Article
spelling: doaj-f72f9c48d8bd4c499e116f06fbdf9a9e | 2020-11-24T21:22:09Z | eng | Frontiers Media S.A. | Frontiers in Psychology | 1664-1078 | 2017-04-01 | 8 | 10.3389/fpsyg.2017.00639 | 238531
Modeling Timbre Similarity of Short Music Clips
Kai Siedenburg (Department of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany)
Daniel Müllensiefen (Department of Psychology, Goldsmiths, University of London, London, UK)
A number of recent studies provide evidence that most listeners can extract information related to song identity, emotion, or genre from music excerpts lasting only tenths of a second. Because of these very short durations, timbre, as a multifaceted auditory attribute, appears to be a plausible candidate for the type of feature that listeners draw on when processing short music excerpts. However, the importance of timbre in listening tasks that involve short excerpts has not yet been demonstrated empirically. The goal of this study was therefore to develop a method for exploring to what degree similarity judgments of short music clips can be modeled with low-level acoustic features related to timbre. We utilized similarity data from two large samples of participants: Sample I was obtained via an online survey, used 16 clips of 400 ms length, and contained responses from 137,339 participants. Sample II was collected in a lab environment, used 16 clips of 800 ms length, and contained responses from 648 participants. Our model used two sets of audio features, comprising commonly used timbre descriptors and the well-known Mel-frequency cepstral coefficients as well as their temporal derivatives. To predict pairwise similarities, the resulting distances between clips in terms of their audio features were used as predictor variables in partial least-squares regression. We found that a sparse selection of three to seven features from both descriptor sets, mainly encoding the coarse shape of the spectrum as well as spectrotemporal variability, best predicted similarities across the two sets of sounds. Notably, the inclusion of non-acoustic predictors of musical genre and record release date yielded much better generalization performance and explained up to 50% of the shared variance (R²) between observations and model predictions. Overall, the results of this study empirically demonstrate that both acoustic features related to timbre and higher-level categorical features such as musical genre play a major role in the perception of short music clips.
http://journal.frontiersin.org/article/10.3389/fpsyg.2017.00639/full
Keywords: short audio clips; music similarity; timbre; audio features; genre
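The modeling pipeline described in the abstract, using per-feature distances between clips as predictor variables in a partial least-squares regression of pairwise similarity judgments, can be illustrated with a minimal numpy sketch. The clip feature values and "similarity judgments" below are synthetic stand-ins (the study's actual descriptors and ratings are not reproduced here), and `pls1` is a bare-bones PLS1 (NIPALS) implementation for illustration, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 16 clips, each summarized by a vector of audio
# descriptors (e.g. spectral centroid, MFCC means); values are synthetic.
n_clips, n_feats = 16, 6
feats = rng.normal(size=(n_clips, n_feats))

# Predictors: for every clip pair, the per-feature absolute distance.
pairs = [(i, j) for i in range(n_clips) for j in range(i + 1, n_clips)]
X = np.array([np.abs(feats[i] - feats[j]) for i, j in pairs])

# Synthetic "similarity judgments": smaller overall distance -> more similar.
y = -X.sum(axis=1) + 0.1 * rng.normal(size=len(pairs))

def pls1(X, y, n_components):
    """Minimal PLS1 regression (NIPALS); returns coefficients for centered X."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xc.T @ yc                 # weight vector: covariance direction
        w /= np.linalg.norm(w)
        t = Xc @ w                    # score vector
        tt = t @ t
        p = Xc.T @ t / tt             # X loading
        qk = yc @ t / tt              # y loading
        Xc = Xc - np.outer(t, p)      # deflate X
        yc = yc - qk * t              # deflate y
        W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

coef = pls1(X, y, n_components=3)
pred = (X - X.mean(axis=0)) @ coef + y.mean()
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(r2, 3))
```

Because the synthetic ratings are a noisy linear function of the distances, a few PLS components recover most of the variance; in the study itself, sparse selections of three to seven features reached up to 50% shared variance.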
collection: DOAJ
language: English
format: Article
sources: DOAJ
author: Kai Siedenburg; Daniel Müllensiefen
spellingShingle: Kai Siedenburg; Daniel Müllensiefen; Modeling Timbre Similarity of Short Music Clips; Frontiers in Psychology; short audio clips; music similarity; timbre; audio features; genre
author_facet: Kai Siedenburg; Daniel Müllensiefen
author_sort: Kai Siedenburg
title: Modeling Timbre Similarity of Short Music Clips
title_short: Modeling Timbre Similarity of Short Music Clips
title_full: Modeling Timbre Similarity of Short Music Clips
title_fullStr: Modeling Timbre Similarity of Short Music Clips
title_full_unstemmed: Modeling Timbre Similarity of Short Music Clips
title_sort: modeling timbre similarity of short music clips
publisher: Frontiers Media S.A.
series: Frontiers in Psychology
issn: 1664-1078
publishDate: 2017-04-01
description: A number of recent studies provide evidence that most listeners can extract information related to song identity, emotion, or genre from music excerpts lasting only tenths of a second. Because of these very short durations, timbre, as a multifaceted auditory attribute, appears to be a plausible candidate for the type of feature that listeners draw on when processing short music excerpts. However, the importance of timbre in listening tasks that involve short excerpts has not yet been demonstrated empirically. The goal of this study was therefore to develop a method for exploring to what degree similarity judgments of short music clips can be modeled with low-level acoustic features related to timbre. We utilized similarity data from two large samples of participants: Sample I was obtained via an online survey, used 16 clips of 400 ms length, and contained responses from 137,339 participants. Sample II was collected in a lab environment, used 16 clips of 800 ms length, and contained responses from 648 participants. Our model used two sets of audio features, comprising commonly used timbre descriptors and the well-known Mel-frequency cepstral coefficients as well as their temporal derivatives. To predict pairwise similarities, the resulting distances between clips in terms of their audio features were used as predictor variables in partial least-squares regression. We found that a sparse selection of three to seven features from both descriptor sets, mainly encoding the coarse shape of the spectrum as well as spectrotemporal variability, best predicted similarities across the two sets of sounds. Notably, the inclusion of non-acoustic predictors of musical genre and record release date yielded much better generalization performance and explained up to 50% of the shared variance (R²) between observations and model predictions. Overall, the results of this study empirically demonstrate that both acoustic features related to timbre and higher-level categorical features such as musical genre play a major role in the perception of short music clips.
topic: short audio clips; music similarity; timbre; audio features; genre
url: http://journal.frontiersin.org/article/10.3389/fpsyg.2017.00639/full
work_keys_str_mv: AT kaisiedenburg modelingtimbresimilarityofshortmusicclips; AT danielmullensiefen modelingtimbresimilarityofshortmusicclips
_version_: 1725997280090128384