Sound-source recognition : a theory and computational model

Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (p. 159-172).


Bibliographic Details
Main Author:	Martin, Keith Dana
Other Authors:	Barry L. Vercoe.
Format:	Others
Language:	English
Published:	Massachusetts Institute of Technology, 1999 (deposited in DSpace@MIT 2005)
Subjects:	Electrical Engineering and Computer Science
Online Access:	http://hdl.handle.net/1721.1/9468

id	ndltd-MIT-oai-dspace.mit.edu-1721.1-9468
record_format	oai_dc
spelling	ndltd-MIT-oai-dspace.mit.edu-1721.1-9468 2019-08-17T03:11:09Z Sound-source recognition : a theory and computational model Martin, Keith Dana Barry L. Vercoe. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (p. 159-172). by Keith Dana Martin. Ph.D. 2005-08-22T18:36:20Z 2005-08-22T18:36:20Z 1999 1999 Thesis http://hdl.handle.net/1721.1/9468 43551518 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 172 p. 14078283 bytes 14078038 bytes application/pdf application/pdf application/pdf Massachusetts Institute of Technology
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Electrical Engineering and Computer Science
description	Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. === Includes bibliographical references (p. 159-172). === The ability of a normal human listener to recognize objects in the environment from only the sounds they produce is extraordinarily robust with regard to characteristics of the acoustic environment and of other competing sound sources. In contrast, computer systems designed to recognize sound sources function precariously, breaking down whenever the target sound is degraded by reverberation, noise, or competing sounds. Robust listening requires extensive contextual knowledge, but the potential contribution of sound-source recognition to the process of auditory scene analysis has largely been neglected by researchers building computational models of the scene analysis process. This thesis proposes a theory of sound-source recognition, casting recognition as a process of gathering information to enable the listener to make inferences about objects in the environment or to predict their behavior. In order to explore the process, attention is restricted to isolated sounds produced by a small class of sound sources, the non-percussive orchestral musical instruments. Previous research on the perception and production of orchestral instrument sounds is reviewed from a vantage point based on the excitation and resonance structure of the sound-production process, revealing a set of perceptually salient acoustic features. A computer model of the recognition process is developed that is capable of "listening" to a recording of a musical instrument and classifying the instrument as one of 25 possibilities. The model is based on current models of signal processing in the human auditory system. It explicitly extracts salient acoustic features and uses a novel improvisational taxonomic architecture (based on simple statistical pattern-recognition techniques) to classify the sound source. The performance of the model is compared directly to that of skilled human listeners, using both isolated musical tones and excerpts from compact disc recordings as test stimuli. The computer model's performance is robust with regard to the variations of reverberation and ambient noise (although not with regard to competing sound sources) in commercial compact disc recordings, and the system performs better than three out of fourteen skilled human listeners on a forced-choice classification task. This work has implications for research in musical timbre, automatic media annotation, human talker identification, and computational auditory scene analysis. === by Keith Dana Martin. === Ph.D.
author2	Barry L. Vercoe.
author	Martin, Keith Dana
title	Sound-source recognition : a theory and computational model
publisher	Massachusetts Institute of Technology
publishDate	2005
url	http://hdl.handle.net/1721.1/9468


Similar Items