Pitch tracking and speech enhancement in noisy and reverberant environments

Bibliographic Details
Main Author:	Wu, Mingyang
Language:	English
Published:	The Ohio State University / OhioLINK 2003
Subjects:	Artificial Intelligence; channel selection; correlogram; hidden Markov model (HMM); multipitch tracking; noisy speech; pitch detection; pitch strength; reverberant speech; inverse filtering; signal-to-reverberant energy ratio (SRR); reverberation; reverberation time
Online Access:	http://rave.ohiolink.edu/etdc/view?acc_num=osu1064341479

id	ndltd-OhioLink-oai-etd.ohiolink.edu-osu1064341479
record_format	oai_dc
spelling	ndltd-OhioLink-oai-etd.ohiolink.edu-osu1064341479 2021-08-03T05:48:25Z Pitch tracking and speech enhancement in noisy and reverberant environments Wu, Mingyang Artificial Intelligence; channel selection; correlogram; hidden Markov model (HMM); multipitch tracking; noisy speech; pitch detection; pitch strength; reverberant speech; inverse filtering; signal-to-reverberant energy ratio (SRR); reverberation; reverberation time Two causes of speech degradation exist in practically all listening situations: noise interference and room reverberation. This dissertation investigates three particular aspects of speech processing in noisy and reverberant environments: multipitch tracking for noisy speech, measurement of reverberation time based on pitch strength, and reverberant speech enhancement using one microphone (or monaurally). An effective multipitch tracking algorithm for noisy speech is critical for speech analysis and processing. However, the performance of existing algorithms is not satisfactory. We present a robust algorithm for multipitch tracking of noisy speech. Our approach integrates an improved channel and peak selection method, a new method for extracting periodicity information across different channels, and a hidden Markov model (HMM) for forming continuous pitch tracks. The resulting algorithm can reliably track single and double pitch tracks in a noisy environment. We suggest a pitch error measure for the multipitch situation. The proposed algorithm is evaluated on a database of speech utterances mixed with various types of interference. Quantitative comparisons show that our algorithm significantly outperforms existing ones. Reverberation corrupts harmonic structure in voiced speech. We observe that the pitch strength of voiced speech segments is indicative of the degree of reverberation. Consequently, we present a pitch-based measure for reverberation time (T60) utilizing our new pitch determination algorithm. The pitch strength is measured by deriving the statistics of relative time lags, defined as the distances from the detected pitch periods to the closest peaks in correlograms. The monotonic relationship between the measured pitch strength and reverberation time is learned from a corpus of reverberant speech with known reverberation times. Under noise-free conditions, the quality of reverberant speech is dependent on two distinct perceptual components: coloration and long-term reverberation. They correspond to two physical variables: signal-to-reverberant energy ratio (SRR) and reverberation time, respectively. We propose a two-stage reverberant speech enhancement algorithm using one microphone. In the first stage, an inverse filter is estimated to reduce coloration effects so that SRR is increased. The second stage utilizes spectral subtraction to minimize the influence of long-term reverberation. The proposed algorithm significantly improves the quality of reverberant speech. Our algorithm is quantitatively compared with a recent one-microphone reverberant speech enhancement algorithm on a corpus of speech utterances in a number of reverberant conditions. The results show that our algorithm performs substantially better. 2003-11-07 English text The Ohio State University / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=osu1064341479 unrestricted This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws.
collection	NDLTD
language	English
sources	NDLTD
topic	Artificial Intelligence; channel selection; correlogram; hidden Markov model (HMM); multipitch tracking; noisy speech; pitch detection; pitch strength; reverberant speech; inverse filtering; signal-to-reverberant energy ratio (SRR); reverberation; reverberation time
spellingShingle	Artificial Intelligence channel selection correlogram hidden Markov model (HMM) multipitch tracking noisy speech pitch detection pitch strength reverberant speech inverse filtering signal-to-reverberant energy ratio (SRR) reverberation reverberation time Wu, Mingyang Pitch tracking and speech enhancement in noisy and reverberant environments
author	Wu, Mingyang
author_facet	Wu, Mingyang
author_sort	Wu, Mingyang
title	Pitch tracking and speech enhancement in noisy and reverberant environments
title_short	Pitch tracking and speech enhancement in noisy and reverberant environments
title_full	Pitch tracking and speech enhancement in noisy and reverberant environments
title_fullStr	Pitch tracking and speech enhancement in noisy and reverberant environments
title_full_unstemmed	Pitch tracking and speech enhancement in noisy and reverberant environments
title_sort	pitch tracking and speech enhancement in noisy and reverberant environments
publisher	The Ohio State University / OhioLINK
publishDate	2003
url	http://rave.ohiolink.edu/etdc/view?acc_num=osu1064341479
work_keys_str_mv	AT wumingyang pitchtrackingandspeechenhancementinnoisyandreverberantenvironments
_version_	1719425863048495104
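
Illustrative note on the two physical variables named in the abstract, signal-to-reverberant energy ratio (SRR) and reverberation time (T60). The sketch below is not the dissertation's method (which estimates T60 blindly from pitch strength). It assumes a known, synthetic room impulse response with a purely exponential decay, computes SRR as the energy in a short "direct" window versus the remainder, and recovers T60 with the standard Schroeder backward-integration technique. The function names, the 1 ms direct-sound window, and the -5 to -25 dB fitting range are assumptions made for illustration only.

```python
import math

def exponential_rir(t60, fs, length_s):
    """Synthetic impulse-response envelope (assumption: ideal exponential decay).

    Energy falls by 60 dB in t60 seconds, so amplitude falls as 10^(-3 t / t60).
    """
    n = int(length_s * fs)
    return [10.0 ** (-3.0 * (i / fs) / t60) for i in range(n)]

def srr_db(rir, fs, direct_ms=1.0):
    """SRR in dB: energy in the first direct_ms of the response vs. the rest."""
    split = max(1, int(fs * direct_ms / 1000.0))
    direct = sum(x * x for x in rir[:split])
    reverb = sum(x * x for x in rir[split:])
    return 10.0 * math.log10(direct / reverb)

def estimate_t60(rir, fs):
    """Estimate T60 by Schroeder backward integration of the squared response."""
    energy = [x * x for x in rir]
    # Energy decay curve: edc[i] is the energy remaining from sample i onward.
    edc = [0.0] * len(energy)
    running = 0.0
    for i in range(len(energy) - 1, -1, -1):
        running += energy[i]
        edc[i] = running
    edc_db = [10.0 * math.log10(e / edc[0]) for e in edc]
    # Time taken to fall from -5 dB to -25 dB (a 20 dB drop), scaled to 60 dB.
    i5 = next(i for i, d in enumerate(edc_db) if d <= -5.0)
    i25 = next(i for i, d in enumerate(edc_db) if d <= -25.0)
    return 3.0 * (i25 - i5) / fs

if __name__ == "__main__":
    fs = 8000
    for t60 in (0.3, 0.6, 0.9):
        rir = exponential_rir(t60, fs, length_s=2.0)
        print(f"T60 set={t60:.1f} s  estimated={estimate_t60(rir, fs):.3f} s  "
              f"SRR={srr_db(rir, fs):.1f} dB")
```

In this toy setting, a longer reverberation time leaves more energy in the tail and therefore lowers SRR, which is consistent with the abstract's description of the two variables as related but distinct.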

Similar Items