Feature Statistics Compensation for Robust Speech Recognition in Additive Noise Environments
Master's === National Chi Nan University === Department of Electrical Engineering === 95 === Improving the accuracy of a speech recognition system under a mismatched noisy environment has long been a major research issue in the speech processing area. A great number of approaches have been proposed to reduce this environmental mismatch, and one class...
Main Authors: | Tsung-Hsueh Hsieh 謝宗學 |
---|---|
Other Authors: | Jeih-weih Hung 洪志偉 |
Format: | Others |
Language: | zh-TW |
Published: | 2007 |
Online Access: | http://ndltd.ncl.edu.tw/handle/77143721882774978160 |
id |
ndltd-TW-095NCNU0442028 |
record_format |
oai_dc |
spelling |
ndltd-TW-095NCNU04420282015-10-13T16:45:24Z http://ndltd.ncl.edu.tw/handle/77143721882774978160 Feature Statistics Compensation for Robust Speech Recognition in Additive Noise Environments 加成性雜訊環境下運用特徵參數統計補償法於強健性語音辨識 Tsung-Hsueh Hsieh 謝宗學 Master's National Chi Nan University Department of Electrical Engineering 95 Improving the accuracy of a speech recognition system under a mismatched noisy environment has long been a major research issue in the speech processing area. A great number of approaches have been proposed to reduce this environmental mismatch, and one class of these approaches focuses on normalizing the statistics of speech features under different noise conditions. The well-known utterance-based cepstral mean and variance normalization (U-CMVN) and segmental cepstral mean and variance normalization (S-CMVN) both belong to this class. Both make use of the whole utterance, or segments of an utterance, to estimate the statistics, which may not be accurate enough, and neither can be implemented in an on-line manner. In this thesis, instead of estimating the statistics in an utterance-wise manner as in U-CMVN and S-CMVN, we construct two sets of codebooks, called pseudo stereo codebooks, which represent the speech features in clean and noisy environments, respectively. Then, based on the pseudo stereo codebooks, we develop three feature compensation approaches: cepstral statistics compensation (CSC), linear least squares (LLS) regression, and quadratic least squares (QLS) regression. These new approaches are simple yet very effective, and they can be implemented on-line. We apply the three proposed approaches to four types of cepstral features: mel-frequency cepstral coefficients (MFCC), auto-correlation mel-frequency cepstral coefficients (AMFCC), linear prediction cepstral coefficients (LPCC), and perceptual linear prediction cepstral coefficients (PLPCC). 
Experiments conducted on the Aurora-2 database show that, for each type of speech feature, the three proposed approaches bring about very encouraging performance improvements under various noise environments. Moreover, compared with the traditional utterance-based CMVN and segmental CMVN, the three approaches provide further improved recognition accuracy. Jeih-weih Hung 洪志偉 2007 學位論文 ; thesis 93 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Chi Nan University === Department of Electrical Engineering === 95 === Improving the accuracy of a speech recognition system under a mismatched noisy environment has long been a major research issue in the speech processing area. A great number of approaches have been proposed to reduce this environmental mismatch, and one class of these approaches focuses on normalizing the statistics of speech features under different noise conditions. The well-known utterance-based cepstral mean and variance normalization (U-CMVN) and segmental cepstral mean and variance normalization (S-CMVN) both belong to this class. Both make use of the whole utterance, or segments of an utterance, to estimate the statistics, which may not be accurate enough, and neither can be implemented in an on-line manner.
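The utterance-based normalization the abstract describes can be sketched in a few lines. The following is a minimal illustration of U-CMVN, not the thesis's own implementation, assuming cepstral features arrive as a frames-by-coefficients array:

```python
import numpy as np

def cmvn(features):
    """Utterance-based cepstral mean and variance normalization (U-CMVN).

    `features` is a (num_frames, num_ceps) array of cepstral vectors for
    one utterance; each dimension is shifted to zero mean and scaled to
    unit variance using statistics of the whole utterance.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / np.maximum(std, 1e-10)  # guard zero variance

# Toy utterance: 100 frames of 13-dimensional cepstra
utt = np.random.randn(100, 13) * 3.0 + 5.0
norm = cmvn(utt)
print(np.allclose(norm.mean(axis=0), 0.0))  # True: per-dimension mean ~ 0
print(np.allclose(norm.std(axis=0), 1.0))   # True: per-dimension variance ~ 1
```

S-CMVN differs only in computing the statistics over fixed-length segments of the utterance rather than the whole utterance; both require the data they normalize over before any frame can be output, which is why neither runs on-line.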
In this thesis, instead of estimating the statistics in an utterance-wise manner as in U-CMVN and S-CMVN, we construct two sets of codebooks, called pseudo stereo codebooks, which represent the speech features in clean and noisy environments, respectively. Then, based on the pseudo stereo codebooks, we develop three feature compensation approaches: cepstral statistics compensation (CSC), linear least squares (LLS) regression, and quadratic least squares (QLS) regression. These new approaches are simple yet very effective, and they can be implemented on-line.
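As an illustration of the least-squares variant, the sketch below fits an affine (linear least squares) map from a noisy codebook to a paired clean codebook and applies it to compensate noisy frames. The codebooks here are synthetic placeholders; how the thesis actually builds its pseudo stereo codebooks is not reproduced:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired codebooks: K codewords of D-dimensional cepstra.
# In the thesis these would come from clean training data and a noise
# estimate; here the "noisy" codebook is a synthetic affine distortion.
K, D = 64, 13
clean_cb = rng.normal(size=(K, D))
noisy_cb = 0.8 * clean_cb + 1.5 + rng.normal(scale=0.02, size=(K, D))

# LLS regression: solve min_W || clean_cb - [noisy_cb, 1] W ||^2.
X = np.hstack([noisy_cb, np.ones((K, 1))])        # append a bias column
W, *_ = np.linalg.lstsq(X, clean_cb, rcond=None)  # W has shape (D + 1, D)

def compensate(frames):
    """Map noisy cepstral frames toward the clean feature space."""
    Xf = np.hstack([frames, np.ones((len(frames), 1))])
    return Xf @ W

# A frame distorted the same way is mapped back near its clean version.
noisy_frame = 0.8 * clean_cb[:1] + 1.5
restored = compensate(noisy_frame)
print(np.max(np.abs(restored - clean_cb[:1])) < 0.3)  # True
```

A QLS variant would augment `X` with squared feature terms before solving the same least-squares problem; the on-line appeal of this family is that `W` is estimated once from the codebooks, so each incoming frame needs only a matrix multiply.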
We apply the three proposed approaches to four types of cepstral features: mel-frequency cepstral coefficients (MFCC), auto-correlation mel-frequency cepstral coefficients (AMFCC), linear prediction cepstral coefficients (LPCC), and perceptual linear prediction cepstral coefficients (PLPCC). Experiments conducted on the Aurora-2 database show that, for each type of speech feature, the three proposed approaches bring about very encouraging performance improvements under various noise environments. Moreover, compared with the traditional utterance-based CMVN and segmental CMVN, the three approaches provide further improved recognition accuracy.
|
author2 |
Jeih-weih Hung |
author_facet |
Jeih-weih Hung Tsung-Hsueh Hsieh 謝宗學 |
author |
Tsung-Hsueh Hsieh 謝宗學 |
spellingShingle |
Tsung-Hsueh Hsieh 謝宗學 Feature Statistics Compensation for Robust Speech Recognition in Additive Noise Environments |
author_sort |
Tsung-Hsueh Hsieh |
title |
Feature Statistics Compensation for Robust Speech Recognition in Additive Noise Environments |
title_short |
Feature Statistics Compensation for Robust Speech Recognition in Additive Noise Environments |
title_full |
Feature Statistics Compensation for Robust Speech Recognition in Additive Noise Environments |
title_fullStr |
Feature Statistics Compensation for Robust Speech Recognition in Additive Noise Environments |
title_full_unstemmed |
Feature Statistics Compensation for Robust Speech Recognition in Additive Noise Environments |
title_sort |
feature statistics compensation for robust speech recognition in additive noise environments |
publishDate |
2007 |
url |
http://ndltd.ncl.edu.tw/handle/77143721882774978160 |
work_keys_str_mv |
AT tsunghsuehhsieh featurestatisticscompensationforrobustspeechrecognitioninadditivenoiseenvironments AT xièzōngxué featurestatisticscompensationforrobustspeechrecognitioninadditivenoiseenvironments AT tsunghsuehhsieh jiāchéngxìngzáxùnhuánjìngxiàyùnyòngtèzhēngcānshùtǒngjìbǔchángfǎyúqiángjiànxìngyǔyīnbiànshí AT xièzōngxué jiāchéngxìngzáxùnhuánjìngxiàyùnyòngtèzhēngcānshùtǒngjìbǔchángfǎyúqiángjiànxìngyǔyīnbiànshí |
_version_ |
1717774507802886144 |