Feature Statistics Compensation for Robust Speech Recognition in Additive Noise Environments

Master's === National Chi Nan University === Department of Electrical Engineering === 95 === Improving the accuracy of a speech recognition system under a mismatched noisy environment has long been a major research issue in the speech processing area. Many approaches have been proposed to reduce this environmental mismatch, and one class of them focuses on normalizing the statistics of speech features under different noise conditions. The well-known utterance-based cepstral mean and variance normalization (U-CMVN) and segmental cepstral mean and variance normalization (S-CMVN) both belong to this class. Both estimate the statistics from the whole utterance or from segments of it, which may not be accurate enough, and neither can be implemented in an on-line manner.

In this thesis, instead of estimating the statistics utterance by utterance as in U-CMVN and S-CMVN, we construct two sets of codebooks, called pseudo stereo codebooks, which represent the speech features in clean and noisy environments, respectively. Based on these pseudo stereo codebooks, we develop three feature compensation approaches: cepstral statistics compensation (CSC), linear least squares (LLS) regression, and quadratic least squares (QLS) regression. These approaches are simple yet very effective, and they can be implemented on-line. We apply them to four types of cepstral features: mel-frequency cepstral coefficients (MFCC), autocorrelation mel-frequency cepstral coefficients (AMFCC), linear prediction cepstral coefficients (LPCC), and perceptual linear prediction cepstral coefficients (PLPCC). Experiments on the Aurora-2 database show that, for each feature type, the three approaches bring very encouraging performance improvements under various noise environments. Moreover, compared with traditional utterance-based CMVN and segmental CMVN, they provide further improved recognition accuracy.
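For reference, the U-CMVN baseline mentioned above is a standard technique: each cepstral dimension is shifted and scaled to zero mean and unit variance over the utterance. A minimal NumPy sketch (not taken from the thesis; the function name and array layout are illustrative assumptions):

```python
import numpy as np

def utterance_cmvn(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Utterance-based cepstral mean and variance normalization (U-CMVN).

    `features` is assumed to have shape (num_frames, num_ceps). Each
    cepstral dimension is normalized to zero mean and unit variance over
    the whole utterance, which is why the method is inherently off-line:
    it needs the complete utterance before any frame can be normalized.
    """
    mean = features.mean(axis=0)            # per-dimension mean
    std = features.std(axis=0)              # per-dimension standard deviation
    return (features - mean) / (std + eps)  # eps guards against division by zero
```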
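The record does not spell out the thesis's equations, so the following is only one plausible reading of cepstral statistics compensation (CSC): the per-dimension mean and variance of the noisy features are mapped onto the clean-side statistics, with both sets of statistics taken from the pseudo stereo codebooks rather than from the utterance itself, which is what makes on-line operation possible. All names here are hypothetical.

```python
import numpy as np

def csc_compensate(noisy_features: np.ndarray,
                   clean_cb: np.ndarray,
                   noisy_cb: np.ndarray,
                   eps: float = 1e-8) -> np.ndarray:
    """Sketch of codebook-driven statistics compensation (one reading of CSC).

    `clean_cb` and `noisy_cb` are pseudo stereo codebooks of shape
    (codebook_size, num_ceps). Unlike CMVN, the normalization statistics
    come from the codebooks, so each frame can be compensated as soon as
    it arrives, without waiting for the rest of the utterance.
    """
    mu_n, sd_n = noisy_cb.mean(axis=0), noisy_cb.std(axis=0)
    mu_c, sd_c = clean_cb.mean(axis=0), clean_cb.std(axis=0)
    # Shift/scale the noisy statistics onto the clean statistics.
    return (noisy_features - mu_n) * (sd_c / (sd_n + eps)) + mu_c
```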
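Likewise, the LLS and QLS variants can be pictured as least-squares regressions from noisy to clean feature space, fitted on paired codewords. The sketch below assumes the codeword pairing between the two codebooks is already given (obtaining that pairing is the point of the pseudo stereo construction and is not reproduced here); `build_codebook` and the use of k-means are illustrative assumptions, not the thesis's procedure.

```python
import numpy as np
from scipy.cluster.vq import kmeans2  # k-means, used here only for illustration

def build_codebook(features: np.ndarray, size: int = 64) -> np.ndarray:
    """Cluster (num_frames, num_ceps) training features into codewords."""
    codebook, _ = kmeans2(features, size, minit='++', seed=0)
    return codebook

def fit_least_squares(noisy_cb: np.ndarray, clean_cb: np.ndarray,
                      quadratic: bool = False) -> np.ndarray:
    """Fit clean ~ f(noisy) on paired codewords by least squares.

    With quadratic=False this is the linear (LLS) case, y ~ A x + b;
    with quadratic=True, element-wise squared terms are appended,
    giving a simple quadratic (QLS-style) regression.
    """
    X = noisy_cb
    if quadratic:
        X = np.hstack([X, X ** 2])
    X = np.hstack([X, np.ones((len(X), 1))])         # bias column
    W, *_ = np.linalg.lstsq(X, clean_cb, rcond=None)
    return W

def compensate(noisy_features: np.ndarray, W: np.ndarray,
               quadratic: bool = False) -> np.ndarray:
    """Apply the fitted regression frame by frame."""
    X = noisy_features
    if quadratic:
        X = np.hstack([X, X ** 2])
    X = np.hstack([X, np.ones((len(X), 1))])
    return X @ W
```

Because the regression matrix is estimated once from the codebooks, each incoming frame can be compensated independently, which matches the abstract's claim that on-line implementation is achievable.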

Bibliographic Details
Original Title: 加成性雜訊環境下運用特徵參數統計補償法於強健性語音辨識
Main Author: Tsung-Hsueh Hsieh (謝宗學)
Other Authors: Jeih-weih Hung (洪志偉)
Format: Others
Language: zh-TW
Published: 2007
Online Access: http://ndltd.ncl.edu.tw/handle/77143721882774978160