Feature Statistics Compensation for Robust Speech Recognition in Additive Noise Environments
Master's === National Chi Nan University === Department of Electrical Engineering === 95 === Improving the accuracy of a speech recognition system under a mismatched noisy environment has long been a major research issue in the speech processing area. A great number of approaches have been proposed to reduce this environmental mismatch, and one class...
Main Authors: | Tsung-Hsueh Hsieh 謝宗學 |
---|---|
Other Authors: | Jeih-weih Hung 洪志偉 |
Format: | Others |
Language: | zh-TW |
Published: | 2007 |
Online Access: | http://ndltd.ncl.edu.tw/handle/77143721882774978160 |
id |
ndltd-TW-095NCNU0442028 |
record_format |
oai_dc |
spelling |
ndltd-TW-095NCNU04420282015-10-13T16:45:24Z http://ndltd.ncl.edu.tw/handle/77143721882774978160 Feature Statistics Compensation for Robust Speech Recognition in Additive Noise Environments 加成性雜訊環境下運用特徵參數統計補償法於強健性語音辨識 Tsung-Hsueh Hsieh 謝宗學 Master's National Chi Nan University Department of Electrical Engineering 95 Improving the accuracy of a speech recognition system under a mismatched noisy environment has long been a major research issue in the speech processing area. A great number of approaches have been proposed to reduce this environmental mismatch, and one class of these approaches focuses on normalizing the statistics of speech features under different noise conditions. The well-known utterance-based cepstral mean and variance normalization (U-CMVN) and segmental cepstral mean and variance normalization (S-CMVN) both belong to this class. Both make use of the whole utterance, or segments of an utterance, to estimate the statistics, which may not be accurate enough, and neither can be implemented in an on-line manner. In this thesis, instead of estimating the statistics in an utterance-wise manner as in U-CMVN and S-CMVN, we construct two sets of codebooks, called pseudo stereo codebooks, which represent the speech features in clean and noisy environments, respectively. Then, based on the pseudo stereo codebooks, we develop three feature compensation approaches: cepstral statistics compensation (CSC), linear least squares (LLS) regression, and quadratic least squares (QLS) regression. These new approaches are simple yet very effective, and they can be implemented on-line. We apply the three proposed approaches to four types of cepstral features: mel-frequency cepstral coefficients (MFCC), auto-correlation mel-frequency cepstral coefficients (AMFCC), linear prediction cepstral coefficients (LPCC), and perceptual linear prediction cepstral coefficients (PLPCC). 
Experiments conducted on the Aurora-2 database show that, for each type of speech feature, the three proposed approaches bring about very encouraging performance improvements under various noise environments. Moreover, compared with the traditional utterance-based CMVN and segmental CMVN, the three approaches provide further improved recognition accuracy. Jeih-weih Hung 洪志偉 2007 學位論文 ; thesis 93 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Chi Nan University === Department of Electrical Engineering === 95 === Improving the accuracy of a speech recognition system under a mismatched noisy environment has long been a major research issue in the speech processing area. A great number of approaches have been proposed to reduce this environmental mismatch, and one class of these approaches focuses on normalizing the statistics of speech features under different noise conditions. The well-known utterance-based cepstral mean and variance normalization (U-CMVN) and segmental cepstral mean and variance normalization (S-CMVN) both belong to this class. Both make use of the whole utterance, or segments of an utterance, to estimate the statistics, which may not be accurate enough, and neither can be implemented in an on-line manner.
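The utterance-based normalization the abstract describes can be sketched in a few lines. The following is a minimal illustration of U-CMVN, not the thesis's own implementation, assuming cepstral features arrive as a frames-by-coefficients array:

```python
import numpy as np

def cmvn(features):
    """Utterance-based cepstral mean and variance normalization (U-CMVN).

    `features` is a (num_frames, num_ceps) array of cepstral vectors for
    one utterance; each dimension is shifted to zero mean and scaled to
    unit variance using statistics of the whole utterance.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / np.maximum(std, 1e-10)  # guard zero variance

# Toy utterance: 100 frames of 13-dimensional cepstra
utt = np.random.randn(100, 13) * 3.0 + 5.0
norm = cmvn(utt)
print(np.allclose(norm.mean(axis=0), 0.0))  # True: per-dimension mean ~ 0
print(np.allclose(norm.std(axis=0), 1.0))   # True: per-dimension variance ~ 1
```

S-CMVN differs only in computing the statistics over fixed-length segments of the utterance rather than the whole utterance; both require the data they normalize over before any frame can be output, which is why neither runs on-line.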
In this thesis, instead of estimating the statistics in an utterance-wise manner as in U-CMVN and S-CMVN, we construct two sets of codebooks, called pseudo stereo codebooks, which represent the speech features in clean and noisy environments, respectively. Then, based on the pseudo stereo codebooks, we develop three feature compensation approaches: cepstral statistics compensation (CSC), linear least squares (LLS) regression, and quadratic least squares (QLS) regression. These new approaches are simple yet very effective, and they can be implemented on-line.
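As an illustration of the least-squares variant, the sketch below fits an affine (linear least squares) map from a noisy codebook to a paired clean codebook and applies it to compensate noisy frames. The codebooks here are synthetic placeholders; how the thesis actually builds its pseudo stereo codebooks is not reproduced:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired codebooks: K codewords of D-dimensional cepstra.
# In the thesis these would come from clean training data and a noise
# estimate; here the "noisy" codebook is a synthetic affine distortion.
K, D = 64, 13
clean_cb = rng.normal(size=(K, D))
noisy_cb = 0.8 * clean_cb + 1.5 + rng.normal(scale=0.02, size=(K, D))

# LLS regression: solve min_W || clean_cb - [noisy_cb, 1] W ||^2.
X = np.hstack([noisy_cb, np.ones((K, 1))])        # append a bias column
W, *_ = np.linalg.lstsq(X, clean_cb, rcond=None)  # W has shape (D + 1, D)

def compensate(frames):
    """Map noisy cepstral frames toward the clean feature space."""
    Xf = np.hstack([frames, np.ones((len(frames), 1))])
    return Xf @ W

# A frame distorted the same way is mapped back near its clean version.
noisy_frame = 0.8 * clean_cb[:1] + 1.5
restored = compensate(noisy_frame)
print(np.max(np.abs(restored - clean_cb[:1])) < 0.3)  # True
```

A QLS variant would augment `X` with squared feature terms before solving the same least-squares problem; the on-line appeal of this family is that `W` is estimated once from the codebooks, so each incoming frame needs only a matrix multiply.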
We apply the three proposed approaches to four types of cepstral features: mel-frequency cepstral coefficients (MFCC), auto-correlation mel-frequency cepstral coefficients (AMFCC), linear prediction cepstral coefficients (LPCC), and perceptual linear prediction cepstral coefficients (PLPCC). Experiments conducted on the Aurora-2 database show that, for each type of speech feature, the three proposed approaches bring about very encouraging performance improvements under various noise environments. Moreover, compared with the traditional utterance-based CMVN and segmental CMVN, the three approaches provide further improved recognition accuracy.
|
author2 |
Jeih-weih Hung |
author_facet |
Jeih-weih Hung Tsung-Hsueh Hsieh 謝宗學 |
author |
Tsung-Hsueh Hsieh 謝宗學 |
spellingShingle |
Tsung-Hsueh Hsieh 謝宗學 Feature Statistics Compensation for Robust Speech Recognition in Additive Noise Environments |
author_sort |
Tsung-Hsueh Hsieh |
title |
Feature Statistics Compensation for Robust Speech Recognition in Additive Noise Environments |
title_short |
Feature Statistics Compensation for Robust Speech Recognition in Additive Noise Environments |
title_full |
Feature Statistics Compensation for Robust Speech Recognition in Additive Noise Environments |
title_fullStr |
Feature Statistics Compensation for Robust Speech Recognition in Additive Noise Environments |
title_full_unstemmed |
Feature Statistics Compensation for Robust Speech Recognition in Additive Noise Environments |
title_sort |
feature statistics compensation for robust speech recognition in additive noise environments |
publishDate |
2007 |
url |
http://ndltd.ncl.edu.tw/handle/77143721882774978160 |
work_keys_str_mv |
AT tsunghsuehhsieh featurestatisticscompensationforrobustspeechrecognitioninadditivenoiseenvironments AT xièzōngxué featurestatisticscompensationforrobustspeechrecognitioninadditivenoiseenvironments AT tsunghsuehhsieh jiāchéngxìngzáxùnhuánjìngxiàyùnyòngtèzhēngcānshùtǒngjìbǔchángfǎyúqiángjiànxìngyǔyīnbiànshí AT xièzōngxué jiāchéngxìngzáxùnhuánjìngxiàyùnyòngtèzhēngcānshùtǒngjìbǔchángfǎyúqiángjiànxìngyǔyīnbiànshí |
_version_ |
1717774507802886144 |