The Research on the Voice Activity Detection and Speech Enhancement for Noisy Speech Recognition

Master's === National Chi Nan University === Department of Electrical Engineering === 93 === When a speech recognizer is applied in a real environment, its performance is often seriously degraded by additive noise. To improve the robustness of the recognition system under noisy conditions, various approaches have been propo...

Bibliographic Details
Main Authors: Chen-Wei Lai, 賴辰瑋
Other Authors: Jeih-Weih Hung
Format: Others
Language: zh-TW
Published: 2005
Online Access: http://ndltd.ncl.edu.tw/handle/06070640933840270072
id ndltd-TW-093NCNU0442070
record_format oai_dc
spelling ndltd-TW-093NCNU0442070 2015-10-13T11:39:19Z http://ndltd.ncl.edu.tw/handle/06070640933840270072 The Research on the Voice Activity Detection and Speech Enhancement for Noisy Speech Recognition 強健性語音辨認之研究:語音前端端點偵測與語音強化法 Chen-Wei Lai 賴辰瑋 Master's National Chi Nan University Department of Electrical Engineering 93 Jeih-Weih Hung 洪志偉 2005 thesis 84 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Chi Nan University === Department of Electrical Engineering === 93 === When a speech recognizer is applied in a real environment, its performance is often seriously degraded by additive noise. Various approaches have been proposed to improve the robustness of the recognition system under noisy conditions. One direction is to detect the presence of noise, estimate its characteristics, and then remove or alleviate the noise in the speech signal. In this thesis, we first study several voice activity detection (VAD, or endpoint detection) approaches, which detect the noise-only portions of a speech sequence so that the noise statistics can be estimated from those portions. These approaches include the order statistic filter (OSF), subband order statistic filter (SOSF), long-term spectral divergence (LTSD), Kullback-Leibler (KL) distance, energy, and entropy. Experimental results show that the KL distance method performs best; that is, it gives endpoints of the noise-only portions closest to those obtained manually. Second, we study speech enhancement approaches, which try to reduce the noise component of the speech signal in different domains. For example, nonlinear spectral subtraction (NSS) and the Wiener filter (WF) operate in the linear spectral domain, and mel spectral subtraction (MSS) operates in the mel spectral domain. Furthermore, we propose the cepstral statistics compensation (CSC) method, which operates in the cepstral domain. We find that the effect of these back-end speech enhancement approaches generally depends on the accuracy of the front-end VAD, and that CSC gives the highest recognition rates among all approaches. CSC even performs better than two popular temporal filtering approaches, cepstral mean subtraction (CMS) and cepstral normalization (CN). In conclusion, robust VAD and speech enhancement approaches can effectively improve noisy speech recognition, and they have one special advantage: since they operate only on the speech to be recognized, there is no need to adjust the recognition models.
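The pipeline outlined in the description (find noise-only frames, estimate noise statistics from them, then compensate features in the cepstral domain) can be illustrated with a minimal sketch. This is not the thesis's implementation: it uses the simple energy criterion rather than the KL distance method, and plain cepstral mean subtraction rather than the proposed CSC. All function names, the frame length, hop size, `n_init`, and `margin` are hypothetical choices made for illustration, and the sketch assumes the first few frames contain noise only.

```python
import math

def frame_log_energies(signal, frame_len=160, hop=80):
    """Log energy of each overlapping frame of a sampled signal."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        mean_power = sum(x * x for x in frame) / frame_len
        energies.append(math.log(mean_power + 1e-12))  # small floor avoids log(0)
    return energies

def energy_vad(energies, n_init=5, margin=2.0):
    """Mark a frame as speech (True) when its log energy exceeds the noise
    floor by `margin`. The floor is the mean log energy of the first
    n_init frames, which are assumed (hypothetically) to be noise-only."""
    noise_floor = sum(energies[:n_init]) / n_init
    return [e > noise_floor + margin for e in energies]

def noise_log_energy(energies, is_speech):
    """Noise statistic: average log energy over frames labeled noise-only."""
    noise = [e for e, s in zip(energies, is_speech) if not s]
    return sum(noise) / len(noise)

def cepstral_mean_subtraction(features):
    """Subtract the per-dimension utterance mean from every cepstral vector."""
    n_frames = len(features)
    dim = len(features[0])
    means = [sum(f[d] for f in features) / n_frames for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in features]
```

A caller would compute `frame_log_energies(signal)`, pass the result to `energy_vad`, and use the noise-only frames to estimate noise statistics. Because every step operates on the incoming speech alone, the recognition models stay untouched, which is the advantage the description points out.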
author2 Jeih-Weih Hung
author_facet Jeih-Weih Hung
Chen-Wei Lai
賴辰瑋
author Chen-Wei Lai
賴辰瑋
spellingShingle Chen-Wei Lai
賴辰瑋
The Research on the Voice Activity Detection and Speech Enhancement for Noisy Speech Recognition
author_sort Chen-Wei Lai
title The Research on the Voice Activity Detection and Speech Enhancement for Noisy Speech Recognition
publishDate 2005
url http://ndltd.ncl.edu.tw/handle/06070640933840270072