A Study of Threshold Denoising on Modulation Spectrum for Robust Speech Recognition

Master's === National Chi Nan University === Department of Electrical Engineering === 102 === This paper presents a novel noise robustness algorithm to enhance speech features in noisy speech recognition. In the presented algorithm, the temporal speech feature sequence is first converted to its spectrum via discrete cosine transform (DCT) or discrete Fo...

Bibliographic Details
Main Authors: Yen-Chih Cheng, 程彥誌
Other Authors: Gin-Der Wu
Format: Others
Language: zh-TW
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/54555410084305100242
id ndltd-TW-102NCNU0442075
record_format oai_dc
spelling ndltd-TW-102NCNU0442075 2015-10-13T23:38:01Z http://ndltd.ncl.edu.tw/handle/54555410084305100242 A Study of Threshold Denoising on Modulation Spectrum for Robust Speech Recognition 門檻值去噪法於調變頻譜之強健性語音辨識研究 Yen-Chih Cheng 程彥誌 Master's, National Chi Nan University, Department of Electrical Engineering, 102 Gin-Der Wu Jeih-Weih Hung 吳俊德 洪志偉 2014 thesis 75 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Chi Nan University === Department of Electrical Engineering === 102 === This paper presents a novel noise-robustness algorithm that enhances speech features for recognition in noisy conditions. In the presented algorithm, the temporal sequence of each speech feature is first converted to its spectrum via the discrete cosine transform (DCT) or the discrete Fourier transform (DFT). The DCT- or DFT-based spectrum is then compensated by a thresholding function that further shrinks its smaller components. Finally, the updated spectrum is converted back to the temporal domain to obtain the new feature sequence. The method has two advantages. First, the overall compensation process is unsupervised: it requires no information about the noise in the speech signal. Second, the threshold can be chosen flexibly under various optimization criteria. Experiments on the Aurora-2 connected-digit database and task show that the presented methods significantly improve the recognition accuracy of speech features pre-processed by any of several statistics normalization algorithms, including cepstral mean and variance normalization (CMVN), CMVN plus ARMA filtering (MVA), cepstral gain normalization (CGN) and histogram equalization (HEQ). The DFT-based thresholding methods perform better than the DCT-based ones; however, with the DCT-based methods, compensating only the low-frequency portion performs on par with compensating the entire frequency band. Both the DCT- and DFT-based compensation methods are therefore effective in enhancing the noise robustness of speech features.
author2 Gin-Der Wu
author_facet Gin-Der Wu
Yen-Chih Cheng
程彥誌
author Yen-Chih Cheng
程彥誌
title A Study of Threshold Denoising on Modulation Spectrum for Robust Speech Recognition
publishDate 2014
url http://ndltd.ncl.edu.tw/handle/54555410084305100242
_version_ 1718086536995536896
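
The three steps named in the description (transform the temporal feature sequence, threshold its spectrum, transform back) can be illustrated with a minimal sketch of the DFT-based variant. The record does not give the thesis's thresholding function or its threshold-selection criteria, so everything specific here is an assumption made for illustration: the function name `threshold_modulation_spectrum`, the use of soft thresholding on the DFT magnitude, and the `ratio` parameter that sets the threshold as a fraction of each feature dimension's largest magnitude.

```python
import numpy as np


def threshold_modulation_spectrum(features, ratio=0.1):
    """Soft-threshold the DFT magnitude of each feature trajectory.

    features: array of shape (num_frames, num_dims), one row per frame.
    ratio: hypothetical knob; the threshold is ratio times the largest
           spectral magnitude of each feature dimension (an assumption,
           not the criterion used in the thesis).
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]

    # Step 1: temporal sequence -> modulation spectrum (DFT along time).
    spectrum = np.fft.rfft(features, axis=0)
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)

    # Step 2: shrink the magnitudes; components below the threshold become
    # zero (soft thresholding is assumed here).
    threshold = ratio * magnitude.max(axis=0, keepdims=True)
    shrunk = np.maximum(magnitude - threshold, 0.0)

    # Step 3: back to the temporal domain, reusing the original phase.
    return np.fft.irfft(shrunk * np.exp(1j * phase), n=num_frames, axis=0)


# Usage: 50 frames of 13-dimensional features (random stand-in data).
rng = np.random.default_rng(0)
demo = rng.standard_normal((50, 13))
enhanced = threshold_modulation_spectrum(demo, ratio=0.2)
```

A DCT-based variant would follow the same pattern with a DCT and its inverse in place of `rfft` and `irfft`, thresholding the real-valued coefficients directly. No noise estimate appears anywhere in the sketch, which matches the unsupervised property the description claims.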