A Study of Threshold Denoising on Modulation Spectrum for Robust Speech Recognition
Main Authors: | Yen-Chih Cheng, 程彥誌 |
---|---|
Other Authors: | Gin-Der Wu, Jeih-Weih Hung, 吳俊德, 洪志偉 |
Format: | Others |
Language: | zh-TW |
Published: | 2014 |
Online Access: | http://ndltd.ncl.edu.tw/handle/54555410084305100242 |
id |
ndltd-TW-102NCNU0442075 |
record_format |
oai_dc |
spelling |
ndltd-TW-102NCNU0442075; 2015-10-13T23:38:01Z; http://ndltd.ncl.edu.tw/handle/54555410084305100242; A Study of Threshold Denoising on Modulation Spectrum for Robust Speech Recognition; 門檻值去噪法於調變頻譜之強健性語音辨識研究; Yen-Chih Cheng 程彥誌; Master's thesis; National Chi Nan University; Department of Electrical Engineering; academic year 102; Gin-Der Wu, Jeih-Weih Hung, 吳俊德, 洪志偉; 2014; thesis; 75; zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's thesis === National Chi Nan University === Department of Electrical Engineering === academic year 102 === This paper presents a novel noise-robustness algorithm that enhances speech features for
noisy speech recognition. In the presented algorithm, the temporal speech feature sequence
is first converted to its spectrum via the discrete cosine transform (DCT) or the discrete Fourier
transform (DFT); the DCT- or DFT-based spectrum is then compensated by a
thresholding function that further shrinks its smaller-magnitude components. Finally, the updated
spectrum is converted back to the temporal domain to obtain the new feature sequence. The
method has two advantages. First, the overall compensation process is
unsupervised: no information about the noise in the speech signals is required. Second,
the threshold can be chosen flexibly according to various optimization criteria.
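The pipeline described above (transform each feature trajectory, threshold its spectrum, transform back) can be sketched in Python with NumPy for the DFT case. This is a minimal sketch under stated assumptions, not the thesis's implementation: it uses soft thresholding (each coefficient's magnitude shrinks by `tau`, its phase is kept) as one common choice of "thresholding function", and the helper names and the rule `tau = ratio * max |X(k)|` in `denoise_features` are hypothetical, since the abstract names neither the exact function nor the optimization criterion.

```python
import numpy as np


def soft_threshold(z: np.ndarray, tau: float) -> np.ndarray:
    """Shrink each coefficient's magnitude by tau, keeping its phase.

    Coefficients with |z| <= tau become zero. Soft thresholding is an
    ASSUMED choice; the abstract only says "a thresholding function".
    """
    mag = np.abs(z)
    # Gain is (|z| - tau) / |z| above the threshold and 0 otherwise;
    # np.maximum guards the division where |z| == 0 (gain is 0 there anyway).
    gain = np.where(mag > tau, (mag - tau) / np.maximum(mag, 1e-12), 0.0)
    return z * gain


def dft_threshold_denoise(traj: np.ndarray, tau: float) -> np.ndarray:
    """Denoise one feature trajectory (one feature dimension over frames)."""
    spec = np.fft.rfft(traj)                # modulation spectrum (one-sided DFT)
    spec = soft_threshold(spec, tau)        # shrink the small components
    return np.fft.irfft(spec, n=len(traj))  # back to the temporal domain


def denoise_features(feats: np.ndarray, ratio: float = 0.1) -> np.ndarray:
    """Apply DFT thresholding to every column of a (frames x dims) matrix.

    HYPOTHETICAL threshold rule: tau = ratio * largest spectral magnitude of
    that trajectory. The thesis selects the threshold with optimization
    criteria that the abstract does not specify.
    """
    out = np.empty(feats.shape, dtype=float)
    for d in range(feats.shape[1]):
        traj = feats[:, d]
        tau = ratio * np.abs(np.fft.rfft(traj)).max()
        out[:, d] = dft_threshold_denoise(traj, tau)
    return out
```

Because soft thresholding can only reduce each coefficient's magnitude, Parseval's theorem guarantees the output trajectory never carries more energy than the input, and `tau = 0` returns the input unchanged.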
Experimental evaluation on the Aurora-2 connected-digit database and
task shows that the presented methods significantly improve recognition
accuracy for speech features preprocessed by any of the statistics normalization
algorithms tested, including cepstral mean and variance normalization (CMVN), CMVN plus
ARMA filtering (MVA), cepstral gain normalization (CGN), and histogram equalization
(HEQ). The DFT-based thresholding methods outperform the
DCT-based ones; however, we further show that with the DCT-based methods,
compensating only the low-frequency portion performs on a par with
compensating the entire frequency band. Overall, both the DCT- and
DFT-based compensation methods are quite effective in enhancing the noise robustness of
speech features.
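The finding that, with the DCT, compensating only the low-frequency part of the modulation spectrum suffices can be illustrated with a similar sketch: apply utterance-level CMVN first, then threshold only the first `n_low` DCT coefficients of each feature trajectory. The orthonormal DCT-II matrix is built explicitly so the example needs only NumPy. The function names, `n_low`, `tau`, and the use of soft thresholding are hypothetical choices for illustration, not values or definitions taken from the thesis.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C: X = C @ x and x = C.T @ X."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)  # scale the DC row so that C @ C.T == I
    return c


def cmvn(feats: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Utterance-level cepstral mean and variance normalization per dimension."""
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + eps)


def dct_lowband_threshold(feats: np.ndarray, tau: float, n_low: int) -> np.ndarray:
    """Soft-threshold only the first n_low DCT coefficients of each trajectory.

    tau, n_low and soft thresholding are HYPOTHETICAL illustration choices.
    """
    C = dct_matrix(feats.shape[0])
    spec = C @ feats  # (frames x dims) -> (DCT coefficients x dims)
    low = spec[:n_low]
    spec[:n_low] = np.sign(low) * np.maximum(np.abs(low) - tau, 0.0)
    return C.T @ spec  # inverse DCT back to the temporal domain
```

Coefficients at index `n_low` and above pass through unchanged, which can be checked by transforming the output with the same DCT matrix.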
|
author2 |
Gin-Der Wu |
author_facet |
Gin-Der Wu Yen-Chih Cheng 程彥誌 |
author |
Yen-Chih Cheng 程彥誌 |
title |
A Study of Threshold Denoising on Modulation Spectrum for Robust Speech Recognition |
publishDate |
2014 |
url |
http://ndltd.ncl.edu.tw/handle/54555410084305100242 |