An Application of Empirical Mode Decomposition Method to Speech Recognition in Noisy Environment

Bibliographic Details
Main Authors: Wen-Jay Chen, 陳文杰
Other Authors: Yau-Tarng Juang
Format: Others
Language: zh-TW
Published: 2006
Online Access: http://ndltd.ncl.edu.tw/handle/352u43
Description
Summary: Master's === National Central University === Graduate Institute of Electrical Engineering === 94 === In this thesis, we study Dr. Huang's Empirical Mode Decomposition (EMD) method, which uses the variation of time scales within a signal to decompose it into a combination of several Intrinsic Mode Functions (IMFs). The IMFs capture different characteristics of the signal and can express its physical properties. We apply information from the first IMF to keyword spotting and find that it improves the recognition rate in white-noise environments across uniformly distributed SNRs. We apply EMD to speech signals and perform noise reduction in the front-end processing, guided by a preliminary qualitative and quantitative analysis of the first IMF. This method improves the recognition rate in noisy conditions and yields results similar to speech enhancement. In addition, we use information from the first IMF to estimate the SNR of a speech signal and switch the system to the better acoustic model in the recognition stage. Experimental results show that this improves the recognition rate in low-SNR environments. Finally, the two methods are combined to improve the recognition rate further. The results show that we can correctly estimate the SNR condition of the test material and switch the system to the better acoustic model in the recognition stage. In this way, the switching is correct in up to 97.95% of cases, and the relative improvement reaches 56.25% and 27.56% at SNR = 0 dB and SNR = 10 dB, respectively.
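The decomposition described in the summary can be illustrated with a minimal sketch of the sifting procedure that extracts the first IMF. This is not the thesis's implementation. The function names (`local_extrema`, `envelope`, `first_imf`) and the `n_sift` parameter are hypothetical, the envelopes use piecewise-linear interpolation in place of the cubic splines of standard EMD so that the example needs only the Python standard library, and a fixed number of sifting passes stands in for a proper stopping criterion.

```python
import math


def local_extrema(x):
    """Return (maxima, minima): indices of strict interior local extrema of x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(x, idx):
    """Piecewise-linear envelope through the points (i, x[i]) for i in idx.

    Assumption: linear interpolation replaces the cubic spline of standard EMD.
    The envelope is held constant before the first and after the last extremum.
    """
    env = [0.0] * len(x)
    for i in range(0, idx[0]):
        env[i] = x[idx[0]]
    for i in range(idx[-1], len(x)):
        env[i] = x[idx[-1]]
    for a, b in zip(idx, idx[1:]):
        for i in range(a, b):
            t = (i - a) / (b - a)
            env[i] = (1 - t) * x[a] + t * x[b]
    return env


def first_imf(x, n_sift=10):
    """Approximate the first IMF of x by repeated sifting.

    Each pass subtracts the mean of the upper and lower envelopes.
    Assumption: a fixed pass count (n_sift) replaces a convergence test.
    """
    h = list(x)
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema to build envelopes
        upper = envelope(h, maxima)
        lower = envelope(h, minima)
        h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
    return h


# Illustrative input: a slow tone plus a weaker fast tone (synthetic, not speech).
n = 1000
slow = [math.sin(2 * math.pi * 2 * i / n) for i in range(n)]
fast = [0.5 * math.sin(2 * math.pi * 50 * i / n) for i in range(n)]
signal = [s + f for s, f in zip(slow, fast)]
imf1 = first_imf(signal)
```

On this two-tone input the first IMF should follow the fast component, which is the sense in which the first IMF carries the highest-frequency content of a signal. The summary does not state how the SNR estimate is derived from the first IMF, so that step is not sketched here.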