Summary: | Master's === National Chi Nan University === Department of Electrical Engineering === 101 === In this thesis, we present a novel approach to enhancing speech features in the modulation
spectrum for better recognition performance in noise-corrupted environments. In the presented
approach, termed modulation spectrum power-law expansion (MSPLE), the temporal stream of speech
features is first pre-processed by a statistics compensation technique, such as cepstral
mean and variance normalization (CMVN), cepstral gain normalization (CGN), or cepstral
histogram normalization (CHN). The magnitude of the modulation spectrum (the Fourier
transform of the feature stream) is then raised to a power. We find that MSPLE can
highlight the speech components and reduce the noise distortion that remains in the
statistics-compensated speech features. Experiments on the Aurora-2 digit recognition task
show that this process consistently achieves promising recognition accuracy
under a wide range of noise-corrupted environments. Applying MSPLE to CMVN-preprocessed
features yields a relative error rate reduction of about 45% over the MFCC baseline and significantly
outperforms CMVN alone. Furthermore, applying MSPLE to only the lower half of the
modulation spectrum gives results very close to those obtained by applying it to the full-band
modulation spectrum, indicating that a less complex form of MSPLE suffices to produce noise-robust
speech features.
|
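The processing chain described in the abstract (statistics compensation, Fourier transform along time, power-law expansion of the magnitude, return to the time domain) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names `cmvn` and `msple`, the exponent value, the per-utterance normalization, the reuse of the original phase, and the way the lower half of the modulation spectrum is selected are all hypothetical choices made for the example.

```python
import numpy as np


def cmvn(features):
    """Normalize each feature dimension to zero mean and unit variance.

    features: array of shape (num_frames, num_dims), one utterance.
    Per-utterance statistics are an assumption of this sketch.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / np.maximum(std, 1e-8)


def msple(features, exponent=2.0, low_half_only=False):
    """Raise the modulation spectrum magnitude to a power.

    The exponent default is illustrative only. Keeping the original
    phase and inverting back to a time stream are assumptions.
    """
    num_frames = features.shape[0]
    # Modulation spectrum: Fourier transform of each feature dimension over time.
    spectrum = np.fft.rfft(features, axis=0)
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)

    # Optionally expand only the lower half of the modulation frequency bins
    # (a hypothetical way to realize the sub-band variant).
    num_bins = magnitude.shape[0]
    cutoff = num_bins // 2 if low_half_only else num_bins
    magnitude[:cutoff] **= exponent

    expanded = magnitude * np.exp(1j * phase)
    return np.fft.irfft(expanded, n=num_frames, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mfcc = rng.standard_normal((200, 13))  # stand-in for real MFCC features
    enhanced = msple(cmvn(mfcc), exponent=2.0)
    print(enhanced.shape)
```

With `exponent=1.0` the function returns its input unchanged, which gives a quick sanity check. Setting `low_half_only=True` touches only the lower modulation frequency bins, in the spirit of the reduced-complexity variant mentioned in the abstract.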