Improvement of Speech Enhancement System Using Harmonic Adaptation

Bibliographic Details
Main Authors: LEI, CHUNG-LEN, 雷忠霖
Other Authors: LU, CHING-TA
Format: Others
Language: zh-TW
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/dagaqj
Description
Summary: Master's === Asia University === Department of Information Communication === 104 === A speech enhancement system consists mainly of a noise estimator and a speech enhancement algorithm. The accuracy of noise estimation is critical to the performance of the system. Most noise estimators either overestimate or underestimate the noise level. Overestimating the noise magnitude causes serious speech distortion, whereas underestimating it leaves a large amount of residual noise. Accurate estimation of the noise magnitude is therefore important for speech enhancement. In this thesis, we employ a minima-controlled recursive averaging (MCRA) algorithm with a variable segment length to update the noise magnitude.

First, the fundamental frequency is estimated to determine whether a frame is a vowel. For a vowel frame, the segment length is increased so that the noise magnitude is suitably underestimated, which reduces speech distortion in the enhanced speech. Otherwise, the segment length is rapidly decreased. This lets the noise estimate update quickly, so background noise can be removed efficiently.

An over-subtraction factor can improve noise removal in spectral-subtraction-based algorithms. If the factor is large, background noise is removed efficiently, but the enhanced speech suffers serious distortion. Conversely, if the factor is small, a large amount of residual noise remains and the enhanced speech sounds annoying to the human ear. Choosing the value of this factor is therefore critical to the quality of the enhanced speech. We use the harmonic properties of vowels to set the over-subtraction factor through a sigmoid function that maps the input SNR to the value of the factor. The transition slope of the sigmoid is small in vowel regions, which allows weak vowels to be preserved. In consonant and noise-dominant regions the transition slope is larger, so only spectral components with high SNR are preserved and background noise is removed efficiently.

Spectral subtraction algorithms are widely used in speech enhancement, but the quality of the enhanced speech is not satisfactory. This thesis proposes using the harmonic properties of vowels to define the over-subtraction and spectral reservation factors through the sigmoid function. Experimental results show that the proposed method significantly improves the performance of a spectral-subtraction method by further reducing musical residual noise and better preserving consonants and weak vowels. To improve the power-spectral-subtraction algorithm, we additionally propose subtracting the cross term between the speech and noise spectra from the power spectrum of the noisy speech, which enables background noise to be removed efficiently. The weighting factor of the cross term is adapted according to harmonic properties, so the enhanced speech sounds clearer.

Experimental results show that the proposed noise estimator improves the performance of the MCRA algorithm by estimating the noise magnitude more accurately, which in turn improves the performance of the speech enhancement system. In addition, the proposed speech enhancement methods significantly improve the performance of the power-spectral-subtraction algorithm by removing more background noise and preserving more vowel components.
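
The sigmoid mapping from SNR to over-subtraction factor described in the summary can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the sigmoid form, the parameter values (alpha_min, alpha_max, snr_mid_db, the two slopes, the spectral floor) and the use of magnitude-domain subtraction are all assumptions chosen for illustration. The vowel decision is taken as a given input here, whereas the thesis derives it from a fundamental frequency estimate.

```python
import math


def over_subtraction_factor(snr_db, is_vowel,
                            alpha_min=1.0, alpha_max=5.0, snr_mid_db=5.0,
                            vowel_slope=0.2, other_slope=1.0):
    """Map an SNR estimate (dB) to an over-subtraction factor with a sigmoid.

    Every default value is an illustrative assumption, not a thesis value.
    A small slope (vowel frames) gives a gradual transition; a large slope
    (consonant or noise-dominant frames) gives a sharp one.
    """
    slope = vowel_slope if is_vowel else other_slope
    # Clamp the exponent so math.exp cannot overflow for extreme SNRs.
    z = max(-50.0, min(50.0, slope * (snr_db - snr_mid_db)))
    # Decreasing in SNR: low SNR -> near alpha_max, high SNR -> near alpha_min.
    return alpha_min + (alpha_max - alpha_min) / (1.0 + math.exp(z))


def spectral_subtract(noisy_mag, noise_mag, snr_db, is_vowel, floor=0.05):
    """Magnitude spectral subtraction for one frame, with a spectral floor.

    noisy_mag, noise_mag and snr_db are per-bin sequences of equal length.
    The floor keeps a fraction of the noisy magnitude instead of letting
    the subtraction go negative.
    """
    enhanced = []
    for y, n, snr in zip(noisy_mag, noise_mag, snr_db):
        alpha = over_subtraction_factor(snr, is_vowel)
        enhanced.append(max(y - alpha * n, floor * y))
    return enhanced
```

With these assumed parameters, a low-SNR bin in a vowel frame receives a smaller factor than the same bin in a non-vowel frame, so less of a weak vowel is subtracted. In a non-vowel frame the factor stays near its maximum until the SNR is well above the midpoint.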