Performance Evaluation and Improvement of Speaker Recognition under Text-Independent Environments

Bibliographic Details
Main Authors: Jia-Hung Wang, 王嘉鴻
Other Authors: Jeih-Weih Hung
Format: Others
Language: zh-TW
Published: 2005
Online Access: http://ndltd.ncl.edu.tw/handle/25793951359275028407
Description
Summary: Master's thesis, National Chi Nan University, Department of Electrical Engineering, academic year 93 (ROC calendar). Abstract: Well-developed speech recognition technologies have made speech an increasingly convenient interface for people to communicate with computers. Because a speech signal carries information about the speaker who utters it, it can also serve as a feature for recognizing that person's identity. As a result, many applications, such as voice authentication and personal security, have been developed based on speaker recognition techniques. In this thesis, we focus on techniques for developing a robust speaker recognition system, and we present and discuss various approaches to constructing such a system and improving its recognition accuracy and robustness. First, we use Gaussian mixture models (GMMs) to build the speaker models for our text-independent speaker recognition systems. By means of vector quantization and Gaussian density training, the voice characteristics of each speaker are captured in the GMMs, which yield accurate speaker recognition. Second, for speaker verification, we provide three different frameworks: (1) Dependent Background Gaussian Mixture Models (DB-GMM), (2) Independent Background Gaussian Mixture Models (IB-GMM), and (3) No Background Gaussian Mixture Models (NB-GMM). Our experiments compare these three frameworks in terms of verification accuracy, speed, and range of applicability. Third, we consider the influence of environmental mismatch on speaker recognition. Two categories of robustness approaches are applied to reduce this mismatch: the first is data-independent, while the second is data-driven.
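As a rough illustration of the GMM pipeline the abstract outlines, here is a minimal pure-Python sketch: a k-means (vector-quantization) codebook initializes each speaker's diagonal-covariance GMM, EM then re-estimates weights, means and variances, identification picks the model with the highest average log-likelihood, and verification scores a claim by the log-likelihood ratio against a single background model. All function names, the 2-D toy data, the mixture sizes and the variance floor are assumptions for illustration only. The thesis's DB-GMM, IB-GMM and NB-GMM frameworks, whose details this record does not give, are not reproduced here.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(vals):
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))


def kmeans(data, k, iters=20, seed=0):
    """Vector quantization: a k-means codebook and the final cluster memberships."""
    rng = random.Random(seed)
    cents = [list(c) for c in rng.sample(data, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            j = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))
            clusters[j].append(x)
        # Move each codeword to its cluster centroid (keep it if the cluster is empty).
        cents = [[sum(col) / len(m) for col in zip(*m)] if m else cents[j]
                 for j, m in enumerate(clusters)]
    return cents, clusters


class DiagGMM:
    def __init__(self, weights, means, variances):
        self.weights, self.means, self.variances = weights, means, variances

    def component_logs(self, x):
        """log w_k + log N(x; mu_k, diag(var_k)) for every mixture component k."""
        return [math.log(w) + log_gauss(x, m, v)
                for w, m, v in zip(self.weights, self.means, self.variances)]

    def avg_log_likelihood(self, frames):
        return sum(logsumexp(self.component_logs(x)) for x in frames) / len(frames)


def train_gmm(data, k, em_iters=10, var_floor=1e-3, seed=0):
    """VQ (k-means) initialization followed by EM re-estimation."""
    dim = len(data[0])
    means, clusters = kmeans(data, k, seed=seed)
    counts = [max(len(m), 1) for m in clusters]
    weights = [n / sum(counts) for n in counts]
    variances = []
    for mu, members in zip(means, clusters):
        members = members or [mu]
        variances.append([max(sum((x[d] - mu[d]) ** 2 for x in members) / len(members),
                              var_floor) for d in range(dim)])
    gmm = DiagGMM(weights, means, variances)
    for _ in range(em_iters):
        # E-step: posterior probability of each component for each frame.
        post = []
        for x in data:
            logs = gmm.component_logs(x)
            total = logsumexp(logs)
            post.append([math.exp(lp - total) for lp in logs])
        # M-step: re-estimate weights, means, and floored diagonal variances.
        nk = [sum(p[j] for p in post) + 1e-10 for j in range(k)]
        gmm.weights = [n / sum(nk) for n in nk]
        gmm.means = [[sum(p[j] * x[d] for p, x in zip(post, data)) / nk[j]
                      for d in range(dim)] for j in range(k)]
        gmm.variances = [[max(sum(p[j] * (x[d] - gmm.means[j][d]) ** 2
                                  for p, x in zip(post, data)) / nk[j], var_floor)
                          for d in range(dim)] for j in range(k)]
    return gmm


def identify(frames, models):
    """Closed-set identification: the speaker whose GMM scores the frames highest."""
    return max(models, key=lambda name: models[name].avg_log_likelihood(frames))


def verification_score(frames, claimed, background):
    """Average log-likelihood ratio: claimed-speaker GMM vs. a background GMM."""
    return claimed.avg_log_likelihood(frames) - background.avg_log_likelihood(frames)


# Toy usage with hypothetical 2-D "features" (real systems use cepstral vectors).
data_rng = random.Random(1)


def toy_frames(center, n):
    return [[data_rng.gauss(c, 1.0) for c in center] for _ in range(n)]


train = {"A": toy_frames((0.0, 0.0), 200), "B": toy_frames((5.0, 5.0), 200)}
models = {name: train_gmm(frames, k=2) for name, frames in train.items()}
background = train_gmm(train["A"] + train["B"], k=4)
test_a, test_b = toy_frames((0.0, 0.0), 50), toy_frames((5.0, 5.0), 50)
genuine = verification_score(test_a, models["A"], background)
impostor = verification_score(test_b, models["A"], background)
print(identify(test_a, models), identify(test_b, models), genuine > impostor)
```

A verification decision would compare the score with a threshold tuned on development data. The threshold and the choice of background model are exactly where frameworks such as those compared in the thesis can differ.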
The data-independent approaches include (1) Vocal Tract Length Normalization (VTLN), (2) Relative Autocorrelation Sequence (RAS), (3) Modified Group Delay Function (MODGDF), (4) Parallel Model Combination (PMC), (5) Cepstral Mean Subtraction (CMS), and (6) Cepstral Normalization (CN). The data-driven approaches include (1) the Linear Discriminant Analysis (LDA) temporal filter, (2) the Principal Component Analysis (PCA) temporal filter, (3) the Minimum Classification Error (MCE) temporal filter, (4) the LDA spatial transformation, and (5) the PCA spatial transformation. All of these approaches are evaluated on our speaker recognition system under noisy conditions. Furthermore, we combine pairs of these approaches to test whether their effects are additive. Experimental results show that almost all of these robustness approaches reduce the effect of additive noise and thus improve speaker recognition accuracy, and that they can be combined to further enhance the robustness of the system.
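As a hedged illustration of pairing one data-independent method with one data-driven method, the sketch chains cepstral mean subtraction (CMS) with a PCA-derived temporal filter. CMS subtracts each coefficient's utterance mean, which cancels any constant per-coefficient offset; a fixed convolutional channel appears approximately as such an offset in the cepstral domain. The PCA filter follows one common formulation, assumed here rather than taken from the thesis: its FIR coefficients are the leading eigenvector, found by power iteration, of the covariance of fixed-length windows of feature trajectories. The AR(1) toy trajectories, the filter length, the offset values and every function name are illustrative assumptions. A real system would estimate the filter on training data and apply it to test data.

```python
import math
import random


def cepstral_mean_subtraction(frames):
    """CMS: subtract each coefficient's mean over the utterance (frames: T x D)."""
    n = len(frames)
    means = [sum(col) / n for col in zip(*frames)]
    return [[c - m for c, m in zip(frame, means)] for frame in frames]


def pca_temporal_filter(trajectories, length, iters=200, seed=0):
    """Leading eigenvector of the covariance of all length-`length` windows
    cut from the given feature trajectories, found by power iteration."""
    windows = [tr[t:t + length] for tr in trajectories
               for t in range(len(tr) - length + 1)]
    n = len(windows)
    mean = [sum(col) / n for col in zip(*windows)]
    cov = [[sum((w[i] - mean[i]) * (w[j] - mean[j]) for w in windows) / n
            for j in range(length)] for i in range(length)]
    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in range(length)]
    for _ in range(iters):
        vec = [sum(cov[i][j] * vec[j] for j in range(length)) for i in range(length)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    # Eigenvectors are defined only up to sign; fix it so the coefficients sum to >= 0.
    return vec if sum(vec) >= 0 else [-v for v in vec]


def apply_window_filter(traj, coeffs):
    """Project every length-L window of a trajectory onto the filter coefficients
    ('valid' mode: the output is L - 1 samples shorter than the input)."""
    L = len(coeffs)
    return [sum(h * x for h, x in zip(coeffs, traj[t:t + L]))
            for t in range(len(traj) - L + 1)]


data_rng = random.Random(0)


def ar1(n, phi=0.9):
    """Hypothetical stand-in for one cepstral trajectory: an AR(1) sequence."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + data_rng.gauss(0.0, 1.0)
        out.append(x)
    return out


D, T, L = 3, 300, 5
frames = [list(f) for f in zip(*[ar1(T) for _ in range(D)])]  # T frames, D coefficients
normalized = cepstral_mean_subtraction(frames)

# A fixed channel adds the same (hypothetical) offset to every frame; CMS cancels it.
offset = [0.7, -0.2, 0.05]
shifted = [[c + o for c, o in zip(f, offset)] for f in frames]
offset_removed = all(abs(a - b) < 1e-9
                     for fa, fb in zip(normalized, cepstral_mean_subtraction(shifted))
                     for a, b in zip(fa, fb))

trajectories = [list(col) for col in zip(*normalized)]  # D trajectories of length T
h = pca_temporal_filter(trajectories, L)
filtered = [apply_window_filter(tr, h) for tr in trajectories]
print(offset_removed, [round(v, 3) for v in h])
```

CMS operates across coefficients within each utterance, while the temporal filter operates along each coefficient's time trajectory, so the two can be applied in sequence. Whether their gains actually add is the empirical question the thesis examines.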