Model-based Non-Intrusive Objective Speech Quality Measurement Using Perceptual Parameters

Bibliographic Details
Main Authors: Shang-Ju Yu, 余尚儒
Other Authors: Tai-Shih Chi
Format: Others
Language: zh-TW
Published: 2008
Online Access: http://ndltd.ncl.edu.tw/handle/70275781211050224354
Description
Summary: Master's thesis === National Chiao Tung University === Department of Communications Engineering === 96 === Assessing speech quality is an important issue in modern communication systems. Early subjective speech quality measurements required so much labor and money that a need for objective measurement emerged. In addition, the original speech signal is not always available when speech quality is measured in practice. Many non-intrusive methods, which judge speech quality without the original signal, have recently been developed to address this constraint. Such methods need little human effort and can be used efficiently for real-time quality testing. The main theme of this work is to extract perceptual parameters from an auditory model, which mimics the signal processing principles of the human auditory pathway, and to build an objective speech quality measurement that needs no reference signal. First, we propose a voice activity detector (VAD) algorithm that uses the perceptual parameters from the auditory model. This VAD algorithm classifies speech signals into three basic categories: voiced, unvoiced, and inactive. Next, we extract auditory cepstral coefficients (ACC) as the non-intrusive quality parameters. A Gaussian mixture model (GMM) is used to build a statistical template of clean speech that stands in for the absent reference signal. When measuring the quality of speech from different channels and codecs, the VAD is first used to separate the distorted speech into the three categories. Then, ACC parameters are extracted and compared with the statistical templates of clean speech. The log-probability density function (log-pdf) is used to represent the distance between the clean and degraded speech signals. Finally, a regression function maps the overall distances from the three categories to subjective quality scores. The correlation between our objective measures and the subjective measures is examined to validate our approach.
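
The scoring stage described in the summary (per-category GMM templates, log-pdf as a distance, and a regression mapping to a quality score) might be sketched roughly as follows. This is an illustrative sketch only, not the thesis implementation: the diagonal-covariance GMM form, the averaging over frames, the linear mapping, all function names, and every number below are assumptions, and the feature frames are toy stand-ins for ACC vectors.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_logpdf(x, gmm):
    """Log density of a GMM at x, using log-sum-exp for numerical stability."""
    weights, means, variances = gmm
    terms = [
        math.log(w) + diag_gaussian_logpdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def mean_logpdf(frames, gmm):
    """Average per-frame log-pdf of feature frames under one template."""
    return sum(gmm_logpdf(f, gmm) for f in frames) / len(frames)

def predict_score(distances, coeffs, intercept):
    """Hypothetical linear map from per-category distances to a score."""
    return intercept + sum(c * d for c, d in zip(coeffs, distances))

# Toy clean-speech templates, one GMM per VAD category (made-up parameters).
# Each template is (weights, means, variances).
templates = {
    "voiced": ([0.6, 0.4], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [0.5, 0.5]]),
    "unvoiced": ([1.0], [[-1.0, 0.5]], [[1.0, 1.0]]),
    "inactive": ([1.0], [[0.0, -2.0]], [[0.3, 0.3]]),
}

# Toy feature frames of a degraded signal, assumed already grouped by a VAD.
degraded = {
    "voiced": [[0.2, -0.1], [1.5, 0.8]],
    "unvoiced": [[-0.5, 1.0]],
    "inactive": [[0.4, -1.5], [0.1, -2.2]],
}

categories = ("voiced", "unvoiced", "inactive")
distances = [mean_logpdf(degraded[c], templates[c]) for c in categories]

# Placeholder coefficients; in practice they would be fitted to subjective scores.
score = predict_score(distances, coeffs=[0.3, 0.2, 0.1], intercept=4.5)
print(distances, score)
```

A higher average log-pdf means the degraded frames look more like clean speech under the template, so in this sketch the log-pdf behaves as a similarity rather than a metric distance.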