Robust Hierarchical Linear Model Comparison for Utterance-End Detection Under Noisy Environments

Master's === Yuan Ze University === Department of Communications Engineering === 99 === To detect speech-block end-points, we use the entropy of the speech signal and compare the fit of two weighted linear models for the entropy. The regression models are constructed so that their fit will differ the most near the speech-block end-points. For estimation of s...


Bibliographic Details
Main Authors: Ming-Chaing Hsu, 徐明江
Other Authors: Wei-Tyng Hong
Format: Others
Language: en_US
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/55990173986262954692
id ndltd-TW-099YZU05650043
record_format oai_dc
spelling ndltd-TW-099YZU05650043 2016-04-13T04:17:17Z http://ndltd.ncl.edu.tw/handle/55990173986262954692 Robust Hierarchical Linear Model Comparison for Utterance-End Detection Under Noisy Environments 基於階層式線性模型比較與熵參數之語音區塊端點偵測 Ming-Chaing Hsu 徐明江 Master's Yuan Ze University Department of Communications Engineering 99 To detect speech-block end-points, we use the entropy of the speech signal and compare the fit of two weighted linear models for the entropy. The regression models are constructed so that their fit will differ the most near the speech-block end-points. To estimate the speech signal entropy, we use a histogram of the speech signal sampled in a frame of fixed duration. The resulting sequence of entropies in consecutive frames is used for fitting the linear models. The models are fitted in a sliding interval of the entropies, which corresponds to several consecutive frames in a sliding time window. The interval with the greatest difference in model fit is used to estimate the location of the speech-block boundaries. Model M1 is very simple; it corresponds to a constant average entropy level for the speech signal in the entire window. Model M2 models a step-like entropy change from one constant level to another, with a gradual transition between the levels. It is a piecewise linear regression model with two horizontal lines connected by a third, transitional line. We treat M1 as a linear model only so that we can describe it as a sub-model of M2 and use statistical methodology for sub-model testing. The performance of the presented algorithm is compared with a standard EOU (end-of-utterance) detection method. The average rate of proper detection is 92.36% for our approach, an improvement of almost 13% over the EOU method. Several tables are presented to illustrate that, while obtaining similar results for clean sound, our regression model approach is better able to resist the negative effect of noise. Wei-Tyng Hong 洪維廷 2011 thesis 72 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === Yuan Ze University === Department of Communications Engineering === 99 === To detect speech-block end-points, we use the entropy of the speech signal and compare the fit of two weighted linear models for the entropy. The regression models are constructed so that their fit will differ the most near the speech-block end-points. To estimate the speech signal entropy, we use a histogram of the speech signal sampled in a frame of fixed duration. The resulting sequence of entropies in consecutive frames is used for fitting the linear models. The models are fitted in a sliding interval of the entropies, which corresponds to several consecutive frames in a sliding time window. The interval with the greatest difference in model fit is used to estimate the location of the speech-block boundaries. Model M1 is very simple; it corresponds to a constant average entropy level for the speech signal in the entire window. Model M2 models a step-like entropy change from one constant level to another, with a gradual transition between the levels. It is a piecewise linear regression model with two horizontal lines connected by a third, transitional line. We treat M1 as a linear model only so that we can describe it as a sub-model of M2 and use statistical methodology for sub-model testing. The performance of the presented algorithm is compared with a standard EOU (end-of-utterance) detection method. The average rate of proper detection is 92.36% for our approach, an improvement of almost 13% over the EOU method. Several tables are presented to illustrate that, while obtaining similar results for clean sound, our regression model approach is better able to resist the negative effect of noise.
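The procedure outlined in the abstract (per-frame histogram entropy, then a sliding-window comparison of a constant model M1 against a step-with-ramp model M2) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis implementation: the function names, the bin count, the window and ramp lengths, the uniform weights, and the use of an F-like ratio of weighted residual sums of squares as the "difference in fit" are all hypothetical choices that the record does not specify.

```python
import numpy as np

def frame_entropy(signal, frame_len, n_bins=32):
    """Histogram-based entropy (bits) of each non-overlapping frame.
    n_bins and the shared amplitude range are assumptions."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    lo, hi = signal.min(), signal.max()
    if hi == lo:                      # avoid a zero-width histogram range
        hi = lo + 1.0
    ent = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        counts, _ = np.histogram(frame, bins=n_bins, range=(lo, hi))
        p = counts[counts > 0] / frame_len
        ent[i] = -np.sum(p * np.log2(p))
    return ent

def ramp_basis(n, a, b):
    """0 before a, 1 after b, linear in between: the transitional line of M2."""
    t = np.arange(n)
    return np.clip((t - a) / (b - a), 0.0, 1.0)

def weighted_rss(X, y, w):
    """Weighted residual sum of squares of the least-squares fit y ~ X."""
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    resid = y - X @ coef
    return float(np.sum(w * resid ** 2))

def fit_difference(y, w, a, b):
    """F-like score for M1 (constant) nested inside M2 (constant + ramp)."""
    n = len(y)
    X1 = np.ones((n, 1))
    X2 = np.column_stack([np.ones(n), ramp_basis(n, a, b)])
    rss1 = weighted_rss(X1, y, w)
    rss2 = weighted_rss(X2, y, w)
    return (rss1 - rss2) / (rss2 / (n - 2) + 1e-12)

def detect_boundary(entropy, win=20, ramp=4):
    """Slide a window over the entropy sequence and return the centre frame
    of the window where M2 improves on M1 the most, plus all scores."""
    a = (win - ramp) / 2.0            # ramp centred in the window (assumption)
    b = a + ramp
    w = np.ones(win)                  # uniform weights (assumption)
    scores = np.array([
        fit_difference(entropy[s:s + win], w, a, b)
        for s in range(len(entropy) - win + 1)
    ])
    best = int(np.argmax(scores))
    return best + win // 2, scores

# Synthetic usage: an entropy sequence that steps up at frame 50.
rng = np.random.default_rng(0)
demo_entropy = np.r_[np.full(50, 1.0), np.full(50, 3.0)] + rng.normal(0, 0.05, 100)
demo_boundary, demo_scores = detect_boundary(demo_entropy)
```

Fixing the ramp position inside each window keeps M2 linear in its two coefficients, so ordinary weighted least squares suffices; a search over the ramp end-points would be a natural extension but is not shown here.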
author2 Wei-Tyng Hong
author_facet Wei-Tyng Hong
Ming-Chaing Hsu
徐明江
author Ming-Chaing Hsu
徐明江
spellingShingle Ming-Chaing Hsu
徐明江
Robust Hierarchical Linear Model Comparison for Utterance-End Detection Under Noisy Environments
author_sort Ming-Chaing Hsu
title Robust Hierarchical Linear Model Comparison for Utterance-End Detection Under Noisy Environments
title_sort robust hierarchical linear model comparison for utterance-end detection under noisy environments
publishDate 2011
url http://ndltd.ncl.edu.tw/handle/55990173986262954692