Speaker Clustering Using Speaker-Dependent Phone Cluster Models and MSD-HMM

Master's thesis === National Cheng Kung University === Department of Computer Science and Information Engineering (Master's and Doctoral Program) === 95 === The sharp increase in recent years in the volume of spoken documents, such as broadcast news and meeting recordings, has made their retrieval and management increasingly important. Audio clustering is used to cluster an input...

Bibliographic Details
Main Author:	Han-Ping Shen (沈涵平)
Other Authors:	Chung-Hsien Wu
Format:	Others
Language:	zh-TW
Published:	2007
Online Access:	http://ndltd.ncl.edu.tw/handle/56074676144881769313

id	ndltd-TW-095NCKU5392058
record_format	oai_dc
spelling	ndltd-TW-095NCKU5392058 ; 2015-10-13T13:59:57Z ; http://ndltd.ncl.edu.tw/handle/56074676144881769313 ; Speaker Clustering Using Speaker-Dependent Phone Cluster Models and MSD-HMM ; 應用多空間機率模型及語者相關音素群組模型於語者聚類之研究 ; Han-Ping Shen 沈涵平 ; Master's thesis ; National Cheng Kung University ; Department of Computer Science and Information Engineering (Master's and Doctoral Program) ; 95 ; Chung-Hsien Wu 吳宗憲 ; 2007 ; thesis ; 51 ; zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's thesis === National Cheng Kung University === Department of Computer Science and Information Engineering (Master's and Doctoral Program) === 95 === The sharp increase in recent years in the volume of spoken documents, such as broadcast news and meeting recordings, has made their retrieval and management increasingly important. Audio clustering is used to cluster an input audio stream into groups of similar fragments, for example by speaker or by foreground or background audio type. Speaker clustering can improve the performance of speech recognition and speaker identification. This paper presents an approach to speaker clustering. In the training phase, we build a phone cluster model to extract phonetic features, namely the phone confusion information of different speakers, and we use speaker-dependent MSD-HMMs to model speaker prosody. In the testing phase, the audio is first segmented with an MDL-based method. The segmented speech fragments are then grouped by speaker based on acoustic features, and a speech recognition system with unsupervised adaptation is applied. Finally, bottom-up agglomerative clustering is performed based on acoustic, phonetic, and prosodic features. The proposed method is evaluated on the Mandarin Chinese Broadcast News Corpus (MATBN), which serves as the spontaneous-speech corpus. Experimental results show that the phone cluster model is useful for modeling pronunciation confusion between different speakers, that MSD is useful for modeling MFCC and pitch simultaneously, and that combining these two kinds of information can improve the performance of a speaker clustering system.
author2	Chung-Hsien Wu
author_facet	Chung-Hsien Wu Han-Ping Shen 沈涵平
author	Han-Ping Shen 沈涵平
author_sort	Han-Ping Shen
title	Speaker Clustering Using Speaker-Dependent Phone Cluster Models and MSD-HMM
publishDate	2007
url	http://ndltd.ncl.edu.tw/handle/56074676144881769313

Similar Items