Speaker Clustering Using Speaker-Dependent Phone Cluster Models and MSD-HMM


Bibliographic Details
Main Author: Han-Ping Shen (沈涵平)
Other Authors: Chung-Hsien Wu
Format: Others
Language: zh-TW
Published: 2007
Online Access: http://ndltd.ncl.edu.tw/handle/56074676144881769313
id ndltd-TW-095NCKU5392058
record_format oai_dc
Chinese Title: 應用多空間機率模型及語者相關音素群組模型於語者聚類之研究 (A Study of Speaker Clustering Applying Multi-Space Probability Distribution Models and Speaker-Dependent Phone Cluster Models)
Other Authors (Chinese name): 吳宗憲
Degree: Master's (碩士), Department of Computer Science and Information Engineering, Master's and Doctoral Program (資訊工程學系碩博士班), National Cheng Kung University (國立成功大學), academic year 95 (ROC calendar, 2006-2007)
Document Type: Thesis (學位論文), 51
Record Timestamp: 2015-10-13T13:59:57Z
collection NDLTD
description Master's thesis === National Cheng Kung University === Department of Computer Science and Information Engineering (Master's and Doctoral Program) === academic year 95 (ROC calendar) === The sharp increase in recent years in the volume of spoken documents, such as broadcast news and meeting recordings, has made the retrieval and management of spoken documents increasingly important. Audio clustering groups the fragments of an input audio stream that share a common property, such as the same speaker or the same foreground or background audio type. Speaker clustering can improve the performance of speech recognition and speaker identification. This thesis presents an approach to speaker clustering. In the training phase, we build a phone cluster model to extract phonetic features, namely phone-confusion information, from different speakers, and we use speaker-dependent MSD-HMMs (hidden Markov models based on multi-space probability distributions) to model speaker prosody. In the testing phase, the audio stream is first segmented with a method based on the minimum description length (MDL) principle. The segmented speech fragments are then grouped by speaker according to their acoustic features, and a speech recognition system with unsupervised adaptation is applied. Finally, bottom-up agglomerative clustering is performed using acoustic, phonetic, and prosodic features. The proposed method is evaluated on the Mandarin Chinese Broadcast News Corpus (MATBN), which serves as the spontaneous-speech corpus. Experimental results show that the phone cluster model is useful for modeling the pronunciation confusion between different speakers, that MSD is useful for modeling MFCCs and pitch simultaneously, and that combining these two kinds of information can improve the performance of a speaker clustering system.
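The abstract names a bottom-up agglomerative clustering step and an MDL-based segmentation step but does not give their exact criteria. As a minimal sketch, assume the common ΔBIC merge criterion, which is closely related to MDL (both charge each free parameter (1/2) log n), computed on diagonal-covariance Gaussians over acoustic frames only. The function names `diag_logdet`, `delta_bic`, and `agglomerative_cluster` and the penalty weight `lam` are hypothetical; the sketch ignores the thesis's phonetic (phone cluster) and prosodic (MSD-HMM) features and is not the author's implementation.

```python
import math
from typing import List, Sequence

# One feature vector per frame, e.g. MFCCs (assumption: acoustic features only).
Frame = Sequence[float]


def diag_logdet(frames: List[Frame], floor: float = 1e-6) -> float:
    """Log-determinant of a diagonal ML covariance: sum of log per-dimension variances."""
    n = len(frames)
    total = 0.0
    for k in range(len(frames[0])):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, floor))  # floor guards against log(0)
    return total


def delta_bic(x: List[Frame], y: List[Frame], lam: float = 1.0) -> float:
    """ΔBIC of modeling x and y with two Gaussians versus one pooled Gaussian.

    Positive: two separate models are preferred (likely different speakers).
    Negative: one shared model is preferred (likely the same speaker, so merge).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    d = len(x[0])
    gain = 0.5 * (n * diag_logdet(x + y) - n1 * diag_logdet(x) - n2 * diag_logdet(y))
    # A diagonal Gaussian has d means + d variances = 2d free parameters.
    penalty = 0.5 * lam * (2 * d) * math.log(n)
    return gain - penalty


def agglomerative_cluster(segments: List[List[Frame]], lam: float = 1.0) -> List[List[int]]:
    """Bottom-up clustering: repeatedly merge the pair with the lowest ΔBIC while it is negative."""
    clusters = [[i] for i in range(len(segments))]  # segment indices in each cluster
    data = [list(seg) for seg in segments]           # pooled frames of each cluster
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(data[i], data[j], lam)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:  # stopping rule: no remaining pair looks like one speaker
            break
        clusters[i].extend(clusters[j])
        data[i].extend(data[j])
        del clusters[j], data[j]  # j > i, so index i stays valid
    return clusters
```

For example, with two synthetic 2-D "speakers" whose frames are `[sin(t), cos(t)]` and `[10 + sin(t), 10 + cos(t)]` for `t` in `range(40)`, calling `agglomerative_cluster([seg_a, seg_b, seg_a, seg_b])` returns `[[0, 2], [1, 3]]`. Using a full covariance, adding features, or tuning `lam` are the natural next steps; the abstract does not say which choices the thesis made.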