Speaker Clustering Using Speaker-Dependent Phone Cluster Models and MSD-HMM


Bibliographic Details
Main Authors: Han-Ping Shen, 沈涵平
Other Authors: Chung-Hsien Wu
Format: Others
Language: zh-TW
Published: 2007
Online Access: http://ndltd.ncl.edu.tw/handle/56074676144881769313
Description
Summary: Master's === National Cheng Kung University === Department of Computer Science and Information Engineering (Master's and Doctoral Program) === 95 === The rapid growth in recent years of spoken documents, such as broadcast news and meeting recordings, has made their retrieval and management increasingly important. Audio clustering groups similar fragments of an input audio stream, for example by speaker or by foreground or background audio type. Speaker clustering can improve the performance of speech recognition and speaker identification. This paper presents an approach to speaker clustering. In the training phase, we build a phone cluster model to extract phonetic features (phone confusion information from different speakers), and we use speaker-dependent multi-space probability distribution HMMs (MSD-HMMs) to model speaker prosody. In the testing phase, the audio is first segmented using a method based on minimum description length (MDL). The segmented speech fragments are then grouped by speaker based on acoustic features, and a speech recognition system with unsupervised adaptation is applied. Finally, bottom-up agglomerative clustering is performed based on acoustic, phonetic, and prosodic features. The proposed method is evaluated on the Mandarin Chinese Broadcast News Corpus (MATBN), which serves as the spontaneous speech corpus. Experimental results show that the phone cluster model is useful for modeling pronunciation confusion between different speakers, that MSD is useful for modeling MFCC and pitch simultaneously, and that combining these two kinds of information improves the performance of a speaker clustering system.
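
The final step described in the summary, bottom-up agglomerative clustering over acoustic, phonetic, and prosodic features, might be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the record does not say how the three feature streams are compared or combined, so the fixed-length vector representation, the Euclidean per-stream distance, the `weights`, the average linkage, and the stopping `threshold` are all hypothetical choices made for the example.

```python
from itertools import combinations


def combined_distance(a, b, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of per-stream Euclidean distances between two segments.

    Each segment is a tuple of three feature vectors:
    (acoustic, phonetic, prosodic). The weighting scheme is an assumption.
    """
    total = 0.0
    for w, va, vb in zip(weights, a, b):
        total += w * sum((x - y) ** 2 for x, y in zip(va, vb)) ** 0.5
    return total


def cluster_distance(c1, c2, segments, weights):
    """Average-linkage distance between two clusters of segment indices."""
    dists = [
        combined_distance(segments[i], segments[j], weights)
        for i in c1
        for j in c2
    ]
    return sum(dists) / len(dists)


def agglomerative_clustering(segments, threshold, weights=(1.0, 1.0, 1.0)):
    """Merge the closest pair of clusters until none is closer than threshold."""
    # Start with every segment in its own cluster (bottom-up).
    clusters = [[i] for i in range(len(segments))]
    while len(clusters) > 1:
        # Find the closest pair of clusters; i < j by construction.
        (i, j), best = min(
            (
                ((i, j), cluster_distance(clusters[i], clusters[j], segments, weights))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if best > threshold:
            break  # Remaining clusters are treated as different speakers.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy data: four segments, each with (acoustic, phonetic, prosodic) vectors.
toy_segments = [
    ([0.0, 0.0], [0.0], [0.0]),
    ([0.1, 0.0], [0.0], [0.1]),
    ([5.0, 5.0], [1.0], [3.0]),
    ([5.1, 5.0], [1.0], [3.0]),
]
toy_clusters = agglomerative_clustering(toy_segments, threshold=1.0)
```

In this toy run the first two and last two segments end up in separate clusters. In a real system the per-stream distances would more plausibly come from model likelihoods than from Euclidean distance between raw vectors.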