Disfluency Correction of Spontaneous Speech using Conditional Random Fields with Variable Length Features

Bibliographic Details
Main Authors: Wei-Yen Wu, 吳維彥
Other Authors: Chung-Hsien Wu
Format: Others
Language: zh-TW
Published: 2006
Online Access: http://ndltd.ncl.edu.tw/handle/50827960864536693806
Description
Summary: Master's === National Cheng Kung University === Department of Computer Science and Information Engineering (Master's and Doctoral Program) === 94 === Speech recognition technology is now approaching maturity. Edit disfluency in spontaneous speech, however, remains an important issue for practical applications. Most research on edit disfluency either focuses on a specific edit disfluency type or does not integrate multiple knowledge sources jointly. The central problem is therefore how to detect and correct the three categories of edit disfluency using multiple knowledge sources. This thesis proposes a conditional random field (CRF) model with variable-length features to detect and correct edit disfluency. The model is composed of state transition feature functions and observation feature functions. The observation feature functions comprise context-related, disfluency-related, and pattern-related features. Three variable-length units (word, chunk, and sentence) serve as the states of the state transition feature functions. Chunks are extracted with the Apriori algorithm according to word co-occurrence and term frequency. Sentences are identified according to the verb and its necessary arguments. The improved iterative scaling (IIS) algorithm is used to estimate the feature weights. The Mandarin Conversational Dialogue Corpus (MCDC) serves as the spontaneous speech corpus for evaluation. The detection error rate for edit words is 17.3%. Compared with DF-GRAM, maximum entropy, and an approach combining a language model with an alignment model, the proposed approach achieves improvements of 11.7%, 8%, and 3.9%, respectively. The experimental results show that the proposed model outperforms the other methods and efficiently detects and corrects edit disfluency in spontaneous speech.
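The chunk extraction step mentioned in the summary can be illustrated with a minimal sketch. The thesis does not give its exact procedure here, so everything below is an assumption: chunks are treated as contiguous word sequences, support is counted as raw occurrence frequency, and the function name `frequent_chunks` and the parameters `min_support` and `max_len` are hypothetical. The toy input uses English tokens in place of word-segmented Mandarin.

```python
from collections import Counter


def frequent_chunks(sentences, min_support=2, max_len=4):
    """Apriori-style, level-wise mining of frequent contiguous word sequences.

    Assumption: a "chunk" is a contiguous word sequence whose occurrence
    count is at least min_support. This is an illustration only, not the
    thesis's actual extraction procedure.
    """
    # Level 1: count single words and keep the frequent ones.
    counts = Counter((word,) for sent in sentences for word in sent)
    frequent = {gram: c for gram, c in counts.items() if c >= min_support}

    result = {}
    n = 2
    while frequent and n <= max_len:
        counts = Counter()
        for sent in sentences:
            for i in range(len(sent) - n + 1):
                gram = tuple(sent[i:i + n])
                # Apriori pruning: a sequence can only be frequent if both
                # of its (n-1)-word sub-sequences are frequent.
                if gram[:-1] in frequent and gram[1:] in frequent:
                    counts[gram] += 1
        frequent = {gram: c for gram, c in counts.items() if c >= min_support}
        result.update(frequent)
        n += 1
    return result


toy_corpus = [
    ["i", "want", "to", "go"],
    ["i", "want", "to", "stay"],
    ["you", "want", "to", "go"],
]
for chunk, count in sorted(frequent_chunks(toy_corpus).items()):
    print(" ".join(chunk), count)
```

The pruning test is what makes the search Apriori-like: longer candidates are counted only when their shorter parts already passed the support threshold, which keeps the candidate set small as the sequence length grows.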