POS-based Word Segmentation for Improving Mandarin Chinese TTS


Bibliographic Details
Main Authors: Tang, Jo-Hua, 唐若華
Other Authors: Jang, Jyh-Shing Roger
Format: Others
Language: zh-TW
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/04252015308603040064
Description
Summary: Master's thesis, National Tsing Hua University, Institute of Information Systems and Applications, ROC academic year 98. This thesis proposes a POS-based (part-of-speech) word segmentation method for improving the speech quality of a Mandarin Chinese Text-To-Speech (TTS) system. POS information is used in word segmentation for three reasons. First, the collocation of POS tags usually follows certain syntactic rules. Second, each Mandarin character can take only a limited set of POS tags. Together, these two properties can help solve the unseen-word problem in word segmentation. Third, the pronunciation of a polyphonic character usually depends on its POS.

In this thesis, POS information is incorporated into specialized hidden Markov models (specialized HMMs): POS tags extend the state symbols, while the observation symbols still represent Mandarin characters. Because the segmentation is designed for a Mandarin Chinese TTS system, its word boundaries differ from the segmentation standards used in information processing. Hence, certain words are merged into a single word before training, according to some observed POS rules. Experimental results show that adding POS information effectively improves segmentation accuracy.

Another frequently seen problem is segmentation ambiguity. To address it, we combine POS-based specialized HMMs with maximum-matching HMMs (M-HMMs) into what we call selective specialized HMMs, which keep the strengths and offset the weaknesses of the two methods with respect to the unseen-word and segmentation-ambiguity problems. Experimental results show that selective specialized HMMs further improve segmentation accuracy over POS-based specialized HMMs.
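The specialized-HMM idea in the summary, where POS tags extend the state symbols while characters remain the observations, can be sketched as Viterbi decoding over (position, POS) states. This is a minimal illustration under stated assumptions, not the thesis's implementation: the B/M/E/S position labels, the tag names PN, V and N, the function names `viterbi` and `to_words`, and every probability are hypothetical and hand-set, and unseen events receive a crude probability floor instead of real smoothing.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical state design: a word-position label (B = begin, M = middle,
# E = end, S = single-character word) paired with a POS tag, e.g. ("B", "V").
State = Tuple[str, str]


def viterbi(chars: str,
            states: List[State],
            start: Dict[State, float],
            trans: Dict[Tuple[State, State], float],
            emit: Dict[Tuple[State, str], float],
            floor: float = 1e-12) -> List[State]:
    """Most probable (position, POS) state sequence for `chars`, in log space."""
    if not chars:
        return []

    def lp(p: float) -> float:
        # Crude floor so unseen events get a tiny probability instead of log(0);
        # a real system would use proper smoothing.
        return math.log(max(p, floor))

    # best[t][s] = (best log-prob of a path ending in state s at t, previous state)
    best: List[Dict[State, Tuple[float, Optional[State]]]] = [
        {s: (lp(start.get(s, 0.0)) + lp(emit.get((s, chars[0]), 0.0)), None)
         for s in states}
    ]
    for t in range(1, len(chars)):
        column = {}
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r][0] + lp(trans.get((r, s), 0.0))) for r in states),
                key=lambda pair: pair[1],
            )
            column[s] = (score + lp(emit.get((s, chars[t]), 0.0)), prev)
        best.append(column)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for t in range(len(chars) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return path[::-1]


def to_words(chars: str, path: List[State]) -> List[Tuple[str, str]]:
    """Group characters into (word, POS) pairs using the position labels."""
    words, buf = [], ""
    for ch, (position, pos) in zip(chars, path):
        buf += ch
        if position in ("E", "S"):  # a word ends at this character
            words.append((buf, pos))
            buf = ""
    if buf:  # path ended mid-word; keep the remainder as one word
        words.append((buf, path[-1][1]))
    return words


# Toy, hand-set probabilities for illustration only (not trained values).
STATES = [("S", "PN"), ("B", "V"), ("E", "V"), ("B", "N"), ("E", "N"), ("S", "N")]
START = {("S", "PN"): 0.6, ("B", "V"): 0.2, ("B", "N"): 0.2}
TRANS = {
    (("S", "PN"), ("B", "V")): 0.8, (("S", "PN"), ("B", "N")): 0.2,
    (("B", "V"), ("E", "V")): 1.0,
    (("E", "V"), ("B", "N")): 0.6, (("E", "V"), ("S", "N")): 0.4,
    (("B", "N"), ("E", "N")): 1.0,
    (("S", "N"), ("S", "N")): 0.3, (("S", "N"), ("B", "V")): 0.7,
}
EMIT = {
    (("S", "PN"), "他"): 1.0,
    (("B", "V"), "喜"): 1.0, (("E", "V"), "歡"): 1.0,
    (("B", "N"), "音"): 1.0, (("E", "N"), "樂"): 1.0,
    (("S", "N"), "音"): 0.5, (("S", "N"), "樂"): 0.5,
}

sentence = "他喜歡音樂"
path = viterbi(sentence, STATES, START, TRANS, EMIT)
print(to_words(sentence, path))  # [('他', 'PN'), ('喜歡', 'V'), ('音樂', 'N')]
```

Because each state carries both a boundary label and a POS tag, one decoding pass yields the segmentation and the tags together; in a TTS front end the tags could then inform polyphone pronunciation, which is the third motivation given in the summary.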
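The segmentation-ambiguity problem mentioned in the summary is easy to reproduce with plain forward maximum matching, the classic dictionary heuristic that maximum-matching methods build on. The sketch below is only that heuristic, not the thesis's M-HMM, whose details this abstract does not give; the function name `forward_max_match`, the `max_len` limit and the four-word lexicon are hypothetical.

```python
def forward_max_match(text: str, lexicon: set, max_len: int = 4) -> list:
    """Forward maximum matching: at each position take the longest lexicon
    word (up to max_len characters), falling back to a single character."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words


# Hypothetical toy lexicon chosen to expose an overlapping ambiguity.
LEXICON = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", LEXICON))  # ['研究生', '命', '起源']
```

The intended reading is 研究 / 生命 / 起源 ("study the origin of life"), but greedy longest-match grabs 研究生 ("graduate student") first and strands 命. A statistical model that scores whole sequences can recover from this kind of greedy error, which is one reason to combine the two approaches.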