POS-based Word Segmentation for Improving Mandarin Chinese TTS

Master's === National Tsing Hua University === Institute of Information Systems and Applications === 98 === This thesis proposes a POS-based (part of speech) word segmentation method for improving the speech quality produced by a Mandarin Chinese Text-To-Speech (TTS) system. POS information is adopted in word segmentation due to the following three reasons. First, c...

Bibliographic Details
Main Authors: Tang, Jo-Hua, 唐若華
Other Authors: Jang, Jyh-Shing Roger
Format: Others
Language: zh-TW
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/04252015308603040064
id ndltd-TW-098NTHU5394035
record_format oai_dc
spelling ndltd-TW-098NTHU5394035 2016-04-20T04:17:28Z http://ndltd.ncl.edu.tw/handle/04252015308603040064 POS-based Word Segmentation for Improving Mandarin Chinese TTS 基於詞性之斷詞方法以改善華語語音合成系統 Tang, Jo-Hua 唐若華 Master's National Tsing Hua University Institute of Information Systems and Applications 98 Jang, Jyh-Shing Roger 張智星 2010 thesis 40 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Tsing Hua University === Institute of Information Systems and Applications === 98 === This thesis proposes a part-of-speech (POS) based word segmentation method for improving the speech quality of a Mandarin Chinese Text-To-Speech (TTS) system. POS information is adopted in word segmentation for three reasons. First, the collocation of POS tags usually follows certain syntactic rules. Second, each Mandarin character belongs to only a limited set of POS categories. These two properties help address the unseen-word problem in word segmentation. Third, the pronunciation of a polyphonic character usually depends on the character's POS. In this thesis, POS information is incorporated into specialized hidden Markov models (specialized HMMs). In this approach, POS is used to extend the state symbols, while the observation symbols represent Mandarin characters as before. Because the word segmentation described here is designed for a Mandarin Chinese TTS system, words are segmented differently from the standards used in information processing. Hence, according to some observed POS rules, certain words are combined into a single word before training. Experimental results show that adding POS information effectively improves segmentation accuracy. Another frequently seen problem is segmentation ambiguity. To address it, we combine POS-based specialized HMMs with maximum matching HMMs (M-HMM). The combination, called selective specialized HMMs, retains the strengths of both methods and compensates for their weaknesses on the unseen-word and segmentation-ambiguity problems. Experimental results show that the selective specialized HMMs further improve segmentation accuracy over the POS-based specialized HMMs alone.
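The idea of extending HMM state symbols with POS while keeping characters as observations can be illustrated with a minimal sketch. It is not the thesis's implementation: the state inventory (a B/I word-boundary tag joined with a toy POS label), the two POS labels N and V, the function names viterbi and to_words, and every probability below are assumptions made up for illustration. A real system would estimate the tables from a POS-tagged, segmented corpus.

```python
import math

# Hypothetical state set: boundary tag (B = word-initial, I = word-internal)
# extended with a toy POS label (N = noun-like, V = verb).
STATES = ["B-N", "I-N", "B-V", "I-V"]

# Illustrative, hand-set probabilities (not taken from the thesis).
START_P = {"B-N": 0.6, "B-V": 0.4}
TRANS_P = {
    ("B-N", "B-V"): 0.5, ("B-N", "I-N"): 0.4, ("B-N", "B-N"): 0.1,
    ("I-N", "B-V"): 0.6, ("I-N", "B-N"): 0.2, ("I-N", "I-N"): 0.2,
    ("B-V", "B-N"): 0.7, ("B-V", "I-V"): 0.2, ("B-V", "B-V"): 0.1,
    ("I-V", "B-N"): 0.7, ("I-V", "B-V"): 0.2, ("I-V", "I-V"): 0.1,
}
EMIT_P = {
    ("B-N", "我"): 0.5, ("B-N", "台"): 0.5,
    ("I-N", "灣"): 0.9,
    ("B-V", "愛"): 0.9,
}


def viterbi(chars, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable state sequence for a list of characters."""
    def lp(table, key):
        # Unseen events get a small floor probability instead of zero.
        return math.log(table.get(key, floor))

    # best[t][s] = (log-probability of the best path ending in s at t, previous state)
    best = [{s: (lp(start_p, s) + lp(emit_p, (s, chars[0])), None) for s in STATES}]
    for ch in chars[1:]:
        prev = best[-1]
        col = {}
        for s in STATES:
            score, arg = max((prev[p][0] + lp(trans_p, (p, s)), p) for p in STATES)
            col[s] = (score + lp(emit_p, (s, ch)), arg)
        best.append(col)

    # Backtrace from the best final state.
    state = max(STATES, key=lambda s: best[-1][s][0])
    path = [state]
    for col in reversed(best[1:]):
        state = col[state][1]
        path.append(state)
    return path[::-1]


def to_words(chars, path):
    """Group characters into (word, POS) pairs using the B/I part of each state."""
    words = []
    for ch, st in zip(chars, path):
        if st.startswith("B") or not words:
            words.append([ch, st.split("-")[1]])
        else:
            words[-1][0] += ch
    return [(w, pos) for w, pos in words]


chars = list("我愛台灣")
path = viterbi(chars, START_P, TRANS_P, EMIT_P)
print(to_words(chars, path))
```

Because each state carries a POS label, the transition table can encode POS collocation preferences (for example, a verb followed by a noun), which is one way the decoder can still place a boundary when a word was never seen in training. The maximum matching component and the selection rule that the abstract calls selective specialized HMMs are not sketched here, since the abstract does not describe how they are combined.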
author2 Jang, Jyh-Shing Roger
author_facet Jang, Jyh-Shing Roger
Tang, Jo-Hua
唐若華
author Tang, Jo-Hua
唐若華
author_sort Tang, Jo-Hua
title POS-based Word Segmentation for Improving Mandarin Chinese TTS
publishDate 2010
url http://ndltd.ncl.edu.tw/handle/04252015308603040064