POS-based Word Segmentation for Improving Mandarin Chinese TTS

Master's === National Tsing Hua University === Institute of Information Systems and Applications === 98 === This thesis proposes a POS-based (part of speech) word segmentation method for improving the speech quality produced by a Mandarin Chinese Text-To-Speech (TTS) system. POS information is adopted in word segmentation due to the following three reasons. First, c...

Bibliographic Details
Main Authors:	Tang, Jo-Hua, 唐若華
Other Authors:	Jang, Jyh-Shing Roger
Format:	Others
Language:	zh-TW
Published:	2010
Online Access:	http://ndltd.ncl.edu.tw/handle/04252015308603040064

id	ndltd-TW-098NTHU5394035
record_format	oai_dc
spelling	ndltd-TW-098NTHU5394035 2016-04-20T04:17:28Z http://ndltd.ncl.edu.tw/handle/04252015308603040064 POS-based Word Segmentation for Improving Mandarin Chinese TTS 基於詞性之斷詞方法以改善華語語音合成系統 Tang, Jo-Hua 唐若華 Master's, National Tsing Hua University, Institute of Information Systems and Applications, 98 Jang, Jyh-Shing Roger 張智星 2010 學位論文 ; thesis 40 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Tsing Hua University === Institute of Information Systems and Applications === 98 === This thesis proposes a part-of-speech (POS) based word segmentation method for improving the speech quality produced by a Mandarin Chinese Text-To-Speech (TTS) system. POS information is adopted in word segmentation for three reasons. First, the collocation of POS's usually follows certain syntactic rules. Second, each Mandarin character can belong to only a certain set of POS's. These two properties can solve the unseen word problem in word segmentation. Third, the pronunciation of a polyphonic character usually depends on its POS. In this thesis, POS information is incorporated into specialized hidden Markov models (specialized HMMs). In this approach, POS is used to extend the state symbols, while the observation symbols represent Mandarin characters as before. Because the word segmentation described in this thesis is designed for a Mandarin Chinese TTS system, words are segmented differently from the standards used in information processing. Hence, according to some observed POS rules, certain words are combined into a single word before training. Experimental results show that adding POS information effectively improves segmentation accuracy. Another frequently seen problem is segmentation ambiguity. To address it, we combine POS-based specialized HMMs with maximum matching HMMs (M-HMMs). The combination, called selective specialized HMMs, draws on the strengths of both methods and compensates for their weaknesses with respect to the unseen word problem and the segmentation ambiguity problem. Experimental results show that the selective specialized HMMs further improve segmentation accuracy over the POS-based specialized HMMs.
author2	Jang, Jyh-Shing Roger
author_facet	Jang, Jyh-Shing Roger; Tang, Jo-Hua 唐若華
author	Tang, Jo-Hua 唐若華
spellingShingle	Tang, Jo-Hua 唐若華 POS-based Word Segmentation for Improving Mandarin Chinese TTS
author_sort	Tang, Jo-Hua
title	POS-based Word Segmentation for Improving Mandarin Chinese TTS
publishDate	2010
url	http://ndltd.ncl.edu.tw/handle/04252015308603040064
_version_	1718227680188432384
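
The abstract's idea of extending HMM state symbols with POS, while the observations remain Mandarin characters, can be illustrated with a small sketch. This is not the thesis's implementation: the B/I boundary tags, the two-tag POS set (N, V), the probability tables, the smoothing floor, and the function names are all assumptions chosen for illustration, and the numbers are hand-set rather than trained on any corpus.

```python
import math

# Hypothetical toy model (not from the thesis): each state pairs a boundary
# tag (B = first character of a word, I = later character) with a POS tag
# (N = noun-like, V = verb). Observations are single Mandarin characters.
STATES = ["B-N", "I-N", "B-V", "I-V"]

# Hand-set probabilities for illustration only; a real system would
# estimate them from a POS-tagged, segmented corpus.
START = {"B-N": 0.6, "B-V": 0.4}
TRANS = {
    "B-N": {"I-N": 0.6, "B-N": 0.1, "B-V": 0.3},
    "I-N": {"I-N": 0.1, "B-N": 0.3, "B-V": 0.6},
    "B-V": {"I-V": 0.5, "B-N": 0.4, "B-V": 0.1},
    "I-V": {"I-V": 0.1, "B-N": 0.7, "B-V": 0.2},
}
EMIT = {
    "B-N": {"我": 0.5, "中": 0.4, "文": 0.1},
    "I-N": {"文": 0.8, "中": 0.2},
    "B-V": {"學": 0.9, "中": 0.1},
    "I-V": {"學": 0.5, "習": 0.5},
}
FLOOR = 1e-6  # crude smoothing for events missing from the tables (assumption)


def logp(table, key):
    """Log-probability of key, falling back to the smoothing floor."""
    return math.log(table.get(key, FLOOR))


def viterbi(chars):
    """Return the most likely sequence of POS-extended states for chars."""
    # best[s] = (log-probability, path) of the best path ending in state s
    best = {s: (logp(START, s) + logp(EMIT[s], chars[0]), [s]) for s in STATES}
    for ch in chars[1:]:
        nxt = {}
        for s in STATES:
            score, path = max(
                ((best[p][0] + logp(TRANS[p], s), best[p][1]) for p in STATES),
                key=lambda t: t[0],
            )
            nxt[s] = (score + logp(EMIT[s], ch), path + [s])
        best = nxt
    return max(best.values(), key=lambda t: t[0])[1]


def segment(text):
    """Split text into (word, POS) pairs using the decoded state sequence."""
    if not text:
        return []
    words = []
    for ch, state in zip(text, viterbi(text)):
        boundary, pos = state.split("-")
        if boundary == "B" or not words:
            words.append([ch, pos])   # start a new word
        else:
            words[-1][0] += ch        # extend the current word
    return [tuple(w) for w in words]


print(segment("我學中文"))  # [('我', 'N'), ('學', 'V'), ('中文', 'N')]
```

Because each state carries a POS, the transition table encodes which POS sequences are plausible, and the emission table encodes which characters can occur under which POS. These correspond to the first two reasons the abstract gives for using POS information.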

Similar Items