Syllable Word Segmentation for Mandarin Chinese via Double Ranking of the Left and Right Context



Bibliographic Details
Main Authors: Jiang, Tian-Jian, 姜天戩
Other Authors: 許聞廉
Format: Others
Language: en_US
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/70825467974900378132
id ndltd-TW-100NTHU5392070
record_format oai_dc
spelling ndltd-TW-100NTHU5392070 2015-10-13T21:27:24Z 學位論文 ; thesis 65 en_US
collection NDLTD
sources NDLTD
description Doctoral dissertation (博士) === National Tsing Hua University (國立清華大學) === Department of Computer Science (資訊工程學系) === 100 === Syllable-word segmentation, as part of Chinese phonetic input methods (CPIM), involves more overlapping boundaries than word segmentation because of homophone ambiguities. A CPIM usually assumes that the input is a complete sentence and evaluates performance on a well-formed corpus. However, most Pinyin users prefer progressive text entry in short chunks, mostly one or two words each, a style that is even more popular on handheld devices with limited computing power. Short chunks do not provide enough context for the best possible syllable-to-character conversion, especially when a chunk contains overlapping boundaries. These overlapping ambiguities show directional tendencies. This dissertation proposes a double ranking (DR) strategy on the left and right context. Experiments show that DR requires less memory while remaining competitive in performance with the frequency-based method (low memory and fast) and the conditional random fields model (larger memory and slower).