Searching Documents with Composite Chinese Word Segmentations

Master's === National Central University === Graduate Institute of Business Administration === 98 === In natural language, the word is the most basic element. Because an article is made up of many words, those words must be separated before any further analysis. Chinese text has no word spacing like that which marks the boundary between eve...


Bibliographic Details
Main Authors: Chun-Wei Lin, 林俊偉
Other Authors: Ping-Yu Hsu
Format: Others
Language: zh-TW
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/26152106905264225206
id ndltd-TW-098NCU05457013
record_format oai_dc
spelling ndltd-TW-098NCU05457013 2015-10-13T13:43:19Z http://ndltd.ncl.edu.tw/handle/26152106905264225206 Searching Documents with Composite Chinese Word Segmentations 應用合併斷詞搜尋中文文件之研究 Chun-Wei Lin 林俊偉 Master's National Central University Graduate Institute of Business Administration 98 Ping-Yu Hsu 許秉瑜 2010 thesis 54 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Central University === Graduate Institute of Business Administration === 98 === In natural language, the word is the most basic element. Because an article is made up of many words, those words must be separated before any further analysis. Chinese text, unlike English text, has no spaces marking the boundaries between words, so it must first be divided into words through word segmentation. Among Chinese word segmentation systems, CKIP from Academia Sinica is the most prominent. However, CKIP's design is constrained to the words in its data, so words that are not included (called unidentified words) are rarely segmented correctly. The purpose of this research is to solve the unidentified-word problem: to raise the accuracy of the word segmentation system so that keywords that truly represent the whole article can be extracted for search. The research builds on the output of a Chinese word segmentation system: I combined the segmented results to reconstruct complete terms and their original meaning, compensating for unidentified words that go undetected once they have been split apart. I then extracted keywords from both the original segmentation result and the combined terms. The results showed that the precision and recall of composite Chinese word segmentation are much better than those of CKIP and Google, which also verifies the validity of the system.
author2 Ping-Yu Hsu
author Chun-Wei Lin
林俊偉
title Searching Documents with Composite Chinese Word Segmentations
publishDate 2010
url http://ndltd.ncl.edu.tw/handle/26152106905264225206
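
The abstract describes combining adjacent segmentation results to recover unidentified words, then evaluating with precision and recall. The record does not give the thesis's actual algorithm, so the following is only a minimal sketch under stated assumptions: the segmenter output is a hypothetical hand-written token list (no real CKIP call is made), composites are formed by joining up to three adjacent tokens, and a composite is kept when it repeats at least twice. The function names and thresholds are illustrative, not taken from the thesis.

```python
from collections import Counter


def composite_candidates(tokens, max_len=3):
    """Yield strings formed by joining 2..max_len adjacent tokens."""
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield "".join(tokens[i:i + n])


def repeated_composites(segmented_sentences, max_len=3, min_count=2):
    """Return composites that occur at least min_count times.

    Assumption: an unidentified word that the segmenter split apart
    tends to reappear as the same run of adjacent tokens.
    """
    counts = Counter()
    for tokens in segmented_sentences:
        counts.update(composite_candidates(tokens, max_len))
    return {term: c for term, c in counts.items() if c >= min_count}


def precision_recall(predicted, relevant):
    """Compute precision and recall of predicted keywords."""
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical segmenter output: a name missing from the lexicon
# has been split into three single-character tokens.
sentences = [
    ["歐", "巴", "馬", "訪問", "中國"],
    ["歐", "巴", "馬", "演講"],
]
composites = repeated_composites(sentences)
```

In this toy input the full name is recovered as a repeated composite, but so are its shorter fragments; a real system would need an additional rule (for example, preferring the longest composite) that this sketch does not attempt.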