Summary: | Master's === National Yunlin University of Science and Technology === Department of Information Management === 103 === Keywords are a subset of the words or phrases in a document that describe its meaning. The major approaches to Chinese keyword extraction include keyword-lexicon, statistical, and linguistic methods. Among these, keyword-lexicon approaches offer high precision and efficiency, but building a lexicon takes considerable time and maintaining it is a manual process.
This research presents a Chinese keyword extraction system built on the CKIP Chinese word segmentation system. The system recombines segmented words in two ways: by part-of-speech (POS) combination, and by automatic word combination using a search engine (Google Search) and an internet encyclopedia (Wikipedia). It also builds a keyword lexicon that updates its keywords automatically, which addresses the drawbacks of keyword-lexicon approaches. Experimental results show that combining the CKIP Chinese word segmentation system, POS combination, and automatic word combination yields higher precision, and that the number of documents does not affect the performance of the keyword extraction system.
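The POS combination step can be illustrated with a minimal Python sketch. It is not the system's actual implementation. It assumes the segmenter returns (word, POS tag) pairs, and it uses a hypothetical rule that merges runs of adjacent tokens whose tag starts with "N" (noun-like tags); the rules the thesis actually applies are not given in this abstract. The function names `recombine_by_pos` and `top_keywords` are illustrative, and the search-engine and encyclopedia checks are omitted.

```python
from collections import Counter


def recombine_by_pos(tokens, mergeable_prefixes=("N",)):
    """Merge runs of adjacent tokens whose POS tag starts with a given prefix.

    tokens: list of (word, pos) pairs, as a word segmenter might return them.
    mergeable_prefixes: assumed rule; here, noun-like tags only.
    """
    candidates = []
    run = []
    for word, pos in tokens:
        if pos.startswith(mergeable_prefixes):
            run.append(word)  # extend the current run of mergeable words
        else:
            if run:
                candidates.append("".join(run))  # close the run as one candidate
            run = []
    if run:
        candidates.append("".join(run))
    return candidates


def top_keywords(tokens, k=3):
    """Rank recombined candidates by how often they occur."""
    counts = Counter(recombine_by_pos(tokens))
    return [word for word, _ in counts.most_common(k)]


# Hand-tagged example; the tags are illustrative, not real segmenter output.
sample = [("中文", "Na"), ("關鍵詞", "Na"), ("擷取", "VC"), ("系統", "Na")]
print(recombine_by_pos(sample))
```

In a fuller pipeline, each recombined candidate would then be validated, for example by checking whether it appears as a search-engine result or an encyclopedia entry, before being added to the lexicon.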
Keywords: Keyword Extraction, Keyword Lexicon, Search Engine, Internet Encyclopedia
|