The construction of document concept based on compound nouns

Bibliographic Details
Main Authors: Ju-yuan Shih, 施儒淵
Other Authors: Shih-chieh Chou
Format: Others
Language: zh-TW
Published: 2009
Online Access: http://ndltd.ncl.edu.tw/handle/20810066825312429254
Record ID: ndltd-TW-097NCU05396066
Chinese Title: 以複合名詞為基礎之文件概念建立方式 (A Method for Constructing Document Concepts Based on Compound Nouns)
Advisor: 周世傑 (Shih-chieh Chou)
Type: Thesis (學位論文)
Extent: 56
Collection: NDLTD
Description: Master's thesis, National Central University, Institute of Information Management, academic year 97. With the growth of information technology, a large volume of digital documents and materials has appeared. Without information technology, searching this information would require great human effort. To reduce users' effort, document discrimination systems have been developed and applied; in such systems, documents are usually discriminated automatically by their similarity. In information retrieval, research mainly uses TF-IDF to represent the terms of a document, uses those terms to form a Vector Space Model, and then computes document similarity on that model. This approach can be improved in two ways. First, documents use compound nouns in addition to single terms. Second, different terms are often used to express the same concept. This thesis proposes a method that forms the Vector Space Model from concepts extracted from the documents. The method has two steps: first, extracting concepts from the terms and compound nouns of the documents; second, building a Vector Space Model with these concepts as dimensions. Experimental results show that the concept-extraction approach outperforms TF-IDF in the accuracy of document similarity computation.
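The description contrasts a term-based TF-IDF Vector Space Model with one whose dimensions are concepts built from terms and compound nouns. The following is a minimal sketch of that contrast only, not the author's method: this record does not describe how the thesis extracts concepts. It assumes hypothetical, hand-built lexicons (`COMPOUNDS` for compound nouns and `CONCEPT_OF` for mapping terms to concept IDs), greedy longest-match merging of compound nouns, a smoothed IDF variant, and cosine similarity. All function and variable names are illustrative.

```python
import math
from collections import Counter

# Hypothetical lexicons for illustration only (not from the thesis):
# COMPOUNDS lists multi-word compound nouns as token tuples;
# CONCEPT_OF maps a term or merged compound noun to a shared concept ID.
COMPOUNDS = {("vector", "space", "model")}
CONCEPT_OF = {
    "vector space model": "C:VSM",
    "vsm": "C:VSM",
    "documents": "C:DOC",
    "texts": "C:DOC",
}


def merge_compounds(tokens, compounds):
    """Greedily replace the longest matching compound noun with one token."""
    max_len = max((len(c) for c in compounds), default=1)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest possible phrase first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in compounds:
                out.append(" ".join(phrase))
                i += n
                break
        else:  # no compound starts here: keep the single term
            out.append(tokens[i])
            i += 1
    return out


def to_concepts(tokens, compounds, concept_of):
    """Map terms and compound nouns to concept IDs; unmapped terms stay as-is."""
    return [concept_of.get(t, t) for t in merge_compounds(tokens, compounds)]


def tf_idf(docs):
    """Return one sparse TF-IDF vector (feature -> weight) per document.

    Uses the smoothed IDF log((1 + N) / (1 + df)) + 1 so that features
    present in every document keep a nonzero weight.
    """
    n = len(docs)
    df = Counter(f for doc in docs for f in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            f: (c / len(doc)) * (math.log((1 + n) / (1 + df[f])) + 1)
            for f, c in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    docs = ["vector space model ranks documents".split(),
            "vsm ranks texts".split()]
    term_sim = cosine(*tf_idf(docs))
    concept_docs = [to_concepts(d, COMPOUNDS, CONCEPT_OF) for d in docs]
    concept_sim = cosine(*tf_idf(concept_docs))
    print(f"term-based: {term_sim:.3f}  concept-based: {concept_sim:.3f}")
```

In this toy example the two term vectors share only "ranks", so their cosine is low. Merging "vector space model" into one compound noun and mapping it and "vsm" (and likewise "documents" and "texts") to shared concepts makes the two concept vectors identical, so their cosine is 1.0. The smoothed IDF is one common choice: with only two documents, the unsmoothed log(N/df) would give every shared feature a weight of zero.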