The construction of document concept based on compound nouns
Master's === National Central University (國立中央大學) === Graduate Institute of Information Management (資訊管理研究所) === 97 === With the growth of information technology, a large volume of digital documents and materials has appeared. Without information technology, searching for information would require great human effort. To reduce users' effort, document discrimination system...
Main Authors: | Ju-yuan Shih, 施儒淵 |
---|---|
Other Authors: | Shih-chieh Chou |
Format: | Others |
Language: | zh-TW |
Published: | 2009 |
Online Access: | http://ndltd.ncl.edu.tw/handle/20810066825312429254 |
id |
ndltd-TW-097NCU05396066 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-097NCU05396066 2016-05-02T04:10:58Z http://ndltd.ncl.edu.tw/handle/20810066825312429254 The construction of document concept based on compound nouns 以複合名詞為基礎之文件概念建立方式 Ju-yuan Shih 施儒淵 Master's National Central University (國立中央大學) Graduate Institute of Information Management (資訊管理研究所) 97 With the growth of information technology, a large volume of digital documents and materials has appeared. Without information technology, searching for information would require great human effort. To reduce users' effort, document discrimination systems have been developed and applied. In such systems, documents are usually discriminated automatically by their similarity. In information retrieval, researchers mainly use TF-IDF to represent the terms of documents, use those terms to form a Vector Space Model, and then compute document similarity from that model. This approach could be improved in two respects. First, documents contain compound nouns in addition to single terms. Second, different terms may be used to express the same concept. This paper proposes a method that forms the Vector Space Model from concepts extracted from documents. The method has two steps: first, extracting concepts from the terms and compound nouns of the documents, and second, building a Vector Space Model with these concepts as dimensions. Experimental results show that the concept-extraction approach outperforms TF-IDF in the accuracy of document similarity computation. Shih-chieh Chou 周世傑 2009 學位論文 ; thesis 56 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Central University (國立中央大學) === Graduate Institute of Information Management (資訊管理研究所) === 97 === With the growth of information technology, a large volume of digital documents and materials has appeared. Without information technology, searching for information would require great human effort. To reduce users' effort, document discrimination systems have been developed and applied. In such systems, documents are usually discriminated automatically by their similarity. In information retrieval, researchers mainly use TF-IDF to represent the terms of documents, use those terms to form a Vector Space Model, and then compute document similarity from that model. This approach could be improved in two respects. First, documents contain compound nouns in addition to single terms. Second, different terms may be used to express the same concept. This paper proposes a method that forms the Vector Space Model from concepts extracted from documents. The method has two steps: first, extracting concepts from the terms and compound nouns of the documents, and second, building a Vector Space Model with these concepts as dimensions. Experimental results show that the concept-extraction approach outperforms TF-IDF in the accuracy of document similarity computation. |
author2 |
Shih-chieh Chou |
author_facet |
Shih-chieh Chou Ju-yuan Shih 施儒淵 |
author |
Ju-yuan Shih 施儒淵 |
spellingShingle |
Ju-yuan Shih 施儒淵 The construction of document concept based on compound nouns |
author_sort |
Ju-yuan Shih |
title |
The construction of document concept based on compound nouns |
title_sort |
construction of document concept based on compound nouns |
publishDate |
2009 |
url |
http://ndltd.ncl.edu.tw/handle/20810066825312429254 |
work_keys_str_mv |
AT juyuanshih theconstructionofdocumentconceptbasedoncompoundnouns AT shīrúyuān theconstructionofdocumentconceptbasedoncompoundnouns AT juyuanshih yǐfùhémíngcíwèijīchǔzhīwénjiàngàiniànjiànlìfāngshì AT shīrúyuān yǐfùhémíngcíwèijīchǔzhīwénjiàngàiniànjiànlìfāngshì AT juyuanshih constructionofdocumentconceptbasedoncompoundnouns AT shīrúyuān constructionofdocumentconceptbasedoncompoundnouns |
_version_ |
1718253002956996608 |
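
The abstract in this record contrasts a term-level TF-IDF Vector Space Model with one whose dimensions are concepts drawn from terms and compound nouns. The record does not describe how the thesis extracts concepts, so the following is only a minimal sketch of the general idea, not the author's method. It assumes a hand-written lookup table (`CONCEPT_OF`, hypothetical) that maps single terms and two-word compound nouns to concept labels, uses one common TF-IDF variant (relative term frequency times `log(N / df)`), and uses invented English example documents.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Turn a list of token lists into sparse {dimension: weight} vectors."""
    n_docs = len(docs)
    # Document frequency: in how many documents each token occurs.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {tok: (cnt / len(doc)) * math.log(n_docs / df[tok])
             for tok, cnt in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(dim, 0.0) for dim, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical concept table (an assumption for illustration only):
# compound nouns and synonyms that express one concept share a label.
CONCEPT_OF = {
    "information retrieval": "IR",
    "document search": "IR",
    "car": "VEHICLE",
    "automobile": "VEHICLE",
}


def to_concepts(tokens, concept_of=CONCEPT_OF, max_len=2):
    """Greedy longest match: replace known terms/compounds by concept labels."""
    out, i = [], 0
    while i < len(tokens):
        for size in range(max_len, 0, -1):
            chunk = tokens[i:i + size]
            label = concept_of.get(" ".join(chunk))
            if label is not None:
                out.append(label)
                i += len(chunk)
                break
        else:
            # No concept known for this token: keep it as its own dimension.
            out.append(tokens[i])
            i += 1
    return out


docs = [
    "information retrieval for car manuals".split(),
    "document search for automobile manuals".split(),
    "weather report for monday morning".split(),
]

term_vecs = tf_idf_vectors(docs)
concept_vecs = tf_idf_vectors([to_concepts(d) for d in docs])

term_sim = cosine(term_vecs[0], term_vecs[1])
concept_sim = cosine(concept_vecs[0], concept_vecs[1])
print(f"term-level similarity:    {term_sim:.3f}")
print(f"concept-level similarity: {concept_sim:.3f}")
```

In this toy data the first two documents share only the word "manuals" at the term level, so their similarity is low; once compound nouns and synonyms are mapped to shared concept labels, the two documents occupy the same dimensions and their similarity rises. How well this carries over to real documents depends entirely on the quality of the concept table, which here is invented.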