Summary: | Master's === Yuan Ze University === Department of Computer Science and Engineering === 90 === This study investigates a procedure for improving query efficiency over text data, in order to save time and cost for people seeking information. With the booming use of the Internet, information is no longer limited by geography; it is propagated widely over the Internet as voice, pictures, and text. Compared with the other formats, text is the main format used to carry communication in human society. However, descriptions written in text lack precision compared with a traditional database, which uses the “tuple” to record data precisely. To support efficient queries when searching text data, this study proposes a methodology named concept indexing, which differs from traditional text indexing techniques that usually spend time re-screening the information during a query.
Concept categories are the core of concept indexing. All keywords are transferred from the term space to the concept space, and document similarity is then calculated in the concept space using Euclidean distance in a vector space. Working in a vector space also makes it possible to express relation, contraction, and dilation between concepts. Based on the concept categories, the study also incorporates the idea of visual text mining to present the subjects within each concept, helping the information buyer reach a useful target on the first query.
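As a rough illustration of the term-space to concept-space step described above, the following minimal sketch maps keywords to concept categories and ranks documents by Euclidean distance to a query. The keyword-to-concept table `CONCEPT_OF`, the normalization by keyword count, and the function names are all assumptions made for this example; the abstract does not specify how the thesis builds its concept categories or weights its vectors.

```python
import math
from collections import Counter

# Hypothetical keyword-to-concept table (an assumption for illustration only;
# the thesis does not list its actual concept categories).
CONCEPT_OF = {
    "stock": "finance", "bank": "finance", "market": "finance",
    "election": "politics", "vote": "politics",
    "match": "sports", "goal": "sports",
}
# Fixed axis order of the concept space: ['finance', 'politics', 'sports']
CONCEPTS = sorted(set(CONCEPT_OF.values()))


def to_concept_vector(text):
    """Project a text from term space into concept space.

    Each known keyword votes for its concept; the counts are normalized so
    that documents of different lengths are comparable (an assumed choice).
    """
    counts = Counter(CONCEPT_OF[w] for w in text.lower().split() if w in CONCEPT_OF)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CONCEPTS)
    return [counts[c] / total for c in CONCEPTS]


def euclidean_distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank(query, documents):
    """Return the documents ordered from nearest to farthest in concept space."""
    q = to_concept_vector(query)
    return sorted(documents, key=lambda d: euclidean_distance(q, to_concept_vector(d)))
```

In this sketch a query such as `"bank"` and a document such as `"stock market"` share no keywords, yet they land on the same point of the concept space, which is the kind of match that plain term indexing would miss.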
The system was implemented using Internet news; other kinds of text data sources can also be adapted to the system, since the proposed methodology is not specific to news. The experimental results show that concept indexing can adjust recall and precision according to the information buyer's requirements, and that the concept subjects of the information buyer and the query system can be matched by using the concept indexing technique.
|