Implementing Conference Information Retrieval System using Information Extraction Techniques

Bibliographic Details
Main Authors: Chia-Lin Shih, 史嘉淋
Other Authors: Yaw-Huei Chen
Format: Others
Language: zh-TW
Online Access: http://ndltd.ncl.edu.tw/handle/35511659999374634954
Description
Summary: Master's === National Chiayi University === Graduate Institute of Computer Science and Information Engineering === 96 === As the World Wide Web continues to grow, it has become one of the major sources from which people obtain information. Information retrieval techniques can help the user find related documents. However, because the retrieved results may contain a large number of documents, the user still needs to read many of them to find the useful information. Information extraction techniques, on the other hand, use a set of extraction rules or patterns to extract the needed information from documents and integrate it into structured data that the user can easily work with. The main purpose of this research is to implement a conference information retrieval system using information extraction techniques. Because every website is built differently, it is difficult for a program to extract information from conference web pages automatically. In this paper, rule-based extraction techniques that use the relationships between data items, the HTML structure, and the VIPS (Vision-based Page Segmentation) algorithm are proposed to extract conference information such as conference name, topics, important dates, and location. The experimental data set contains conference websites from DBWorld, IEEE, ACM, and search engines. The precision, recall, and F-measure of the experiments range from 0.80 to 0.91, which indicates that the proposed techniques can achieve good accuracy.
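The summary reports precision, recall, and F-measure without showing how they are computed. The following is a minimal sketch of the standard definitions, assuming that extracted fields are compared as exact (field, value) pairs against a hand-labeled reference. The function name `precision_recall_f1` and the sample data are hypothetical and are not taken from the thesis, which may score matches differently.

```python
def precision_recall_f1(extracted, expected):
    """Return (precision, recall, F1) for two collections of items.

    Assumption: an extracted item counts as correct only if it is
    exactly equal to an item in the reference set.
    """
    extracted, expected = set(extracted), set(expected)
    true_positives = len(extracted & expected)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: four extracted fields, five reference fields.
extracted = {
    ("name", "Example Conference 2008"),
    ("location", "Taipei"),
    ("deadline", "2008-03-01"),
    ("topic", "Data Mining"),
}
expected = {
    ("name", "Example Conference 2008"),
    ("location", "Taipei"),
    ("deadline", "2008-03-15"),
    ("topic", "Data Mining"),
    ("topic", "Web Search"),
}
precision, recall, f1 = precision_recall_f1(extracted, expected)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this made-up example, three of the four extracted items match the reference, so precision is 3/4 and recall is 3/5. The F-measure is the harmonic mean of the two, so it falls between them.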