The Study of Auto Extraction of Dates from Chinese Web Pages in Taiwan Area


Bibliographic Details
Main Authors: Wen-Hui Tai, 邰文暉
Other Authors: Cheng-Juei Wu
Format: Others
Language: zh-TW
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/68609583336853599211
Description
Summary: Master's === Fu Jen Catholic University === Department of Library and Information Science === 98 === With the spread of Internet services, online resources have become increasingly plentiful. ‘Date’ is one of the most important metadata fields for web pages. The special date display formats used in Taiwan make automatic cataloging of web page dates more difficult. The main purpose of this research is to analyze thoroughly the different date display formats used in Chinese web pages, so that the findings can be used to increase the precision of automatic date extraction. The experiment proceeds in three steps. First, a sample of web pages is drawn at random from the Internet. Second, a statistical analysis of the date display formats on each page is conducted. Last, regular expressions are used to extract the dates from each page, and the accuracy ratio is calculated. The difficulties and feasibility of automatic date extraction are discussed at the end of the work. The experimental results suggest an accuracy ratio of 61% for web pages with date information and 62% for web pages without date information. The average error for web pages with date information is 0.62 years. The results suggest that an automatic date extraction mechanism can be used to improve the efficiency of web page information retrieval.
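The regular-expression step described in the summary can be illustrated with a minimal Python sketch. The record does not list the expressions the thesis actually used, so the two patterns below, the names `ROC_PATTERN`, `WESTERN_PATTERN`, and `extract_dates`, and the choice of formats covered (a Republic of China calendar date such as 民國99年5月3日, and a four-digit Gregorian date) are all assumptions made for illustration only.

```python
import re
from datetime import date

# Hypothetical patterns; the thesis's own expressions are not given in this record.
# ROC (Minguo) calendar: optional 中華 prefix, 民國, a 2-3 digit year, then 年/月/日.
ROC_PATTERN = re.compile(
    r"(?:中華)?民國\s*(\d{2,3})\s*年\s*(\d{1,2})\s*月\s*(\d{1,2})\s*日"
)
# Gregorian calendar: a 4-digit year with 年/月/日 markers or -, /, . separators.
WESTERN_PATTERN = re.compile(
    r"(?<!\d)(\d{4})\s*[年/\-.]\s*(\d{1,2})\s*[月/\-.]\s*(\d{1,2})(?:\s*日)?(?!\d)"
)

ROC_YEAR_OFFSET = 1911  # ROC year 1 corresponds to Gregorian year 1912


def _safe_date(year, month, day):
    """Return a date, or None when the numbers do not form a valid calendar date."""
    try:
        return date(year, month, day)
    except ValueError:
        return None


def extract_dates(text):
    """Return every recognised date in `text`, in order of appearance."""
    found = []
    for match in ROC_PATTERN.finditer(text):
        year, month, day = (int(g) for g in match.groups())
        parsed = _safe_date(year + ROC_YEAR_OFFSET, month, day)
        if parsed is not None:
            found.append((match.start(), parsed))
    for match in WESTERN_PATTERN.finditer(text):
        year, month, day = (int(g) for g in match.groups())
        parsed = _safe_date(year, month, day)
        if parsed is not None:
            found.append((match.start(), parsed))
    found.sort()  # order by position in the text
    return [parsed for _, parsed in found]


if __name__ == "__main__":
    sample = "發布日期:民國99年5月3日 更新:2010-06-15"
    print(extract_dates(sample))
```

Real pages would need many more variants than this sketch covers, for example years written in Chinese numerals or ROC years without the 民國 prefix, which is presumably part of the difficulty the summary refers to.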