Summary: | Master's === Fu Jen Catholic University === Department of Library and Information Science === 98 === With the spread of Internet services, online resources are more plentiful than ever. 'Date' is one of the most important metadata fields for web pages. Because of the special date display formats used in Taiwan, automatically cataloging the dates of web pages is more difficult. The main purpose of this research is to thoroughly analyze the different date display formats used in Chinese web pages. The findings will be used to increase the precision of automatic date extraction from web pages.
The experiment proceeds as follows. First, a sample of web pages is drawn at random from the Internet. Second, a statistical analysis of the date display formats of each web page is conducted. Finally, regular expressions are used to extract the dates from each web page, and the accuracy ratio is calculated. The difficulties and feasibility of automatic date extraction are discussed at the end of this work.
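The extraction step can be illustrated with a minimal sketch. The abstract does not list the actual expressions used in the thesis, so the two patterns below (`ROC_DATE`, `GREGORIAN_DATE`) and the helper `extract_dates` are hypothetical. The sketch assumes that one of the Taiwan-specific formats is the Republic of China (Minguo) calendar, in which the year equals the Gregorian year minus 1911.

```python
import re
from datetime import date

# Hypothetical pattern for ROC-era dates such as "民國98年6月30日".
ROC_DATE = re.compile(
    r"(?:中華)?民國\s*(\d{2,3})\s*年\s*(\d{1,2})\s*月\s*(\d{1,2})\s*日"
)

# Hypothetical pattern for Gregorian dates such as "2009/06/30" or "2009年6月30日".
GREGORIAN_DATE = re.compile(
    r"(?<!\d)((?:19|20)\d{2})\s*[-/.年]\s*(\d{1,2})\s*[-/.月]\s*(\d{1,2})(?!\d)"
)


def extract_dates(text):
    """Return the dates found in text, in order of appearance."""
    found = []
    for m in ROC_DATE.finditer(text):
        year, month, day = (int(g) for g in m.groups())
        # Convert the ROC year to a Gregorian year.
        found.append((m.start(), year + 1911, month, day))
    for m in GREGORIAN_DATE.finditer(text):
        year, month, day = (int(g) for g in m.groups())
        found.append((m.start(), year, month, day))

    dates = []
    for _, year, month, day in sorted(found):
        try:
            dates.append(date(year, month, day))
        except ValueError:
            # Skip matches that are not valid calendar dates.
            continue
    return dates


print(extract_dates("更新日期:民國98年6月30日"))
print(extract_dates("Last updated 2009/06/30"))
```

Real web pages are likely to need more patterns than these two (for example, two-digit years or dates without a year), which is part of what makes the extraction difficult.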
The results of the experiment suggest that the accuracy ratio for web pages with date information is 61%, while the accuracy ratio for web pages without date information is 62%. The average error for web pages with date information is 0.62 years. These results suggest that an automatic date extraction mechanism can be used to improve the efficiency of web page information retrieval.
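The abstract does not define how the accuracy ratio and the average error are computed. One plausible reading, shown here only as a sketch under that assumption, counts a page as correct when the extracted date equals the reference date (with `None` standing for "no date") and measures the error as the absolute difference in years. The function names and the sample values are hypothetical and are not data from the thesis.

```python
from datetime import date


def accuracy_ratio(predicted, actual):
    """Fraction of pages whose extracted date equals the reference date."""
    if not actual:
        raise ValueError("no pages to evaluate")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def mean_error_years(predicted, actual):
    """Mean absolute difference in years, over pages where both dates exist."""
    diffs = [
        abs((p - a).days) / 365.25
        for p, a in zip(predicted, actual)
        if p is not None and a is not None
    ]
    return sum(diffs) / len(diffs) if diffs else None


# Hypothetical sample: three pages, one of which has no date at all.
predicted = [date(2009, 6, 30), None, date(2008, 1, 1)]
actual = [date(2009, 6, 30), None, date(2009, 1, 1)]

print(accuracy_ratio(predicted, actual))
print(mean_error_years(predicted, actual))
```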
|