AutoCrawler: An Integrated System for Automatic Topical Crawler

Bibliographic Details
Main Authors: Chen-Yang Shin, 施晨揚
Other Authors: Jyh-Jong Tsay
Format: Others
Language: zh-TW
Published: 2004
Online Access: http://ndltd.ncl.edu.tw/handle/65860924358826186662
Description
Summary: Master's === National Chung Cheng University === Graduate Institute of Computer Science and Information Engineering === 92 === With the enormous growth of the World Wide Web in recent years, a wealth of information is now available online, which makes efficient resource discovery important. In this thesis, we examine the performance of several focused crawling algorithms and the differences between them. Our AutoCrawler system crawls only the portion of the World Wide Web relevant to a particular topic, without having to visit every web page. It offers a graphical user interface that lets the user specify a topic of interest precisely, and a visual browsing tool that clusters the crawled pages and shows metrics to the user. We propose an ordering strategy that combines page content, URL tokens, keywords, anchor text, page title, and tunneling distance to improve the precision with which URLs in the frontier are ordered. We also provide both keyword-based and hierarchy-based topic specification. Experiments show that AutoCrawler performs better than traditional context-graph and breadth-first-search focused crawling systems.
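
The ordering strategy named in the summary can be illustrated with a minimal Python sketch of a prioritized frontier. This is not the thesis implementation: the record does not say how the signals are combined, so the keyword-overlap measure, the `WEIGHTS` table, the `TUNNEL_DECAY` factor, and the `Link` and `Frontier` names are all assumptions made for illustration.

```python
import heapq
import re
from dataclasses import dataclass


def tokens(text):
    """Return the lowercase alphanumeric tokens of a string."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap(text, keywords):
    """Fraction of topic keywords that occur in the text (0.0 to 1.0)."""
    if not keywords:
        return 0.0
    found = set(tokens(text))
    return sum(1 for k in keywords if k in found) / len(keywords)


# Hypothetical weights: the record does not state how the signals are combined.
WEIGHTS = {"content": 0.3, "url": 0.2, "anchor": 0.3, "title": 0.2}
# Assumed penalty applied once per consecutive off-topic hop (tunneling).
TUNNEL_DECAY = 0.5


@dataclass
class Link:
    url: str
    anchor_text: str
    parent_content: str   # text of the page the link was found on
    parent_title: str     # title of that page
    tunnel_distance: int  # hops since the last on-topic page


def score(link, keywords):
    """Weighted keyword overlap across the signals, decayed by tunneling distance."""
    base = (
        WEIGHTS["content"] * overlap(link.parent_content, keywords)
        + WEIGHTS["url"] * overlap(link.url, keywords)
        + WEIGHTS["anchor"] * overlap(link.anchor_text, keywords)
        + WEIGHTS["title"] * overlap(link.parent_title, keywords)
    )
    return base * (TUNNEL_DECAY ** link.tunnel_distance)


class Frontier:
    """Priority queue of unvisited URLs; the highest-scoring URL is popped first."""

    def __init__(self, keywords):
        self.keywords = [k.lower() for k in keywords]
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker that keeps insertion order stable

    def push(self, link):
        if link.url in self._seen:
            return  # each URL is queued at most once
        self._seen.add(link.url)
        # heapq is a min-heap, so the score is negated.
        entry = (-score(link, self.keywords), self._counter, link.url)
        heapq.heappush(self._heap, entry)
        self._counter += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

In this sketch a crawler would call `push` for every link extracted from a fetched page and `pop` to choose the next URL to fetch. A breadth-first crawler, by contrast, would use a plain FIFO queue and ignore the score.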