Summary: | Master's === National Chung Cheng University === Graduate Institute of Computer Science and Information Engineering === 92 === With the enormous growth of the World Wide Web in recent years, a wealth of
information is now available on it. This makes it important to
perform resource discovery efficiently. In this thesis, we
examine the performance of several focused crawling algorithms
and the differences between them. Our AutoCrawler system can crawl a particular topical
portion of the World Wide Web without having to visit all the web
pages. It offers a graphical user interface that lets the user specify her topic of interest precisely, and a visual browsing tool that clusters the crawled pages and shows some metrics to the user. We propose an ordering strategy that combines page contents, URL tokens, keywords, anchor text, page title, and tunneling distance to improve the precision of ordering the URLs in the frontier. We also provide both keyword-based and hierarchy-based topic specification to the user. Experiments show that AutoCrawler performs better than traditional context-graph and breadth-first-search focused crawling systems.
|
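The ordering strategy described in the summary can be illustrated with a small sketch. The summary does not say how the signals are combined, so everything below is an assumption made for illustration: the weighted sum, the weight values, the `TUNNEL_DECAY` penalty, and the names `score_link` and `Frontier` are all hypothetical and are not taken from AutoCrawler. The topic keywords are matched against each text field (page content, URL tokens, anchor text, page title), and the result is damped by the tunneling distance, taken here to mean the number of consecutive off-topic pages followed to reach the link.

```python
import heapq
import re

# Hypothetical weights; the summary does not state how the signals are combined.
WEIGHTS = {"content": 0.4, "url": 0.15, "anchor": 0.25, "title": 0.2}
TUNNEL_DECAY = 0.5  # assumed penalty applied per consecutive off-topic hop


def tokens(text):
    """Return the lowercase alphanumeric tokens of a string."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap(text, keywords):
    """Fraction of the (lowercase) topic keywords that occur in the text."""
    if not keywords:
        return 0.0
    found = set(tokens(text))
    return sum(1 for k in keywords if k in found) / len(keywords)


def score_link(keywords, page_content, page_title, anchor_text, url, tunnel_distance):
    """Weighted sum of the keyword signals, damped by tunneling distance."""
    base = (
        WEIGHTS["content"] * overlap(page_content, keywords)
        + WEIGHTS["url"] * overlap(url, keywords)
        + WEIGHTS["anchor"] * overlap(anchor_text, keywords)
        + WEIGHTS["title"] * overlap(page_title, keywords)
    )
    return base * (TUNNEL_DECAY ** tunnel_distance)


class Frontier:
    """Priority queue of unvisited URLs; the highest score is popped first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker so equal scores pop in insertion order

    def push(self, url, score):
        if url in self._seen:
            return  # never enqueue the same URL twice
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the maximum.
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self):
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


# Usage with made-up pages: an on-topic link and an off-topic link two hops deep.
keywords = ["focused", "crawler"]
frontier = Frontier()
frontier.push(
    "http://example.org/sports",
    score_link(keywords, "sports news", "Sports", "latest scores",
               "http://example.org/sports", 2),
)
frontier.push(
    "http://example.org/focused-crawler",
    score_link(keywords, "a focused crawler survey", "Focused crawler",
               "focused crawler paper", "http://example.org/focused-crawler", 0),
)
best_url, best_score = frontier.pop()
```

In this sketch the on-topic link is popped first even though it was pushed second, which is the behavior a best-first frontier is meant to provide. A breadth-first crawler, by contrast, would visit links strictly in discovery order.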