Automatic Web Document Classification Based on Genetic Algorithm

Bibliographic Details
Main Authors: Ya-Hui Chen, 陳雅慧
Other Authors: Prof. Chih-Hsun Chou
Format: Others
Language: zh-TW
Published: 2004
Online Access: http://ndltd.ncl.edu.tw/handle/68396692157353824068
Description
Summary: Master's thesis === Chung Hua University === Master's Program, Department of Computer Science and Information Engineering === 92 === In this thesis we study web document classification, motivated by the exponential growth in the number of web documents. We construct a model based on a Genetic Algorithm (GA) to choose the best threshold values of the condition parameters used to sieve out suitable keywords, replacing the trial-and-error method used in other theses. To verify that the chosen threshold values are optimal, we classify the documents by the selected keywords using both the Vector Space Model (VSM) and the Support Vector Machine (SVM), and analyze the recall rate and precision rate of the classification results. We also compare the classification results obtained with a single condition parameter against those obtained with multiple condition parameters. The major problem is the trade-off between recall rate and precision rate, since both values are important to users. The analysis of the classification results shows that the threshold values derived by the GA model give the best classification results. Finally, using the threshold values of four condition parameters to sieve out suitable keywords increases both recall rate and precision rate.
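
The summary describes a GA that searches for keyword-filter thresholds and judges them by recall and precision. The following is only a minimal sketch of that general idea, not the thesis's implementation. Everything concrete in it is an assumption made for illustration: the four per-word statistics, the toy data, the use of the F1 score to combine precision and recall into one fitness value, and the GA operators (truncation selection, one-point crossover, Gaussian mutation). The thesis evaluates thresholds through VSM and SVM classification results; here a small hand-labeled keyword set stands in for that step.

```python
import random


def precision_recall(predicted, relevant):
    """Return (precision, recall) of a predicted set against a relevant set."""
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f1(precision, recall):
    """Harmonic mean of precision and recall (assumed fitness, not from the thesis)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def select_keywords(stats, thresholds):
    """Keep the words whose every statistic reaches its threshold."""
    return {
        word
        for word, values in stats.items()
        if all(v >= t for v, t in zip(values, thresholds))
    }


def evolve(fitness, n_params, pop_size=20, generations=30,
           mutation_rate=0.2, seed=0):
    """Tiny GA over threshold vectors in [0, 1]^n_params."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]          # truncation selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_params) if n_params > 1 else 0
            child = a[:cut] + b[cut:]          # one-point crossover
            if rng.random() < mutation_rate:   # Gaussian mutation, clipped to [0, 1]
                i = rng.randrange(n_params)
                child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


# Hypothetical data: four made-up statistics per candidate word.
TOY_STATS = {
    "genetic":   (0.9, 0.8, 0.7, 0.9),
    "algorithm": (0.8, 0.9, 0.8, 0.7),
    "threshold": (0.7, 0.7, 0.9, 0.8),
    "the":       (0.9, 0.1, 0.2, 0.1),
    "page":      (0.4, 0.5, 0.3, 0.6),
    "click":     (0.2, 0.6, 0.5, 0.3),
}
# Hypothetical ground truth standing in for VSM/SVM classification quality.
TOY_RELEVANT = {"genetic", "algorithm", "threshold"}


def toy_fitness(thresholds):
    """Score a threshold vector by the F1 of the keywords it selects."""
    selected = select_keywords(TOY_STATS, thresholds)
    return f1(*precision_recall(selected, TOY_RELEVANT))


best = evolve(toy_fitness, n_params=4)
print("thresholds:", [round(t, 2) for t in best])
print("fitness:", round(toy_fitness(best), 3))
```

Folding both measures into one fitness value is one way to handle the trade-off the summary mentions: thresholds that raise precision by discarding too many keywords lose recall, and the score drops accordingly. A weighted combination of the two measures would be an equally plausible choice.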