Summary: | Master's === Chinese Culture University === Graduate Institute of Information Management === 93 === A website contains many terms distributed across its webpages. Some of these terms describe the characteristics of the website and can be used to classify it into a specific category. The others have no bearing on the website and are ignored during the classification task.
In this research, a hierarchical classification inference system for Chinese websites is developed. In the system, useful terms are extracted using the pij and p̄ij values. The MMB approach is then applied to the knowledge base to infer the individual probabilities of the categories to which the website may belong. The category with the highest probability is selected by the system as the designated classification category. Moreover, the system can be used through web services, which support cross-platform access.
The system has three major modules. The first is the knowledge construction module. It uses web mining to explore a webpage's hyperlink structure and sentences, and segments the sentences into terms with the CKIP segmentation unit. All non-noun terms are eliminated. Ambiguous terms are removed from the noun terms, and all synonyms are grouped. The resulting term set is used to calculate the pij and p̄ij values from which the inference knowledge base is constructed. The second is the inference engine module. It uses the MMB approach together with the inference knowledge base to infer the website's classification probabilities. The third is the knowledge learning module, which provides a self-learning mechanism to update the inference knowledge base.
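The flow from knowledge construction to inference can be illustrated with a minimal Python sketch. The abstract defines neither pij, p̄ij, nor the MMB approach, so everything here is an assumption: pij is taken to be the smoothed fraction of training websites in category j that contain term i, p̄ij the same fraction over websites outside category j, and a simple log-likelihood-ratio sum with normalization stands in for MMB. The function names build_knowledge_base and classify are hypothetical.

```python
import math

def build_knowledge_base(training_sites, smoothing=1.0):
    """Build {(term, category): (p, p_bar)} from (set_of_terms, category) pairs.

    Assumption: p is the smoothed share of in-category sites containing the
    term, p_bar the smoothed share of out-of-category sites containing it.
    """
    categories = sorted({cat for _, cat in training_sites})
    vocab = set().union(*(terms for terms, _ in training_sites))
    n_total = len(training_sites)
    n_in = {c: sum(1 for _, cat in training_sites if cat == c) for c in categories}
    kb = {}
    for term in vocab:
        for c in categories:
            hits_in = sum(1 for terms, cat in training_sites
                          if cat == c and term in terms)
            hits_out = sum(1 for terms, cat in training_sites
                           if cat != c and term in terms)
            p = (hits_in + smoothing) / (n_in[c] + 2 * smoothing)
            p_bar = (hits_out + smoothing) / (n_total - n_in[c] + 2 * smoothing)
            kb[(term, c)] = (p, p_bar)
    return kb, categories

def classify(site_terms, kb, categories):
    """Return (best_category, {category: probability}).

    Stand-in for the MMB inference step: sum log(p / p_bar) over the site's
    known terms, then normalize the scores so they sum to 1.
    Terms missing from the knowledge base are ignored.
    """
    scores = {
        c: sum(math.log(kb[(t, c)][0] / kb[(t, c)][1])
               for t in site_terms if (t, c) in kb)
        for c in categories
    }
    top = max(scores.values())
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(weights.values())
    probs = {c: w / total for c, w in weights.items()}
    best = max(probs, key=probs.get)
    return best, probs

# Toy data: each website is already reduced to its set of noun terms.
training = [
    ({"basketball", "league"}, "sports"),
    ({"football", "league"}, "sports"),
    ({"stock", "fund"}, "finance"),
    ({"stock", "bank"}, "finance"),
]
kb, cats = build_knowledge_base(training)
best, probs = classify({"league", "stock", "basketball"}, kb, cats)
print(best, probs)
```

On this toy data the sketch selects "sports" for the test website. A self-learning step in the spirit of the third module could append newly classified websites to the training list and rebuild the knowledge base, though the abstract does not say how the actual update works.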
|