Bi-perceptron for Chinese Web News Categorization

Bibliographic Details
Main Authors: Jian Pan, 潘健
Other Authors: Cheng-Yuan Liou
Format: Others
Language: en_US
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/27135496119770819830
Description
Summary: Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 104 === Mobile news, owing to its inherently high frequency, has become a popular area pursued by many commercial companies in China. News categorization is an important technology in automatic news processing. Many supervised learning methods can be applied in this area, among which the Support Vector Machine (SVM) achieves state-of-the-art performance with discrete features. This paper proposes bi-perceptron learning for the binary classification problem, in the hope of achieving results comparable to or better than those of SVM. Bi-perceptron learning is a divide-and-conquer idea, and we implement a basic approach to it. We divide the classification problem into three steps: data partition, base classification, and aggregation, and we compare different partition and aggregation methods. Moreover, we analyze the effect of the word segmentation method, the number of keywords, the regularization of the base classifiers, and the number of partitions on categorization performance. Finally, we identify a bi-perceptron learning approach that performs well in both time and memory consumption.
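
The three steps named in the summary (data partition, base classification, aggregation) can be illustrated with a minimal Python sketch. The summary does not specify the partition scheme, the base classifier, or the aggregation rule used in the thesis, so everything here is an assumption for illustration: a random equal-size partition, a plain perceptron as the base classifier, and majority voting as the aggregation. The function names (`train_perceptron`, `partition`, `fit_partitioned`, `predict`) are hypothetical.

```python
import random


def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Train a plain perceptron; labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified (or on the boundary): update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict_one(model, x):
    """Label a single feature vector with one base perceptron."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def partition(samples, labels, k, seed=0):
    """Step 1 (assumed scheme): shuffle, then deal samples into k parts."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    parts = []
    for p in range(k):
        chosen = idx[p::k]
        parts.append(([samples[i] for i in chosen],
                      [labels[i] for i in chosen]))
    return parts


def fit_partitioned(samples, labels, k=3):
    """Step 2: train one base perceptron per partition."""
    return [train_perceptron(xs, ys) for xs, ys in partition(samples, labels, k)]


def predict(models, x):
    """Step 3 (assumed scheme): majority vote; a tie (even k) yields -1."""
    votes = sum(predict_one(m, x) for m in models)
    return 1 if votes > 0 else -1


if __name__ == "__main__":
    # Toy, linearly separable data standing in for document feature vectors.
    X = [(2.0, 1.0), (1.0, 3.0), (3.0, 2.0), (-2.0, -1.0), (-1.0, -3.0), (-3.0, -2.0)]
    y = [1, 1, 1, -1, -1, -1]
    models = fit_partitioned(X, y, k=3)
    print(predict(models, (2.0, 2.0)), predict(models, (-2.0, -2.0)))
```

Because each base classifier sees only its own partition, the parts can be trained independently, which is where a divide-and-conquer scheme of this kind could save time and memory. An odd `k` avoids tied votes.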