Bi-perceptron for Chinese Web News Categorization

Master's thesis === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 104 === Mobile news, owing to its inherently high frequency, has become a popular area pursued by many commercial companies in China. News categorization is an important technology in automated news processing. Many supervised learning methods can be applied in t...


Bibliographic Details
Main Authors: Jian Pan, 潘健
Other Authors: Cheng-Yuan Liou
Format: Others
Language: en_US
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/27135496119770819830
id ndltd-TW-104NTU05392061
record_format oai_dc
spelling ndltd-TW-104NTU05392061 2017-06-03T04:41:59Z http://ndltd.ncl.edu.tw/handle/27135496119770819830 Bi-perceptron for Chinese Web News Categorization Bi-perceptron 分類中文網頁新聞 Jian Pan 潘健 Master's thesis, National Taiwan University, Graduate Institute of Computer Science and Information Engineering, 104 Cheng-Yuan Liou 劉長遠 2016 thesis 72 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's thesis === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 104 === Mobile news, owing to its inherently high frequency, has become a popular area pursued by many commercial companies in China. News categorization is an important technology in automated news processing. Many supervised learning methods can be applied in this area, among which the Support Vector Machine (SVM) achieves state-of-the-art performance with discrete features. This thesis proposes bi-perceptron learning for the binary classification problem, in the hope of achieving results comparable to or even better than those of SVM. Bi-perceptron learning is a divide-and-conquer idea; we propose it here and implement a basic version of it. We divide the classification problem into three steps (data partition, base classification, and aggregation) and compare different partition and aggregation methods. Moreover, we analyze the effect of the word segmentation method, the number of keywords, the regularization of the base classifiers, and the number of partitions on categorization performance. Finally, we find a bi-perceptron learning approach that is efficient in both time and memory consumption.
author2 Cheng-Yuan Liou
author_facet Cheng-Yuan Liou
Jian Pan
潘健
author Jian Pan
潘健
title Bi-perceptron for Chinese Web News Categorization
publishDate 2016
url http://ndltd.ncl.edu.tw/handle/27135496119770819830