Automatic Text Categorization on News

Master's === Soochow University === Department of Computer and Information Science === 90 === People today are eager for new information, yet they cannot easily or efficiently find what they want in such a large volume of data. Documents therefore need to be classified so that users can search efficiently within the category a document belongs to. Tradi...

Bibliographic Details
Main Authors:	Ya-Fen Hsu, 許雅芬
Other Authors:	Sue J. Ker
Format:	Others
Language:	zh-TW
Published:	2002
Online Access:	http://ndltd.ncl.edu.tw/handle/91208800987400778267

id	ndltd-TW-090SCU00394001
record_format	oai_dc
spelling	ndltd-TW-090SCU00394001 2015-10-13T14:41:25Z http://ndltd.ncl.edu.tw/handle/91208800987400778267 Automatic Text Categorization on News 新聞文件自動分類之研究 Ya-Fen Hsu 許雅芬 Master's, Soochow University, Department of Computer and Information Science, 90 People today are eager for new information, yet they cannot easily or efficiently find what they want in such a large volume of data. Documents therefore need to be classified so that users can search efficiently within the category a document belongs to. Traditionally, experts read each document and assign it to specific categories, but this consumes considerable resources and is not cost-effective. An automatic text classifier is therefore needed to help with the classification process. Automatic text categorization is the task of assigning predefined categories to free-text documents. Text classification always involves two important steps: feature selection, followed by selection of a relevance function. We propose two techniques to improve classification precision: using co-occurring terms as features, and taking into account the positions at which bigrams occur. For comparison, the experiments also include several other feature selection methods: single-term features, bigram features, segmentation features, and the positions at which segmented words occur. The experimental results show that using co-occurrences as features performs relatively well, improving performance by about 15% on average over pure bigrams. The experiments also confirm two observations about the texts: bigrams are more representative than single terms, and the position of a keyword is positively related to its importance. Sue J. Ker 柯淑津 2002 學位論文 ; thesis 56 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === Soochow University === Department of Computer and Information Science === 90 === People today are eager for new information, yet they cannot easily or efficiently find what they want in such a large volume of data. Documents therefore need to be classified so that users can search efficiently within the category a document belongs to. Traditionally, experts read each document and assign it to specific categories, but this consumes considerable resources and is not cost-effective. An automatic text classifier is therefore needed to help with the classification process. Automatic text categorization is the task of assigning predefined categories to free-text documents. Text classification always involves two important steps: feature selection, followed by selection of a relevance function. We propose two techniques to improve classification precision: using co-occurring terms as features, and taking into account the positions at which bigrams occur. For comparison, the experiments also include several other feature selection methods: single-term features, bigram features, segmentation features, and the positions at which segmented words occur. The experimental results show that using co-occurrences as features performs relatively well, improving performance by about 15% on average over pure bigrams. The experiments also confirm two observations about the texts: bigrams are more representative than single terms, and the position of a keyword is positively related to its importance.
author2	Sue J. Ker
author_facet	Sue J. Ker Ya-Fen Hsu 許雅芬
author	Ya-Fen Hsu 許雅芬
spellingShingle	Ya-Fen Hsu 許雅芬 Automatic Text Categorization on News
author_sort	Ya-Fen Hsu
title	Automatic Text Categorization on News
title_short	Automatic Text Categorization on News
title_full	Automatic Text Categorization on News
title_fullStr	Automatic Text Categorization on News
title_full_unstemmed	Automatic Text Categorization on News
title_sort	automatic text categorization on news
publishDate	2002
url	http://ndltd.ncl.edu.tw/handle/91208800987400778267
work_keys_str_mv	AT yafenhsu automatictextcategorizationonnews AT xǔyǎfēn automatictextcategorizationonnews AT yafenhsu xīnwénwénjiànzìdòngfēnlèizhīyánjiū AT xǔyǎfēn xīnwénwénjiànzìdòngfēnlèizhīyánjiū
_version_	1717756966931005440
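
The description above names three kinds of features: bigrams, the positions at which bigrams occur, and co-occurring terms. The record does not say how the thesis computes any of them, so the following Python sketch is only an illustration under stated assumptions: bigrams are taken to be overlapping character pairs, "position" is modeled as extra weight for bigrams that start in the leading part of the text, and "co-occurrence" is modeled as two distinct bigrams appearing in the same sentence. All function names and parameters (`lead_fraction`, `lead_weight`, `sentence_end`) are hypothetical and do not come from the thesis.

```python
from collections import Counter
from itertools import combinations


def bigrams(text):
    """Return overlapping character bigrams, skipping pairs that contain whitespace."""
    pairs = (text[i:i + 2] for i in range(len(text) - 1))
    return [p for p in pairs if not any(ch.isspace() for ch in p)]


def position_weighted_bigrams(text, lead_fraction=0.2, lead_weight=2.0):
    """Count bigrams, weighting those that start in the leading part of the text.

    Assumption for illustration: a bigram starting within the first
    `lead_fraction` of the characters counts `lead_weight` times; any
    other bigram counts once.
    """
    cutoff = len(text) * lead_fraction
    weights = Counter()
    for i in range(len(text) - 1):
        pair = text[i:i + 2]
        if any(ch.isspace() for ch in pair):
            continue
        weights[pair] += lead_weight if i < cutoff else 1.0
    return weights


def cooccurrence_features(text, sentence_end="。"):
    """Count unordered pairs of distinct bigrams that share a sentence.

    Assumption for illustration: the co-occurrence window is one sentence,
    and sentences are split on a single delimiter character.
    """
    pairs = Counter()
    for sentence in text.split(sentence_end):
        unique = sorted(set(bigrams(sentence)))
        pairs.update(combinations(unique, 2))
    return pairs
```

Each function returns a plain list or `Counter`, so the outputs could be passed to whatever relevance function a classifier uses. The sentence-sized window and the two-level position weight are the simplest choices that make the idea concrete and are not a reconstruction of the thesis's method.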

Similar Items