Automatic Text Categorization on News

Master's === Soochow University (東吳大學) === Department of Information Science (資訊科學系) === 90 === People today are eager for new information, but they cannot easily or efficiently find what they want in such a huge volume of data. Documents therefore have to be classified so that users can search efficiently within the category a document belongs to. Tradi...


Bibliographic Details
Main Authors: Ya-Fen Hsu, 許雅芬
Other Authors: Sue J. Ker
Format: Others
Language: zh-TW
Published: 2002
Online Access: http://ndltd.ncl.edu.tw/handle/91208800987400778267
id ndltd-TW-090SCU00394001
record_format oai_dc
spelling ndltd-TW-090SCU00394001 2015-10-13T14:41:25Z http://ndltd.ncl.edu.tw/handle/91208800987400778267 Automatic Text Categorization on News 新聞文件自動分類之研究 Ya-Fen Hsu 許雅芬 Master's Soochow University (東吳大學) Department of Information Science (資訊科學系) 90 Sue J. Ker 柯淑津 2002 degree thesis ; thesis 56 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === Soochow University (東吳大學) === Department of Information Science (資訊科學系) === 90 === People today are eager for new information, but they cannot easily or efficiently find what they want in such a huge volume of data. Documents therefore have to be classified so that users can search efficiently within the category a document belongs to. Traditionally, experts read each document and assign it to specific categories. This consumes a lot of resources and is not economical, so an automatic text classifier is needed to help with the classification process. Automatic text categorization is the task of assigning predefined categories to free-text documents. Text classification always involves two important steps: feature selection, followed by selection of a relevance function. We propose two techniques to improve classification precision: using co-occurring terms as features, and taking into account the positions at which bigrams occur. For comparison, the experiments also include several other feature selection methods: single-term features, bigram features, segmentation features, and the positions at which segmented terms occur. The experimental results show that using co-occurrences as features performs relatively well: compared with pure bigrams, performance improves by about 15% on average. The experiments also confirm two observations about the texts. First, bigrams are more representative than single terms. Second, the position of a keyword is positively related to its importance.
author2 Sue J. Ker
author_facet Sue J. Ker
Ya-Fen Hsu
許雅芬
author Ya-Fen Hsu
許雅芬
spellingShingle Ya-Fen Hsu
許雅芬
Automatic Text Categorization on News
author_sort Ya-Fen Hsu
title Automatic Text Categorization on News
title_sort automatic text categorization on news
publishDate 2002
url http://ndltd.ncl.edu.tw/handle/91208800987400778267
work_keys_str_mv AT yafenhsu automatictextcategorizationonnews
AT xǔyǎfēn automatictextcategorizationonnews
AT yafenhsu xīnwénwénjiànzìdòngfēnlèizhīyánjiū
AT xǔyǎfēn xīnwénwénjiànzìdòngfēnlèizhīyánjiū
_version_ 1717756966931005440
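
The feature types named in the abstract (bigrams, bigram position, and co-occurrence) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. It assumes character-level bigrams, treats "position" as a simple title-versus-body distinction, and counts co-occurrence as two distinct bigrams appearing in the same sentence. The function names and the `title_weight` value are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def char_bigrams(text):
    """Return overlapping character bigrams, skipping any that contain whitespace."""
    pairs = (text[i:i + 2] for i in range(len(text) - 1))
    return [p for p in pairs if not any(c.isspace() for c in p)]


def weighted_bigram_features(title, body, title_weight=3.0):
    """Count bigrams, giving title occurrences a larger (assumed) weight than body ones."""
    features = Counter()
    for bigram in char_bigrams(title):
        features[bigram] += title_weight
    for bigram in char_bigrams(body):
        features[bigram] += 1.0
    return features


def cooccurrence_features(sentences):
    """Count unordered pairs of distinct bigrams that appear in the same sentence."""
    features = Counter()
    for sentence in sentences:
        unique = sorted(set(char_bigrams(sentence)))
        for pair in combinations(unique, 2):
            features[pair] += 1
    return features
```

Feature dictionaries of this kind would then be passed to whatever relevance function the classifier uses. The abstract does not specify that function, so it is left out here.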