Automatic Text Categorization on News
Master's (碩士) === Soochow University (東吳大學) === Department of Information Science (資訊科學系) === 90 === People today are eager for new information, but they cannot easily or efficiently find what they want in such a large volume of data. Documents therefore have to be classified so that users can search efficiently within the category a document belongs to. Tradi...
Main Authors: | Ya-Fen Hsu, 許雅芬 |
---|---|
Other Authors: | Sue J. Ker |
Format: | Others |
Language: | zh-TW |
Published: | 2002 |
Online Access: | http://ndltd.ncl.edu.tw/handle/91208800987400778267 |
id |
ndltd-TW-090SCU00394001 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-090SCU00394001 2015-10-13T14:41:25Z http://ndltd.ncl.edu.tw/handle/91208800987400778267 Automatic Text Categorization on News 新聞文件自動分類之研究 Ya-Fen Hsu 許雅芬 Master's (碩士) Soochow University (東吳大學) Department of Information Science (資訊科學系) 90 People today are eager for new information, but they cannot easily or efficiently find what they want in such a large volume of data. Documents therefore have to be classified so that users can search efficiently within the category a document belongs to. Traditionally, experts read each document and assign it to specific categories. This is resource-intensive and not cost-effective, so an automatic text classifier is needed to help with the classification process. Automatic text categorization is the task of assigning predefined categories to free-text documents. Text classification involves two important steps: feature selection, followed by selection of a relevance function. Here we propose two techniques to improve classification precision: using co-occurrence terms as features, and taking into account the positions at which bigrams occur. For comparison, the experiments also include several other feature selection methods: single-term features, bigram features, segmentation features, and the positions at which segmented words occur. The experimental results show that using co-occurrences as features performs relatively well, improving performance by about 15% on average over pure bigrams. The experiments also support our observation that bigrams are more representative of the texts than single terms. In addition, the position of a keyword is positively related to its importance. Sue J. Ker 柯淑津 2002 學位論文 ; thesis 56 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's (碩士) === Soochow University (東吳大學) === Department of Information Science (資訊科學系) === 90 === People today are eager for new information, but they cannot easily or efficiently find what they want in such a large volume of data. Documents therefore have to be classified so that users can search efficiently within the category a document belongs to. Traditionally, experts read each document and assign it to specific categories. This is resource-intensive and not cost-effective, so an automatic text classifier is needed to help with the classification process. Automatic text categorization is the task of assigning predefined categories to free-text documents.
Text classification involves two important steps: feature selection, followed by selection of a relevance function. Here we propose two techniques to improve classification precision: using co-occurrence terms as features, and taking into account the positions at which bigrams occur. For comparison, the experiments also include several other feature selection methods: single-term features, bigram features, segmentation features, and the positions at which segmented words occur.
The experimental results show that using co-occurrences as features performs relatively well, improving performance by about 15% on average over pure bigrams. The experiments also support our observation that bigrams are more representative of the texts than single terms. In addition, the position of a keyword is positively related to its importance.
|
author2 |
Sue J. Ker |
author_facet |
Sue J. Ker Ya-Fen Hsu 許雅芬 |
author |
Ya-Fen Hsu 許雅芬 |
spellingShingle |
Ya-Fen Hsu 許雅芬 Automatic Text Categorization on News |
author_sort |
Ya-Fen Hsu |
title |
Automatic Text Categorization on News |
title_short |
Automatic Text Categorization on News |
title_full |
Automatic Text Categorization on News |
title_fullStr |
Automatic Text Categorization on News |
title_full_unstemmed |
Automatic Text Categorization on News |
title_sort |
automatic text categorization on news |
publishDate |
2002 |
url |
http://ndltd.ncl.edu.tw/handle/91208800987400778267 |
work_keys_str_mv |
AT yafenhsu automatictextcategorizationonnews AT xǔyǎfēn automatictextcategorizationonnews AT yafenhsu xīnwénwénjiànzìdòngfēnlèizhīyánjiū AT xǔyǎfēn xīnwénwénjiànzìdòngfēnlèizhīyánjiū |
_version_ |
1717756966931005440 |
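
The feature types named in the abstract (bigrams, co-occurrence terms, and position information) can be illustrated with a minimal Python sketch. This is not the thesis's implementation, which the record does not describe. It assumes that bigrams are overlapping character pairs, that "co-occurrence" means two distinct bigrams appearing in the same sentence, and that position is modeled by giving title occurrences a larger weight. The function names and the `title_weight` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations


def char_bigrams(text):
    """Return overlapping character bigrams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def cooccurrence_features(sentences):
    """Count unordered pairs of distinct bigrams that share a sentence.

    Assumption: 'co-occurrence' is defined at sentence level; the
    record does not state the window the thesis actually used.
    """
    counts = Counter()
    for sentence in sentences:
        unique = sorted(set(char_bigrams(sentence)))
        counts.update(combinations(unique, 2))
    return counts


def position_weighted_bigrams(title, body, title_weight=2.0):
    """Count bigrams, weighting title occurrences more heavily.

    Assumption: title_weight is a hypothetical value chosen only
    for illustration.
    """
    counts = Counter()
    for bigram in char_bigrams(title):
        counts[bigram] += title_weight
    for bigram in char_bigrams(body):
        counts[bigram] += 1.0
    return counts
```

Feature counts like these would then be passed to whatever relevance function the classifier uses; the choice of that function is outside the scope of this sketch.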