A Study of Using Data Mining Techniques for Text Classification
Master's === Southern Taiwan University of Technology === Department of Information Management === 93 === As information in today's society grows and changes rapidly, staying competitive requires presenting, applying, and receiving data in an effective, organized, and accurate way. This study aims to improve the quality of text classification via inform...
Main Authors: | Jen wen yan, 晏文珍 |
---|---|
Other Authors: | Chui Cheng Chen |
Format: | Others |
Language: | zh-TW |
Published: | 2005 |
Online Access: | http://ndltd.ncl.edu.tw/handle/48144129555740914501 |
id |
ndltd-TW-093STUT0396012 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-093STUT03960122016-11-22T04:12:22Z http://ndltd.ncl.edu.tw/handle/48144129555740914501 A Study of Using Data Mining Techniques for Text Classification 利用資料探勘技術於文件分類之研究 Jen wen yan 晏文珍 碩士 南台科技大學 資訊管理系 93 In the wake of rapid information upgrade in the current society, to boost competitive edge, we must be able to present, apply, and receive data in an effective, organized, and accurate approach. This study aims at improving quality on text classification via informational and technical support. Data mining technology, currently the most popular application approach, retrieves useful knowledge and information from a vast amount of data. The technology has now been widely applied in journalism and search engines. This paper is going to seek the most proper classification principle out of news data by classification technology, one of the data mining technologies, and evaluate Reuters-21578 portfolio, work on the drawbacks of traditional classification approaches, and enhance text classification quality. The content of Reuters-21578 is the source for analysis. We classify news tile data based on the primary classification items. During the analysis process, we collect a huge amount of key words so as to find out the number of kinds words that articles are composed. Next, a ID3-tree of policies is made with classification analysis technology to realize the proper categories that key word items characteristics belong to, the research analyzes the aspects of frequency and weighted location for an enhanced accuracy of classification results, In this thesis, the first evaluating criterion is using the values of precision, recall to verify the effectiveness, it may be offer reference contributing to the automation of text classification. Chui Cheng Chen 陳垂呈 2005 學位論文 ; thesis 70 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === Southern Taiwan University of Technology === Department of Information Management === 93 === As information in today's society grows and changes rapidly, staying competitive requires presenting, applying, and receiving data in an effective, organized, and accurate way. This study aims to improve the quality of text classification through information technology. Data mining, currently one of the most popular application approaches, extracts useful knowledge and information from large amounts of data, and it is now widely applied in journalism and search engines. Using classification, one of the data mining techniques, this study seeks the most suitable classification rules for news data, evaluates them on the Reuters-21578 collection, addresses the drawbacks of traditional classification approaches, and aims to improve text classification quality.
The content of Reuters-21578 is the source for analysis. We classify the news title data according to the primary classification items. During the analysis, we collect a large number of keywords to determine the kinds of words that make up the articles. Next, an ID3 decision tree is built with classification analysis to identify the categories to which keyword features belong. The research also analyzes keyword frequency and weighted location to improve the accuracy of the classification results. In this thesis, the primary evaluation criteria are precision and recall, which are used to verify effectiveness. The results may serve as a reference for the automation of text classification.
|
author2 |
Chui Cheng Chen |
author_facet |
Chui Cheng Chen Jen wen yan 晏文珍 |
author |
Jen wen yan 晏文珍 |
spellingShingle |
Jen wen yan 晏文珍 A Study of Using Data Mining Techniques for Text Classification |
author_sort |
Jen wen yan |
title |
A Study of Using Data Mining Techniques for Text Classification |
title_short |
A Study of Using Data Mining Techniques for Text Classification |
title_full |
A Study of Using Data Mining Techniques for Text Classification |
title_fullStr |
A Study of Using Data Mining Techniques for Text Classification |
title_full_unstemmed |
A Study of Using Data Mining Techniques for Text Classification |
title_sort |
study of using data mining techniques for text classification |
publishDate |
2005 |
url |
http://ndltd.ncl.edu.tw/handle/48144129555740914501 |
work_keys_str_mv |
AT jenwenyan astudyofusingdataminingtechniquesfortextclassification AT yànwénzhēn astudyofusingdataminingtechniquesfortextclassification AT jenwenyan lìyòngzīliàotànkānjìshùyúwénjiànfēnlèizhīyánjiū AT yànwénzhēn lìyòngzīliàotànkānjìshùyúwénjiànfēnlèizhīyánjiū AT jenwenyan studyofusingdataminingtechniquesfortextclassification AT yànwénzhēn studyofusingdataminingtechniquesfortextclassification |
_version_ |
1718396244586397696 |
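The abstract mentions building an ID3 tree over keywords and evaluating with precision and recall. The sketch below is only an illustration of those two ideas, not the thesis's implementation: it computes the information gain that ID3 uses to pick a splitting keyword, and the precision and recall for one category. The function names, the toy documents, and the category labels are all invented for this example, and the frequency and location weighting described in the abstract are not modeled.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of category labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, keyword):
    """Entropy reduction from splitting documents on presence of `keyword`.

    ID3 chooses, at each node, the attribute with the highest gain.
    Here each attribute is a hypothetical binary "keyword present" feature.
    """
    with_kw = [lab for doc, lab in zip(docs, labels) if keyword in doc]
    without_kw = [lab for doc, lab in zip(docs, labels) if keyword not in doc]
    total = len(labels)
    remainder = (
        (len(with_kw) / total) * entropy(with_kw)
        + (len(without_kw) / total) * entropy(without_kw)
    )
    return entropy(labels) - remainder


def precision_recall(predicted, actual, category):
    """Precision and recall of the predictions for a single category."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == category and a == category)
    fp = sum(1 for p, a in pairs if p == category and a != category)
    fn = sum(1 for p, a in pairs if p != category and a == category)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (invented): each document is a set of keywords.
docs = [{"oil", "price"}, {"oil", "crude"}, {"wheat", "crop"}, {"wheat", "price"}]
labels = ["crude", "crude", "grain", "grain"]

# Pick the keyword ID3 would split on first.
vocabulary = sorted(set().union(*docs))
best_keyword = max(vocabulary, key=lambda kw: information_gain(docs, labels, kw))

# Evaluate some hypothetical predictions for the "crude" category.
predicted = ["crude", "crude", "crude", "grain"]
precision, recall = precision_recall(predicted, labels, "crude")
```

In this toy set, "oil" separates the two categories perfectly, so its gain equals the full label entropy, while "price" appears once in each category and yields no gain. A full ID3 tree would apply the same selection recursively to each resulting subset.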