A Study of Using Data Mining Techniques for Text Classification

Bibliographic Details
Main Authors: Jen wen yan, 晏文珍
Other Authors: Chui Cheng Chen
Format: Others
Language: zh-TW
Published: 2005
Online Access: http://ndltd.ncl.edu.tw/handle/48144129555740914501
Description
Summary: Master's === 南台科技大學 (Southern Taiwan University of Technology) === Department of Information Management === 93 === As information changes ever faster, staying competitive requires presenting, applying, and receiving data in an effective, organized, and accurate way. This study aims to improve the quality of text classification through information technology. Data mining, currently among the most popular approaches, extracts useful knowledge and information from large amounts of data, and it has been widely applied in journalism and search engines. This thesis uses classification, one of the data mining techniques, to find the most suitable classification rules for news data. It evaluates them on the Reuters-21578 collection, addresses the drawbacks of traditional classification approaches, and seeks to enhance text classification quality. The content of Reuters-21578 is the source for analysis. News title data are classified according to the primary category items. During the analysis, a large number of keywords are collected to determine the kinds of words of which the articles are composed. Next, an ID3 decision tree is built with classification analysis to determine the categories to which keyword features belong. The research analyzes keyword frequency and weighted location to improve the accuracy of the classification results. The primary evaluation criteria in this thesis are precision and recall, which are used to verify effectiveness. The results may serve as a reference for automating text classification.
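The summary names two standard building blocks: the ID3 splitting criterion (information gain) and the precision and recall measures. The following is a minimal Python sketch of both, not the thesis's implementation. The toy documents, the category labels, and the function names `entropy`, `information_gain`, and `precision_recall` are assumptions made for illustration, and the frequency and weighted-location analysis mentioned in the summary is not modeled here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, keyword):
    """Entropy reduction from splitting documents on keyword presence.

    ID3 picks, at each tree node, the attribute with the highest gain.
    """
    with_kw = [y for d, y in zip(docs, labels) if keyword in d]
    without_kw = [y for d, y in zip(docs, labels) if keyword not in d]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (with_kw, without_kw) if part)
    return entropy(labels) - remainder


def precision_recall(predicted, actual, category):
    """Precision and recall for one category."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == category and a == category for p, a in pairs)
    fp = sum(p == category and a != category for p, a in pairs)
    fn = sum(p != category and a == category for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: each document is a set of keywords from a title.
docs = [{"wheat", "export"}, {"wheat", "harvest"},
        {"oil", "price"}, {"oil", "export"}]
labels = ["grain", "grain", "crude", "crude"]

# The keyword ID3 would choose for the root split among two candidates.
best = max(["wheat", "export"],
           key=lambda k: information_gain(docs, labels, k))

# Hypothetical classifier output, scored against the true labels.
predicted = ["grain", "grain", "grain", "crude"]
precision, recall = precision_recall(predicted, labels, "grain")
```

In this toy setup, a keyword that separates the categories cleanly receives a higher gain than one that appears equally in both, which is why a tree built this way places the most discriminating keywords near the root.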