Improving the performance of Naive Bayes Classifier by using Selective Naive Bayesian Algorithm and Prior Distributions

碩士 === 國立成功大學 === 工業與資訊管理學系碩博士班 === 97 === Naive Bayes classifiers have been widely used for data classification because of its computational efficiency and competitive accuracy. When all attributes are employed for classification, the accuracy of the naive Bayes classifier is generally affected by...

Full description

Bibliographic Details
Main Authors: Liang-Hao Chang, 張良豪
Other Authors: Tzu-Tsung Wong
Format: Others
Language:zh-TW
Published: 2009
Online Access:http://ndltd.ncl.edu.tw/handle/92613736217287175606
Description
Summary:碩士 === 國立成功大學 === 工業與資訊管理學系碩博士班 === 97 === Naive Bayes classifiers have been widely used for data classification because of its computational efficiency and competitive accuracy. When all attributes are employed for classification, the accuracy of the naive Bayes classifier is generally affected by noisy attributes. A mechanism for attribute selection should be considered for improving its prediction accuracy. Selective naive Bayesian method is a very successful approach for removing noisy and/or redundant attributes. In addition, attributes are generally assumed to have prior distributions, such as Dirichlet or generalized Dirichlet distributions, for achieving a higher prediction accuracy. Many studies have proposed the methods for finding the best priors for attributes, but none of them takes attribute selection into account. Thus, this thesis proposes two models for combining prior distribution and feature selection together for increasing the accuracy of the naive Bayes classifier. Model I finds out the best prior for each attribute after all attributes have been determined by the selective naive Bayesian algorithm. Model II finds the best prior of the newest attribute determined by the selective naive Bayesian algorithm when all predecessors of the newest attribute have their best priors. The experimental result on 17 data sets form UCI data repository shows that Model I with the general Dirichlet prior generally and consistently achieves a higher classification accuracy.