An empirical study of classification on multinomial data

碩士 === 國立政治大學 === 統計研究所 === 97 === Since computer science had changed rapidly, the worldwide web made it much easier to share and receive the information. Search engines would be the ones to help us find the target information conveniently. The famous Google was also founded by the search engine. Th...

Full description

Bibliographic Details
Main Authors: Kao, Ching Hsiang, 高靖翔
Other Authors: Yue, Jack C.
Format: Others
Language:zh-TW
Published: 2009
Online Access:http://ndltd.ncl.edu.tw/handle/50454640296051488923
Description
Summary:碩士 === 國立政治大學 === 統計研究所 === 97 === Since computer science had changed rapidly, the worldwide web made it much easier to share and receive the information. Search engines would be the ones to help us find the target information conveniently. The famous Google was also founded by the search engine. The searching process is always depends on the characteristics of the web pages, for example, entropy is one of the characteristics index. The target web pages could be found by combining the index with the keywords information given by user. Or in other words, it is to find out the web pages which are the most similar to the user’s demands. In biology and ecology, similarity index function is commonly used for classification problems. But in practice, the pairwise instead of single similarity would be obtained to check if two communities are similar or not. It is dislike the thinking of search engines. This research is to find out which has better classification result between single index and pairwise index for the data which is multinomial distributed, especially distributed like a geometry distribution. This data assumption is often satisfied in ecology area. The following classification methods would be considered into this research: single index including entropy and Simpson index, pairwise index including pairwise entropy and similarity index (Yue and Clayton, 2005), and also support vector machine and logistic regression. Computer simulations and cross validations would also be considered here. In this research, it is found that the single index, entropy, has good classification result than imagine. Sometime using entropy to classify would even better than using support vector machine with raw data. But using entropy to classify is not very robust, it is the one needed to be improved in future.