Efficient Active Learning Based on Localized Uncertainty Clusters

Bibliographic Details
Main Authors: Wu,WangPing, 吳婉萍
Other Authors: Lee,SingLing
Format: Others
Language: en_US
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/13780016063951400948
Description
Summary: Master's thesis === National Chung Cheng University === Graduate Institute of Computer Science and Information Engineering === 100 === Many approaches incorporate clustering into active learning to avoid selecting similar data points. However, traditional clustering methods are not designed to improve the accuracy of active learning. This thesis has two main goals: first, to generate more uncertain cluster representatives during clustering so that classification accuracy increases; second, to cluster without knowing the number of clusters in advance. In our method, data points that have similar local uncertainty, a small difference in coordinates, and a large overlapping neighborhood are collected into the same cluster. The idea behind certainty-based active learning (CBAL), in which a local classifier is built from a point's neighbors, is used to find an appropriate neighborhood size. In our approach, each generated representative reflects the effect that all data points in its cluster have on the classifier. Moreover, because more uncertain representatives are generated, the accuracy of active learning increases. In addition, we propose a new clustering method that uses the local distance-based outlier factor (LDOF) to expand clusters and a distance metric based on local uncertainty and overlapping neighborhoods (the LNC formula) to measure the similarity between data points. Finally, experimental results show that the proposed method selects more uncertain training data on the synthetic dataset, and that on the UCI datasets both accuracy and running time are also better.
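
The summary names the local distance-based outlier factor (LDOF) but does not define it. As a rough illustration only, the sketch below computes LDOF in its commonly published form: the mean distance from a point to its k nearest neighbors, divided by the mean pairwise distance among those neighbors. The function name `ldof`, the parameter `k`, the use of Euclidean distance, and the toy data are all assumptions made for this example; the record does not say how the thesis applies LDOF when expanding clusters, and the LNC formula is not reproduced here because it is not given.

```python
import numpy as np

def ldof(points, k):
    """Return an LDOF score per row of `points` (hypothetical helper).

    Assumes k >= 2 and that each point's k nearest neighbors are not all identical.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Full pairwise Euclidean distance matrix (fine for small illustrative data).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    scores = np.empty(n)
    for i in range(n):
        order = np.argsort(dist[i])
        neighbors = order[order != i][:k]      # k nearest neighbors, excluding the point itself
        knn_mean = dist[i, neighbors].mean()   # mean distance from the point to its neighbors
        inner = dist[np.ix_(neighbors, neighbors)]
        inner_mean = inner.sum() / (k * (k - 1))  # mean distance among the neighbors (diagonal is zero)
        scores[i] = knn_mean / inner_mean
    return scores

# Toy data (an assumption): four points on a unit square plus one distant point.
scores = ldof([[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]], k=3)
```

In this toy case the distant point receives a score well above the others, which is the property that makes LDOF usable as a signal for whether a point fits its local neighborhood.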