Chinese text classification based on improved <i>K</i> Nearest Neighbor algorithm
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | Academic Journals Center of Shanghai Normal University, 2019-02-01 |
Series: | Journal of Shanghai Normal University (Natural Sciences) |
Subjects: | |
Online Access: | http://qktg.shnu.edu.cn/zrb/shsfqkszrb/ch/reader/view_abstract.aspx?file_no=20190117 |
Summary: | This paper addresses the high dimensionality of text data encountered in text classification. A feature extraction method combining document frequency (DF) with the chi-square statistic is proposed to reduce the number of feature items and thus the dimensionality of the text representation. Because the standard <i>K</i> Nearest Neighbor (KNN) algorithm must compute the similarity between each text to be classified and a large number of training samples, a KNN algorithm based on group center vectors is proposed. The samples within each category are divided into groups and a center vector is computed for each group, which improves the classification performance of the algorithm. Experiments show that the improved algorithm achieves higher precision, recall and <i>F</i>-measure than the traditional KNN algorithm, and that it has advantages over other classification algorithms. |
ISSN: | 1000-5137 |
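
The two ideas named in the summary (DF plus chi-square feature selection, and KNN over group center vectors) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not say how DF and chi-square are combined, how samples are grouped, or which similarity measure is used. The sketch assumes a DF threshold followed by chi-square ranking, consecutive fixed-size groups within each category, cosine similarity, and similarity-weighted voting. All function names, parameters (`min_df`, `top_k`, `group_size`, `k`) and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

def chi_square(n11, n10, n01, n00):
    """Chi-square statistic of a term/category 2x2 contingency table.
    n11: docs in category with term, n10: docs outside category with term,
    n01: docs in category without term, n00: docs outside category without term."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0

def select_features(docs, labels, min_df=2, top_k=100):
    """Assumed pipeline: drop terms with DF < min_df, then keep the top_k
    terms ranked by their maximum chi-square score over all categories."""
    doc_sets = [set(d) for d in docs]
    df = Counter(t for s in doc_sets for t in s)
    cat_sizes = Counter(labels)
    cat_df = defaultdict(Counter)  # per-category document frequency
    for s, y in zip(doc_sets, labels):
        cat_df[y].update(s)
    scores = {}
    for term, f in df.items():
        if f < min_df:
            continue
        best = 0.0
        for c, size in cat_sizes.items():
            n11 = cat_df[c][term]
            n10 = f - n11
            n01 = size - n11
            n00 = len(docs) - n11 - n10 - n01
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[term] = best
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]

def vectorize(doc, vocab):
    """Raw term-frequency vector over the selected vocabulary."""
    counts = Counter(doc)
    return [float(counts[t]) for t in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def group_centers(vectors, labels, group_size=2):
    """Assumed grouping: consecutive chunks of group_size samples per
    category, each replaced by its mean (center) vector."""
    by_cat = defaultdict(list)
    for v, y in zip(vectors, labels):
        by_cat[y].append(v)
    centers = []
    for y, vs in by_cat.items():
        for i in range(0, len(vs), group_size):
            chunk = vs[i:i + group_size]
            centers.append(([sum(col) / len(chunk) for col in zip(*chunk)], y))
    return centers

def knn_predict(vec, centers, k=3):
    """KNN over group centers instead of raw samples, with
    similarity-weighted voting among the k most similar centers."""
    nearest = sorted(centers, key=lambda c: cosine(vec, c[0]), reverse=True)[:k]
    votes = defaultdict(float)
    for center, y in nearest:
        votes[y] += cosine(vec, center)
    return max(votes, key=votes.get)

# Toy, already-tokenized documents (hypothetical data).
train_docs = [
    ["ball", "team", "score"], ["team", "match", "score"], ["ball", "match", "goal"],
    ["stock", "market", "price"], ["market", "bank", "price"], ["stock", "bank", "trade"],
]
train_labels = ["sports"] * 3 + ["finance"] * 3
vocab = select_features(train_docs, train_labels, min_df=2, top_k=8)
centers = group_centers([vectorize(d, vocab) for d in train_docs], train_labels)
prediction = knn_predict(vectorize(["team", "score", "ball"], vocab), centers, k=3)
print(prediction)
```

With these assumptions the classifier compares a new text against four group centers rather than six training samples; on a realistic corpus the number of similarity computations would drop roughly by the factor `group_size`, which is the cost saving the summary alludes to.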