Chinese text classification based on improved K Nearest Neighbor algorithm

Bibliographic Details
Main Authors: HUANG Chao, CHEN Junhua
Format: Article
Language: English
Published: Academic Journals Center of Shanghai Normal University 2019-02-01
Series: Journal of Shanghai Normal University (Natural Sciences)
Subjects:
Online Access: http://qktg.shnu.edu.cn/zrb/shsfqkszrb/ch/reader/view_abstract.aspx?file_no=20190117
Description
Summary: This paper addresses the high-dimensionality problem encountered in text classification. A feature extraction method combining document frequency (DF) and the chi-square statistic is proposed to reduce the number of feature items and thus the dimensionality of the text representation. Because the K Nearest Neighbor (KNN) algorithm must compute the similarity between each text to be classified and a large number of training samples, a KNN algorithm based on grouping center vectors is proposed. The samples within each category are divided into groups and a center vector is obtained for each group, which improves the classification performance of the algorithm. Experiments show that the improved algorithm achieves higher precision, recall, and F-measure than the traditional KNN algorithm, and that it has advantages over other classification algorithms.
ISSN: 1000-5137
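
The grouping-center idea described in the summary can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not say how samples are grouped within a category, so contiguous chunking via `np.array_split` is used here purely as a stand-in, and the function names (`build_group_centers`, `classify`), the similarity-weighted vote, the parameter values, and the toy term-count vectors are all hypothetical. The point it shows is that a query is compared against a few group centers rather than against every training sample.

```python
import numpy as np
from collections import defaultdict

def build_group_centers(X, y, groups_per_class=2):
    """Split each class's samples into groups; return one mean vector per group."""
    y = np.array(y)
    centers, labels = [], []
    for label in sorted(set(y.tolist())):
        rows = X[y == label]
        # Assumption: contiguous chunks stand in for the paper's grouping rule.
        n_groups = min(groups_per_class, len(rows))
        for chunk in np.array_split(rows, n_groups):
            centers.append(chunk.mean(axis=0))
            labels.append(label)
    return np.vstack(centers), labels

def cosine_similarities(centers, query):
    """Cosine similarity between the query and every center vector."""
    norms = np.linalg.norm(centers, axis=1) * np.linalg.norm(query)
    return (centers @ query) / np.where(norms == 0, 1.0, norms)

def classify(query, centers, labels, k=3):
    """Vote among the k most similar centers, weighting each vote by similarity."""
    sims = cosine_similarities(centers, query)
    top = np.argsort(sims)[::-1][:k]
    votes = defaultdict(float)
    for i in top:
        votes[labels[i]] += sims[i]
    return max(votes, key=votes.get)

# Toy term-count vectors over three features (hypothetical data).
X = np.array([[3, 0, 1], [2, 0, 0], [4, 1, 0], [3, 1, 1],
              [0, 3, 1], [0, 2, 0], [1, 4, 0], [0, 3, 2]], dtype=float)
y = ["sports"] * 4 + ["finance"] * 4

centers, labels = build_group_centers(X, y, groups_per_class=2)
prediction_a = classify(np.array([2.0, 0.0, 0.0]), centers, labels, k=3)
prediction_b = classify(np.array([0.0, 3.0, 1.0]), centers, labels, k=3)
```

With eight training samples and two groups per category, each query is scored against four center vectors instead of eight samples; the saving grows with the size of the training set. A feature-selection step such as the DF and chi-square filtering mentioned in the summary would be applied before building `X`.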