Chinese text classification based on improved <i>K</i> Nearest Neighbor algorithm

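This paper addresses the high dimensionality of text data in text classification. It proposes a document frequency (DF)-chi-square statistic feature extraction method that reduces the number of feature items and therefore the dimensionality of the text. The paper builds on the <i>K</i> Nearest Neighbor (KNN) algorithm, in which each text to be classified must have its similarity computed against a large number of training samples. To address this, it proposes a KNN algorithm based on group center vectors. The samples in each category are divided into groups, and a center vector is computed for each group, so as to improve the classification performance of the algorithm. Experiments show that, compared with the traditional KNN algorithm, the improved algorithm achieves higher precision, recall, and <i>F</i>-measure, and that it has advantages over other classification algorithms.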
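The record describes the method only in outline. The Python sketch below shows one way the pieces could fit together. It applies a document frequency (DF) filter, then ranks the surviving terms by chi-square. A KNN classifier then compares a new text with per-group center vectors instead of with every training sample.

Everything specific here is an assumption for illustration, not the authors' method:
- the hypothetical function names `select_features`, `group_centers`, `classify` and `prf`;
- the parameters `min_df`, `top_k`, `n_groups` and `k`;
- round-robin grouping within each category;
- L2-normalized term-frequency vectors compared by cosine similarity;
- scoring each term by its maximum chi-square over classes;
- similarity-weighted voting.

Chinese word segmentation is assumed to have been done beforehand.

```python
import math
from collections import Counter, defaultdict


def select_features(docs, labels, min_df=2, top_k=1000):
    """DF filtering followed by chi-square ranking (illustrative sketch).

    docs are token lists; Chinese word segmentation is assumed done already.
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    df = Counter(t for s in doc_sets for t in s)
    vocab = [t for t, f in df.items() if f >= min_df]  # step 1: DF threshold
    class_sizes = Counter(labels)
    in_class = defaultdict(Counter)  # in_class[t][c] = docs of class c containing t
    for s, c in zip(doc_sets, labels):
        for t in s:
            in_class[t][c] += 1
    score = {}
    for t in vocab:
        best = 0.0
        for c, n_c in class_sizes.items():
            a = in_class[t][c]   # class c, contains t
            b = df[t] - a        # other classes, contains t
            m = n_c - a          # class c, lacks t
            d = n - n_c - b      # other classes, lacks t
            denom = (a + m) * (b + d) * (a + b) * (m + d)
            if denom:
                best = max(best, n * (a * d - b * m) ** 2 / denom)
        score[t] = best  # step 2: keep the maximum chi-square over classes (assumption)
    ranked = sorted(vocab, key=lambda t: (-score[t], t))
    return sorted(ranked[:top_k])  # fixed, sorted feature order


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def vectorize(tokens, features):
    """L2-normalized term-frequency vector over the selected features."""
    tf = Counter(tokens)
    return normalize([float(tf[t]) for t in features])


def group_centers(vectors, labels, n_groups=2):
    """Split each class into up to n_groups groups (round-robin: an assumption)
    and return one (normalized mean vector, label) pair per group."""
    by_class = defaultdict(list)
    for v, c in zip(vectors, labels):
        by_class[c].append(v)
    centers = []
    for c, vs in by_class.items():
        for g in range(min(n_groups, len(vs))):
            members = vs[g::n_groups]
            mean = [sum(col) / len(members) for col in zip(*members)]
            centers.append((normalize(mean), c))
    return centers


def classify(vec, centers, k=3):
    """Similarity-weighted vote among the k group centers closest by cosine."""
    sims = sorted(((sum(x * y for x, y in zip(vec, ctr)), c) for ctr, c in centers),
                  reverse=True)
    votes = Counter()
    for s, c in sims[:k]:
        votes[c] += s
    return votes.most_common(1)[0][0]


def prf(y_true, y_pred, label):
    """Precision, recall and F-measure (F1) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec, (2 * prec * rec / (prec + rec) if prec + rec else 0.0)


# Toy, already-segmented corpus (made up for illustration only).
train_docs = [["比赛", "球队", "进球"], ["球队", "比赛", "冠军"], ["比赛", "进球", "冠军"],
              ["市场", "股票", "经济"], ["经济", "市场", "增长"], ["股票", "市场", "投资"]]
train_labels = ["sports"] * 3 + ["finance"] * 3
features = select_features(train_docs, train_labels, min_df=2, top_k=6)
centers = group_centers([vectorize(d, features) for d in train_docs], train_labels)
test_docs = [["球队", "比赛", "胜利"], ["股票", "经济", "上涨"]]
test_labels = ["sports", "finance"]
preds = [classify(vectorize(d, features), centers, k=3) for d in test_docs]
print(features, preds, prf(test_labels, preds, "sports"))
```

On this made-up six-document corpus, each test text is compared with four group centers rather than six training documents. The saving grows with the number of samples per group. The abstract leaves open two main design choices: the grouping rule and the number of groups per category.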

Bibliographic Details
Main Authors:	HUANG Chao, CHEN Junhua
Format:	Article
Language:	English
Published:	Academic Journals Center of Shanghai Normal University 2019-02-01
Series:	Journal of Shanghai Normal University (Natural Sciences)
Subjects:	text classification; <i>K</i> Nearest Neighbor (KNN) algorithm; feature extraction; similarity
Online Access:	http://qktg.shnu.edu.cn/zrb/shsfqkszrb/ch/reader/view_abstract.aspx?file_no=20190117
Affiliation:	College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China
ISSN:	1000-5137
DOI:	10.3969/J.ISSN.1000-5137.2019.01.017

Similar Items