Chinese text classification based on improved K Nearest Neighbor algorithm
This paper addresses the high dimensionality of text in text classification. A feature extraction method that combines document frequency (DF) with the chi-square statistic is proposed to reduce the number of feature items and thus the dimensionality of the text. Based on the K Nearest Neighbor (KNN) algorithm...
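The record does not say exactly how document frequency and the chi-square statistic are combined. The sketch below assumes a common two-stage scheme: drop terms whose DF is below a threshold, then score the remaining terms with the chi-square statistic and keep the highest-scoring ones. The function names, the `min_df` and `top_k` parameters, and the choice of taking each term's maximum chi-square over categories are illustrative assumptions, not the authors' settings.

```python
from collections import Counter


def chi_square(n_tc, n_t, n_c, n):
    """Chi-square statistic of term t and category c from a 2x2 contingency table.

    n_tc: documents in c containing t; n_t: documents containing t;
    n_c: documents in c; n: total documents.
    """
    a = n_tc                  # t present, document in c
    b = n_t - n_tc            # t present, document not in c
    c = n_c - n_tc            # t absent, document in c
    d = n - n_t - n_c + n_tc  # t absent, document not in c
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_features(docs, labels, min_df=2, top_k=1000):
    """Hypothetical two-stage DF + chi-square selection (assumed scheme).

    docs: list of token lists (e.g. output of a Chinese word segmenter);
    labels: category of each document.
    """
    n = len(docs)
    doc_sets = [set(doc) for doc in docs]
    df = Counter(t for s in doc_sets for t in s)  # document frequency per term
    n_c = Counter(labels)                          # documents per category

    # Stage 1: discard rare terms whose DF is below the threshold.
    candidates = [t for t, f in df.items() if f >= min_df]

    # Stage 2: score each remaining term by its maximum chi-square over categories.
    scores = {}
    for t in candidates:
        n_tc = Counter(lab for s, lab in zip(doc_sets, labels) if t in s)
        scores[t] = max(chi_square(n_tc[c], df[t], n_c[c], n) for c in n_c)

    # Keep the top_k terms; ties are broken alphabetically for a deterministic result.
    return sorted(candidates, key=lambda t: (-scores[t], t))[:top_k]
```

Taking the maximum over categories keeps any term strongly tied to a single category; averaging the per-category scores is a common alternative.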
Main Authors: | HUANG Chao, CHEN Junhua |
---|---|
Format: | Article |
Language: | English |
Published: | Academic Journals Center of Shanghai Normal University, 2019-02-01 |
Series: | Journal of Shanghai Normal University (Natural Sciences) |
Online Access: | http://qktg.shnu.edu.cn/zrb/shsfqkszrb/ch/reader/view_abstract.aspx?file_no=20190117 |
id | doaj-c002127def1546bd97b6579e91db6a9a |
---|---|
record_format | Article |
doi | 10.3969/J.ISSN.1000-5137.2019.01.017 |
volume, issue, pages | 48, 1, 96-101 |
affiliation | College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China (both authors) |
last_updated | 2020-11-25T01:19:30Z |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | HUANG Chao, CHEN Junhua |
publisher | Academic Journals Center of Shanghai Normal University |
series | Journal of Shanghai Normal University (Natural Sciences) |
issn | 1000-5137 |
publishDate | 2019-02-01 |
description | This paper addresses the high dimensionality of text in text classification. A feature extraction method that combines document frequency (DF) with the chi-square statistic is proposed to reduce the number of feature items and thus the dimensionality of the text. Because the traditional K Nearest Neighbor (KNN) algorithm must compute the similarity between each text to be classified and a large number of training samples, a KNN algorithm based on grouped center vectors is proposed: the samples of each category are divided into groups and the center vector of each group is computed, which improves the classification performance of the algorithm. Experiments show that, compared with the traditional KNN algorithm, the improved algorithm achieves higher precision, recall, and F-measure, and that it has advantages over other classification algorithms. |
topic | text classification; K Nearest Neighbor (KNN) algorithm; feature extraction; similarity |
url | http://qktg.shnu.edu.cn/zrb/shsfqkszrb/ch/reader/view_abstract.aspx?file_no=20190117 |
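The grouped-center-vector KNN described in the abstract replaces the full training set with a few center vectors per category, so a new text is compared against far fewer vectors. The sketch below is a minimal illustration under stated assumptions: texts are already converted to equal-length term-weight vectors, similarity is cosine, each category is grouped with a few iterations of a simple k-means (the record does not describe the paper's actual grouping procedure), and the k nearest centers vote with similarity weights. All function names and parameters are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def group_centers(vectors, n_groups, iters=10):
    """Split one category's samples into up to n_groups groups; return each group's center.

    Assumption: a simple cosine k-means seeded with the first n_groups samples.
    """
    centers = [list(v) for v in vectors[:n_groups]]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in vectors:
            # Assign each sample to its most similar current center.
            best = max(range(len(centers)), key=lambda i: cosine(v, centers[i]))
            groups[best].append(v)
        # Recompute centers; an empty group keeps its previous center.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers


def build_model(train_vectors, train_labels, n_groups=2):
    """Replace the training set by (center vector, category) pairs."""
    by_label = defaultdict(list)
    for v, lab in zip(train_vectors, train_labels):
        by_label[lab].append(v)
    return [(center, lab)
            for lab, vs in by_label.items()
            for center in group_centers(vs, n_groups)]


def classify(x, model, k=3):
    """KNN over group centers: similarity-weighted vote of the k most similar centers."""
    sims = [(cosine(x, center), lab) for center, lab in model]
    nearest = sorted(sims, key=lambda s: s[0], reverse=True)[:k]
    votes = Counter()
    for sim, lab in nearest:
        votes[lab] += sim
    return votes.most_common(1)[0][0]
```

Because the model holds at most (number of categories) x `n_groups` centers, the cost of classifying one text depends on that product rather than on the size of the training set.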