OPTIMIZATION OF K-NEAREST NEIGHBOUR TO CATEGORIZE INDONESIAN'S NEWS ARTICLES

Text classification is the process of grouping documents into categories based on their similarity. Some of the obstacles in text classification are the large number of words that appear in the text and words that occur infrequently (sparse words). The way to solve this problem is to conduct the feature...


Bibliographic Details
Main Authors: Afdhalul Ihsan, Ednawati Rainarli
Format: Article
Language: English
Published: UKM Press 2021-06-01
Series: Asia-Pacific Journal of Information Technology and Multimedia
Subjects: feature selection; k-nearest neighbour; metaheuristic; optimization; text classification
Online Access: https://www.ukm.my/apjitm/view.php?id=201
id doaj-d6385799d729403188f084cb777f7aa8
record_format Article
spelling doaj-d6385799d729403188f084cb777f7aa8; 2021-06-10T13:56:56Z; eng; UKM Press; Asia-Pacific Journal of Information Technology and Multimedia; 2289-2192; 2021-06-01; 10; 01; 43; 51; https://doi.org/10.17576/apjitm-2021-1001-04; OPTIMIZATION OF K-NEAREST NEIGHBOUR TO CATEGORIZE INDONESIAN'S NEWS ARTICLES; Afdhalul Ihsan; Ednawati Rainarli; https://www.ukm.my/apjitm/view.php?id=201; feature selection; k-nearest neighbour; metaheuristic; optimization; text classification
collection DOAJ
language English
format Article
sources DOAJ
author Afdhalul Ihsan
Ednawati Rainarli
author_facet Afdhalul Ihsan
Ednawati Rainarli
author_sort Afdhalul Ihsan
title OPTIMIZATION OF K-NEAREST NEIGHBOUR TO CATEGORIZE INDONESIAN'S NEWS ARTICLES
publisher UKM Press
series Asia-Pacific Journal of Information Technology and Multimedia
issn 2289-2192
publishDate 2021-06-01
description Text classification is the process of grouping documents into categories based on their similarity. Some of the obstacles in text classification are the large number of words that appear in the text and words that occur infrequently (sparse words). One way to address this problem is feature selection. There are several filter-based feature selection methods, among them Chi-Square, Information Gain, Genetic Algorithm, and Particle Swarm Optimization (PSO). Aghdam's research shows that PSO is the best among those methods. This study examined PSO as a way to optimize the performance of the k-Nearest Neighbour (k-NN) algorithm in categorizing news articles. k-NN is simple and easy to implement, and with appropriate features it is a reliable algorithm. PSO is used to select keywords (term features), and the documents are then classified using k-NN. The testing process consists of three stages: tuning the k-NN parameter, tuning the PSO parameter, and measuring the testing performance. The parameter tuning determines the number of neighbours used in k-NN and the number of PSO particles. The performance testing, in turn, compares the performance of k-NN with and without PSO. The optimal number of neighbours is 9, and the optimal number of particles is 50. The testing showed that k-NN with PSO and a 50% reduction in terms achieved 20 per cent better accuracy than k-NN without PSO. Although the PSO process did not always find the optimal conditions, the k-NN method can still produce better accuracy. In this way, the k-NN method can work better in grouping news articles, especially Indonesian-language news articles.
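The pipeline the description outlines (PSO selects term features, then k-NN classifies) can be illustrated with a minimal sketch. It assumes a binary PSO with a sigmoid transfer function, Euclidean distance over raw term counts, and validation accuracy as the fitness function; the abstract does not state which PSO variant, distance measure, term weighting, or fitness function the authors used. All function names, parameter defaults, and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k, mask):
    """Majority vote among the k nearest training rows, using only
    the features whose mask bit is 1 (Euclidean distance, an assumption)."""
    idx = [j for j, bit in enumerate(mask) if bit]
    dists = sorted(
        (math.sqrt(sum((row[j] - x[j]) ** 2 for j in idx)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, val_X, val_y, k, mask):
    """Fraction of validation rows classified correctly; an empty mask scores 0."""
    if not any(mask):
        return 0.0
    hits = sum(knn_predict(train_X, train_y, x, k, mask) == y
               for x, y in zip(val_X, val_y))
    return hits / len(val_y)


def pso_select(train_X, train_y, val_X, val_y, k=3, n_particles=10,
               n_iter=20, w=0.7, c1=1.5, c2=1.5, v_max=6.0, seed=0):
    """Binary PSO: each particle is a 0/1 mask over the term features."""
    rng = random.Random(seed)
    n_feat = len(train_X[0])

    def fitness(mask):
        return accuracy(train_X, train_y, val_X, val_y, k, mask)

    pos = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1, 1) for _ in range(n_feat)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_feat):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-v_max, min(v_max, v))  # clamp the velocity
                # Sigmoid transfer: velocity becomes the probability of bit = 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Hypothetical toy term-count vectors: terms 0 and 1 carry the class signal,
# terms 2 and 3 are noise.
train_X = [[3, 0, 1, 5], [2, 0, 4, 1], [4, 1, 2, 2],
           [0, 3, 1, 4], [0, 2, 5, 1], [1, 4, 2, 3]]
train_y = ["sport", "sport", "sport", "politics", "politics", "politics"]
val_X = [[3, 0, 5, 1], [0, 3, 1, 5]]
val_y = ["sport", "politics"]

best_mask, best_fit = pso_select(train_X, train_y, val_X, val_y, k=3)
print("selected term mask:", best_mask, "validation accuracy:", best_fit)
```

In this sketch the number of neighbours (`k`) and the number of particles (`n_particles`) are the two values the description reports tuning; the defaults here are placeholders, not the reported optima of 9 and 50.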
topic feature selection
k-nearest neighbour
metaheuristic
optimization
text classification
url https://www.ukm.my/apjitm/view.php?id=201