Metode Pembobotan Berbasis Topik dan Kelas untuk Berita Online Berbahasa Indonesia [Topic- and Class-Based Weighting Method for Indonesian-Language Online News]

Manual clustering of news documents depends on human skill and accuracy, so it can introduce errors into the grouping process. News documents therefore need to be grouped automatically. This clustering requires a weighting method that includes TF.IDF.ICF. In...


Bibliographic Details
Main Authors: Maryamah Maryamah, Made Agus Putra Subali, Lailly Qolby, Agus Zainal Arifin, Ali Fauzi
Format: Article
Language: English
Published: Indonesia Association of Computational Linguistics (INACL) 2018-03-01
Series: Jurnal Linguistik Komputasional
Online Access: http://inacl.id/journal/index.php/jlk/article/view/4
id doaj-62f5d3213b2f4d07ba7020f9722c630a
record_format Article
spelling doaj-62f5d3213b2f4d07ba7020f9722c630a | 2020-11-25T01:02:59Z | eng | Indonesia Association of Computational Linguistics (INACL) | Jurnal Linguistik Komputasional | 2621-9336 | 2018-03-01 | 111116 | 10.26418/jlk.v1i1.44 | Metode Pembobotan Berbasis Topik dan Kelas untuk Berita Online Berbahasa Indonesia | Maryamah Maryamah | Made Agus Putra Subali | Lailly Qolby | Agus Zainal Arifin | Ali Fauzi | Manual clustering of news documents depends on human skill and accuracy, so it can introduce errors into the grouping process. News documents therefore need to be grouped automatically. This clustering requires a weighting method that includes TF.IDF.ICF. In this paper we propose a new weighting algorithm, TF.IDF.ICF.ITF, to cluster documents automatically through statistical data patterns, so that the errors of manual grouping are reduced and the process becomes more efficient. K-Means++ is a clustering algorithm that develops K-Means at the initial cluster initialization stage; it is easy to implement and gives more stable results. K-Means++ groups the documents at the Inverse Class Frequency (ICF) weighting stage. ICF extends term weighting in a document with class-based weighting. Terms that appear in many classes receive a small but informative value. The proposed weighting is then calculated. Testing used a given query on several numbers of best features; the TF.IDF.ICF.ITF method gave less than optimal results. | http://inacl.id/journal/index.php/jlk/article/view/4
collection DOAJ
language English
format Article
sources DOAJ
author Maryamah Maryamah
Made Agus Putra Subali
Lailly Qolby
Agus Zainal Arifin
Ali Fauzi
publisher Indonesia Association of Computational Linguistics (INACL)
series Jurnal Linguistik Komputasional
issn 2621-9336
publishDate 2018-03-01
description Manual clustering of news documents depends on human skill and accuracy, so it can introduce errors into the grouping process. News documents therefore need to be grouped automatically. This clustering requires a weighting method that includes TF.IDF.ICF. In this paper we propose a new weighting algorithm, TF.IDF.ICF.ITF, to cluster documents automatically through statistical data patterns, so that the errors of manual grouping are reduced and the process becomes more efficient. K-Means++ is a clustering algorithm that develops K-Means at the initial cluster initialization stage; it is easy to implement and gives more stable results. K-Means++ groups the documents at the Inverse Class Frequency (ICF) weighting stage. ICF extends term weighting in a document with class-based weighting. Terms that appear in many classes receive a small but informative value. The proposed weighting is then calculated. Testing used a given query on several numbers of best features; the TF.IDF.ICF.ITF method gave less than optimal results.
url http://inacl.id/journal/index.php/jlk/article/view/4
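
Illustrative note on the TF.IDF.ICF weighting named in the description. The abstract does not give formulas, so the following is only a minimal sketch under stated assumptions: it uses the commonly seen smoothed forms idf = 1 + log(N/df) and icf = 1 + log(C/cf), which may differ from the article's definitions. The function name `tf_idf_icf`, its parameters, and the sample tokens are hypothetical. The ITF factor of the proposed TF.IDF.ICF.ITF method is not defined in this record and is therefore left out.

```python
import math
from collections import Counter


def tf_idf_icf(docs, labels):
    """Return one {term: weight} dict per document.

    docs   -- list of token lists (one list per document)
    labels -- class or cluster label of each document, e.g. from K-Means++
    Assumed formulas (not taken from the article):
        idf(t) = 1 + log(N / df(t)),  icf(t) = 1 + log(C / cf(t))
    """
    n_docs = len(docs)
    n_classes = len(set(labels))

    # df[t]: number of documents that contain term t
    df = Counter(t for doc in docs for t in set(doc))

    # classes_with_term[t]: set of class labels in which term t occurs
    classes_with_term = {}
    for doc, label in zip(docs, labels):
        for t in set(doc):
            classes_with_term.setdefault(t, set()).add(label)

    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency inside this document
        w = {}
        for t, freq in tf.items():
            idf = 1.0 + math.log(n_docs / df[t])
            icf = 1.0 + math.log(n_classes / len(classes_with_term[t]))
            w[t] = freq * idf * icf
        weights.append(w)
    return weights


# Hypothetical toy corpus: two sports items and two economy items.
docs = [
    ["bola", "gol", "tim"],
    ["tim", "menang", "gol"],
    ["pajak", "ekonomi", "tim"],
    ["ekonomi", "pasar"],
]
labels = ["olahraga", "olahraga", "ekonomi", "ekonomi"]
weights = tf_idf_icf(docs, labels)
```

In this toy corpus "tim" occurs in both classes, so its ICF factor is 1 and its weight stays low, while "gol" occurs in only one class and is weighted higher, matching the description's remark that terms spread over many classes receive a small value.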