PENGUKURAN TINGKAT KEMIRIPAN DOKUMEN BERBASIS CLUSTER

Document similarity can be measured and used to find similar documents in a collection (corpus). In a small corpus, measuring document similarity is not a problem. In a larger corpus, comparing similarity between documents can be time-consuming. A clustering method can be use...

Bibliographic Details
Main Authors: Ibnu Santoso, Lya Hulliyyatus Suadaa
Format: Article
Language: Indonesian
Published: Universitas Lambung Mangkurat 2019-02-01
Series: KLIK: Kumpulan jurnaL Ilmu Komputer
Online Access: http://klik.ulm.ac.id/index.php/klik/article/view/181
id doaj-6cc0b4f57e574a64815eb29187709d1f
record_format Article
spelling doaj-6cc0b4f57e574a64815eb29187709d1f | 2020-11-24T20:46:16Z | ind | Universitas Lambung Mangkurat | KLIK: Kumpulan jurnaL Ilmu Komputer | 2406-7857 | 2443-406X | 2019-02-01 | 61718310.20527/klik.v6i1.181104 | PENGUKURAN TINGKAT KEMIRIPAN DOKUMEN BERBASIS CLUSTER | Ibnu Santoso 0 | Lya Hulliyyatus Suadaa 1 | Politeknik Statistika STIS | Politeknik Statistika STIS | Document similarity can be measured and used to find similar documents in a collection (corpus). In a small corpus, measuring document similarity is not a problem, but in a larger corpus, comparing a document against every other document can be time-consuming. Clustering can reduce the number of documents that must be compared with a given document, and so save time. This research examines the effect of clustering on document similarity measurement and evaluates its performance. The corpus consisted of 2,049 undergraduate theses by Politeknik Statistika STIS students from 2007 to 2016. The documents were represented with a bag-of-words model and clustered using the k-means method; similarity was measured with cosine similarity. In the simulation, clustering into 3 clusters required a longer preparation time (17.32%) but gave faster query processing (77.88%) with an accuracy of 0.98. Clustering into 5 clusters required a longer preparation time (31.10%) but gave faster query processing (83.79%) with an accuracy of 0.86. Clustering into 7 clusters required a longer preparation time (45.10%) but gave faster query processing (85.30%) with an accuracy of 0.98. | http://klik.ulm.ac.id/index.php/klik/article/view/181
collection DOAJ
language Indonesian
format Article
sources DOAJ
author Ibnu Santoso
Lya Hulliyyatus Suadaa
title PENGUKURAN TINGKAT KEMIRIPAN DOKUMEN BERBASIS CLUSTER
publisher Universitas Lambung Mangkurat
series KLIK: Kumpulan jurnaL Ilmu Komputer
issn 2406-7857
2443-406X
publishDate 2019-02-01
description Document similarity can be measured and used to find similar documents in a collection (corpus). In a small corpus, measuring document similarity is not a problem, but in a larger corpus, comparing a document against every other document can be time-consuming. Clustering can reduce the number of documents that must be compared with a given document, and so save time. This research examines the effect of clustering on document similarity measurement and evaluates its performance. The corpus consisted of 2,049 undergraduate theses by Politeknik Statistika STIS students from 2007 to 2016. The documents were represented with a bag-of-words model and clustered using the k-means method; similarity was measured with cosine similarity. In the simulation, clustering into 3 clusters required a longer preparation time (17.32%) but gave faster query processing (77.88%) with an accuracy of 0.98. Clustering into 5 clusters required a longer preparation time (31.10%) but gave faster query processing (83.79%) with an accuracy of 0.86. Clustering into 7 clusters required a longer preparation time (45.10%) but gave faster query processing (85.30%) with an accuracy of 0.98.
url http://klik.ulm.ac.id/index.php/klik/article/view/181
work_keys_str_mv AT ibnusantoso pengukurantingkatkemiripandokumenberbasiscluster
AT lyahulliyyatussuadaa pengukurantingkatkemiripandokumenberbasiscluster
_version_ 1716813059256745984
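
The approach summarized in the description (bag-of-words vectors, k-means clustering, and cosine similarity computed only against documents in the query's cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, whitespace tokenization, length normalization, fixed iteration count, and the rule of assigning a query to its nearest centroid are all assumptions made for illustration, and the sample documents are invented.

```python
import math
import random
from collections import Counter


def vectorize(docs):
    """Build term-count (bag-of-words) vectors over a shared vocabulary.
    Whitespace tokenization is an assumption; the record names no tokenizer."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([float(counts[term]) for term in vocab])
    return vocab, vectors


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def nearest(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    def sq_dist(c):
        return sum((x - y) ** 2 for x, y in zip(point, c))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means on unit-length vectors; returns (centroids, labels)."""
    rng = random.Random(seed)
    points = [normalize(v) for v in vectors]
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment so labels agree with the returned centroids.
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


def most_similar(query_vec, vectors, centroids, labels, top_n=3):
    """Rank by cosine similarity, but only within the query's cluster."""
    cluster = nearest(normalize(query_vec), centroids)
    candidates = [i for i, lab in enumerate(labels) if lab == cluster]
    ranked = sorted(candidates, key=lambda i: cosine(query_vec, vectors[i]),
                    reverse=True)
    return cluster, ranked[:top_n]


# Hypothetical toy corpus, used only to exercise the functions.
docs = [
    "survey sampling design for household income",
    "household income survey with stratified sampling",
    "neural network model for image classification",
    "image classification using a deep neural network",
]
vocab, vectors = vectorize(docs)
centroids, labels = kmeans(vectors, k=2)
cluster, ranked = most_similar(vectors[0], vectors, centroids, labels)
```

The time saving comes from `most_similar` scoring only the candidates in one cluster instead of the whole corpus; the trade-off, as the reported accuracy figures suggest, is that a relevant document assigned to a different cluster is never compared.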