PENGUKURAN TINGKAT KEMIRIPAN DOKUMEN BERBASIS CLUSTER (Cluster-Based Measurement of Document Similarity)
Document similarity can be measured and used to discover other similar documents in a document collection (corpus). In a small corpus, measuring document similarity is not a problem. In a larger corpus, comparing documents with one another for similarity can be time-consuming. A clustering method can be use...
Main Authors: | Ibnu Santoso, Lya Hulliyyatus Suadaa |
---|---|
Format: | Article |
Language: | Indonesian |
Published: | Universitas Lambung Mangkurat, 2019-02-01 |
Series: | KLIK: Kumpulan jurnaL Ilmu Komputer |
Online Access: | http://klik.ulm.ac.id/index.php/klik/article/view/181 |
id |
doaj-6cc0b4f57e574a64815eb29187709d1f |
---|---|
record_format |
Article |
spelling |
doaj-6cc0b4f57e574a64815eb29187709d1f; 2020-11-24T20:46:16Z; ind; Universitas Lambung Mangkurat; KLIK: Kumpulan jurnaL Ilmu Komputer; 2406-7857; 2443-406X; 2019-02-01; 6; 1; 71; 83; 10.20527/klik.v6i1.181; 104; PENGUKURAN TINGKAT KEMIRIPAN DOKUMEN BERBASIS CLUSTER; Ibnu Santoso (Politeknik Statistika STIS); Lya Hulliyyatus Suadaa (Politeknik Statistika STIS); http://klik.ulm.ac.id/index.php/klik/article/view/181 |
collection |
DOAJ |
author |
Ibnu Santoso Lya Hulliyyatus Suadaa |
title |
PENGUKURAN TINGKAT KEMIRIPAN DOKUMEN BERBASIS CLUSTER |
publisher |
Universitas Lambung Mangkurat |
series |
KLIK: Kumpulan jurnaL Ilmu Komputer |
issn |
2406-7857 2443-406X |
publishDate |
2019-02-01 |
description |
Document similarity can be measured and used to discover other similar documents in a document collection (corpus). In a small corpus, measuring document similarity is not a problem. In a larger corpus, comparing documents with one another for similarity can be time-consuming. Clustering can save time by reducing the number of documents that have to be compared with a given document. This research aims to determine the effect of clustering on document similarity measurement and to evaluate its performance. The corpus consisted of 2,049 undergraduate theses written by Politeknik Statistika STIS students from 2007 to 2016. The documents were represented with a bag-of-words model and clustered using the k-means method. Similarity was measured with cosine similarity. In the simulation, clustering into 3 clusters needed a longer preparation time (17.32%) but gave faster query processing (77.88%) with an accuracy of 0.98. Clustering into 5 clusters needed a longer preparation time (31.10%) but gave faster query processing (83.79%) with an accuracy of 0.86. Clustering into 7 clusters needed a longer preparation time (45.10%) but gave faster query processing (85.30%) with an accuracy of 0.98. |
url |
http://klik.ulm.ac.id/index.php/klik/article/view/181 |
_version_ |
1716813059256745984 |
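
The pipeline named in the description (bag-of-words vectors, k-means clustering, cosine similarity computed only against the query's cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the seeded random initialization, the fixed iteration count, the function names, and the six toy document titles are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def bag_of_words(text, vocab):
    """Term-count vector over a fixed vocabulary (whitespace tokens: an assumption)."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest(vec, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((x - c) ** 2 for x, c in zip(vec, centroids[i])),
    )


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd-style k-means with seeded random initial centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        labels = [nearest(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment so the labels agree with the returned centroids.
    labels = [nearest(v, centroids) for v in vectors]
    return centroids, labels


def most_similar(query_vec, vectors, centroids, labels):
    """Score only the documents in the query's cluster, best match first."""
    cluster = nearest(query_vec, centroids)
    scored = [
        (i, cosine(query_vec, vectors[i]))
        for i, l in enumerate(labels)
        if l == cluster
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy corpus (made up for this sketch, not taken from the study's data).
docs = [
    "regression model for poverty data",
    "poverty data regression analysis",
    "small area estimation of poverty",
    "web application for survey data entry",
    "android application for survey monitoring",
    "survey data entry web system",
]
vocab = sorted({w for d in docs for w in d.lower().split()})
vectors = [bag_of_words(d, vocab) for d in docs]
centroids, labels = kmeans(vectors, k=2)

query = bag_of_words("web application for survey data entry", vocab)
results = most_similar(query, vectors, centroids, labels)
for index, score in results:
    print(f"{score:.3f}  {docs[index]}")
```

The saving in query time comes from `most_similar` scoring only one cluster's members instead of the whole corpus. The cost is the up-front `kmeans` pass, and a relevant document assigned to a different cluster is never scored, which is one way accuracy can fall below 1.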