The K-Means Clustering Algorithm With Semantic Similarity To Estimate The Cost of Hospitalization

Bibliographic Details
Main Authors: Ida Bagus Gede Sarasvananda, Retantyo Wardoyo, Anny Kartika Sari
Format: Article
Language: English
Published: Universitas Gadjah Mada 2019-10-01
Series: IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
Online Access: https://jurnal.ugm.ac.id/ijccs/article/view/45093
Description
Summary: A patient's hospitalization cost can be estimated by clustering patients. K-means is one of the most widely used clustering algorithms, but because it is based on distance, it is weak at measuring the proximity of meaning (semantics) between data items. To overcome this problem, semantic similarity can be used to measure the similarity between objects during clustering, so that semantic proximity is taken into account. This study aims to cluster patient data according to the similarity of the patients' diseases. The ICD code is used as the reference for identifying a patient's disease. The K-means method is combined with semantic similarity to measure the proximity of patients' ICD codes. The semantic similarity measures used in this study are Girardi, Leacock & Chodorow, Rada, and Jaccard similarity. Cluster quality is measured with the silhouette coefficient. The experimental results show that clustering with semantic similarity produces better-quality clusters than clustering without it. The best accuracy is 91.78% for the three semantic similarity methods, whereas without semantic similarity the best accuracy is 84.93%.
ISSN: 1978-1520, 2460-7258
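
The ideas in the summary (path-based semantic similarity between ICD codes, a K-means-style clustering driven by that similarity, and the silhouette coefficient as a quality measure) can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the hierarchy is a toy one derived from ICD-10-style code prefixes (root, chapter letter, three-character category, full code), the function names are hypothetical, the clustering step swaps K-means centroids for medoids because a categorical code has no arithmetic mean, and the Girardi measure is omitted because the record does not state its formula.

```python
import math

def icd_path(code):
    """Path from the root to a code in a TOY prefix hierarchy (an assumption)."""
    code = code.replace(".", "")
    path = ["ROOT", code[0], code[:3]]
    if len(code) > 3:
        path.append(code)
    return path

def rada_distance(a, b):
    """Rada-style edge count on the shortest path between two codes."""
    pa, pb = icd_path(a), icd_path(b)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)

MAX_DEPTH = 4  # nodes on the longest root-to-leaf path of the toy hierarchy

def lch_similarity(a, b):
    """Leacock & Chodorow: -log(p / (2 * D)), with p counted in nodes."""
    return -math.log((rada_distance(a, b) + 1) / (2 * MAX_DEPTH))

def jaccard(codes_a, codes_b):
    """Jaccard similarity between two patients' sets of ICD codes."""
    a, b = set(codes_a), set(codes_b)
    return len(a & b) / len(a | b) if a | b else 1.0

def cluster(codes, medoids, dist, rounds=10):
    """Medoid-based stand-in for K-means: assign each code, then re-pick medoids."""
    for _ in range(rounds):
        labels = [min(range(len(medoids)), key=lambda k: dist(c, medoids[k]))
                  for c in codes]
        new = []
        for k in range(len(medoids)):
            members = [c for c, lab in zip(codes, labels) if lab == k]
            if members:
                # The medoid is the member with the smallest total distance.
                new.append(min(members, key=lambda m: sum(dist(m, o) for o in members)))
            else:
                new.append(medoids[k])
        if new == medoids:
            break
        medoids = new
    return labels, medoids

def silhouette(items, labels, dist):
    """Mean silhouette coefficient; singleton clusters score 0 by convention."""
    scores = []
    for i, item in enumerate(items):
        by_cluster = {}
        for j, other in enumerate(items):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(item, other))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                   # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())     # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical primary diagnoses for four patients.
codes = ["A01.0", "A01.1", "B20", "B21"]
labels, medoids = cluster(codes, ["A01.0", "B20"], rada_distance)
print(labels, medoids, silhouette(codes, labels, rada_distance))
```

A real ICD hierarchy groups codes into chapters and blocks that do not follow code prefixes exactly, so a faithful version would read the parent of each code from the official classification instead of deriving it from the code string.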