The K-Means Clustering Algorithm With Semantic Similarity To Estimate The Cost of Hospitalization

Ida Bagus Gede Sarasvananda(1*), Retantyo Wardoyo(2), Anny Kartika Sari(3)

(1) Master Program of Computer Science, FMIPA UGM, Yogyakarta
(2) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(3) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(*) Corresponding Author


The cost of a patient's hospitalization can be estimated by clustering patient data. K-means is one of the most widely used clustering algorithms. However, because K-means relies on a distance measure, it is weak at capturing the proximity of meaning, or semantics, between data items. To overcome this problem, semantic similarity can be used to measure the similarity between objects during clustering, so that semantic proximity is taken into account. This study clusters patient data with attention to the similarity of the patients' diseases. ICD codes are used to identify each patient's disease. The K-means method is combined with semantic similarity to measure the proximity of the patients' ICD codes. The measures used in this study are the semantic similarity measures of Girardi, Leacock & Chodorow, and Rada, as well as Jaccard similarity. Cluster quality is measured with the silhouette coefficient. The experimental results show that clustering with semantic similarity produces better-quality clusters than clustering without it. The best accuracy is 91.78% for the three semantic similarity methods, whereas the best accuracy without semantic similarity is 84.93%.
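As an illustration of how proximity between ICD codes might be computed, the following is a minimal sketch and not the implementation used in this study. It assumes a toy three-level hierarchy derived from the code string (first letter, three-character category, full code), which is much simpler than the real ICD structure of chapters and blocks. The function names and MAX_DEPTH are hypothetical. It shows the path-based distance of Rada, the Leacock & Chodorow similarity in its usual -log(path / (2 * depth)) form, and Jaccard similarity over sets of codes; the Girardi measure is not sketched here.

```python
import math

# Assumption: nodes on the longest root-to-leaf path of the toy hierarchy
# (ROOT -> letter -> 3-character category -> full code).
MAX_DEPTH = 4


def ancestors(code):
    """Return the root-to-node path of a code in the toy hierarchy."""
    code = code.strip().upper()
    path = ["ROOT", code[0], code[:3]]
    if len(code) > 3:
        path.append(code)
    return path


def rada_distance(a, b):
    """Rada-style distance: edges on the shortest path between two codes."""
    pa, pb = ancestors(a), ancestors(b)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    # Edges from each code up to the deepest shared ancestor.
    return (len(pa) - common) + (len(pb) - common)


def lch_similarity(a, b):
    """Leacock & Chodorow similarity, with path length counted in nodes."""
    path_nodes = rada_distance(a, b) + 1
    return -math.log(path_nodes / (2 * MAX_DEPTH))


def jaccard(codes_a, codes_b):
    """Jaccard similarity between two patients' sets of codes."""
    a, b = set(codes_a), set(codes_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    for other in ("E11.9", "E11.0", "E10.9", "I10"):
        print(other, rada_distance("E11.9", other),
              round(lch_similarity("E11.9", other), 3))
    print(jaccard(["E11.9", "I10"], ["I10", "J45"]))
```

In this sketch, codes that share a category are closer than codes that share only a first letter, which in turn are closer than codes from different letters. A purely numeric or string-based distance on the raw codes would not capture that ordering.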


Clustering; K-means; Semantic Similarity; Silhouette Coefficient



Copyright (c) 2019 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Copyright of:
IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
ISSN 1978-1520 (print); ISSN 2460-7258 (online)
A scientific journal of research in computing and cybernetics systems.
A publication of IndoCEISS.
Gedung S1 Ruang 416 FMIPA UGM, Sekip Utara, Yogyakarta 55281
Fax: +62274 555133
