The K-Means Clustering Algorithm With Semantic Similarity To Estimate The Cost of Hospitalization

Ida Bagus Gede Sarasvananda; Retantyo Wardoyo; Anny Kartika Sari

doi:10.22146/ijccs.45093

Ida Bagus Gede Sarasvananda (1*), Retantyo Wardoyo (2), Anny Kartika Sari (3)

(1) Master Program of Computer Science, FMIPA UGM, Yogyakarta
(2) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(3) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(*) Corresponding Author

Abstract

The cost of a patient’s hospitalization can be estimated by clustering patients. K-means is one of the most widely used clustering algorithms. However, because K-means is based on distance, it is weak at measuring the proximity of meaning, or semantics, between data. To overcome this problem, semantic similarity can be used to measure the similarity between objects during clustering, so that semantic proximity can be calculated. This study aims to cluster patient data by taking into account the similarity of the patients’ diseases. The ICD code is used as the guide for determining a patient’s disease. The K-means method is combined with semantic similarity to measure the proximity of the patients’ ICD codes. The methods used in this study to measure the semantic similarity between data are the semantic similarity of Girardi, Leacock & Chodorow, Rada, and Jaccard similarity. Cluster quality is measured with the silhouette coefficient. The experimental results show that measuring the semantic similarity of the data produces better-quality clustering results than clustering without semantic similarity. The best accuracy is 91.78% for the three semantic similarity methods, whereas the best accuracy without semantic similarity is 84.93%.
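
As an illustration only, the idea of clustering ICD codes by semantic proximity can be sketched in Python. Everything in this sketch is an assumption and is not taken from the study: the simplified three-level ICD-10-style hierarchy (first letter, three-character category, full code), the example codes, the function names, and the medoid-based update, which stands in for the K-means centroid because a mean of ICD codes is undefined. The Leacock & Chodorow measure follows its commonly cited form, -log(p / 2D), with p counted in nodes; the Girardi measure is not sketched.

```python
import math

# Assumption: nodes on the longest root-to-leaf path (ROOT > letter > category > full code).
MAX_DEPTH = 4


def ancestors(code):
    """Path from the root to a code, e.g. 'E11.9' -> ROOT, E, E11, E11.9."""
    path = ["ROOT", code[0], code[:3]]
    if len(code) > 3:
        path.append(code)
    return path


def rada_distance(a, b):
    """Rada-style distance: number of edges on the shortest path in the tree."""
    pa, pb = ancestors(a), ancestors(b)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def lch_similarity(a, b):
    """Leacock & Chodorow similarity, with the path length counted in nodes."""
    nodes = rada_distance(a, b) + 1
    return -math.log(nodes / (2 * MAX_DEPTH))


def jaccard(a, b):
    """Jaccard similarity between two sets of codes."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def assign(items, medoids, dist):
    """Label each item with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist(x, medoids[c])) for x in items]


def cluster(items, k, dist, iters=10):
    """K-means-like loop that uses medoids in place of means (a stand-in)."""
    medoids = list(items[:k])  # naive deterministic initialization
    for _ in range(iters):
        labels = assign(items, medoids, dist)
        new = []
        for c in range(k):
            members = [x for x, l in zip(items, labels) if l == c]
            # The new medoid is the member with the smallest total distance to its cluster.
            best = min(members, key=lambda m: sum(dist(m, x) for x in members)) if members else medoids[c]
            new.append(best)
        if new == medoids:
            break
        medoids = new
    return assign(items, medoids, dist)


def silhouette(items, labels, dist):
    """Mean silhouette coefficient over all items."""
    scores = []
    for i, x in enumerate(items):
        by_cluster = {}
        for j, y in enumerate(items):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(x, y))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster, or only one cluster overall
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical example: one primary diagnosis code per patient.
codes = ["E11.9", "I10", "E11.0", "E10.1", "I11.0", "I25.1"]
labels = cluster(codes, 2, rada_distance)
score = silhouette(codes, labels, rada_distance)
print(labels, round(score, 3))
```

In this toy data the endocrine (E) and circulatory (I) codes end up in separate clusters because codes sharing more ancestors are closer in the tree. A patient with several diagnoses could instead be compared with `jaccard` over the patient's set of codes.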

Keywords

Clustering; K-means; Semantic Similarity; Silhouette Coefficient

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Copyright of: IJCCS (Indonesian Journal of Computing and Cybernetics Systems), ISSN 1978-1520 (print); ISSN 2460-7258 (online), is a scientific journal of research results in Computing and Cybernetics Systems.
A publication of IndoCEISS. Gedung S1 Ruang 416 FMIPA UGM, Sekip Utara, Yogyakarta 55281. Fax: +62274 555133. Email: ijccs.mipa@ugm.ac.id | http://jurnal.ugm.ac.id/ijccs
