Classifying Heart Disease through Fusion of Multi-Source Datasets: Integration of Feature Selection and Explainable Machine Learning Techniques
Kasiful Aprianto(1), Mila Desi Anasanti(2*)
(1) Nusa Mandiri University
(2) Nusa Mandiri University
(*) Corresponding Author
Copyright (c) 2025 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.