Backward Elimination for Feature Selection on Breast Cancer Classification Using Logistic Regression and Support Vector Machine Algorithms
Salsha Farahdiba(1*), Dwi Kartini(2), Radityo Adi Nugroho(3), Rudy Herteno(4), Triando Hamonangan Saragih(5)
(1) Lambung Mangkurat University
(2) Lambung Mangkurat University
(3) Lambung Mangkurat University
(4) Lambung Mangkurat University
(5) Lambung Mangkurat University
(*) Corresponding Author
Abstract
Breast cancer is one of the most common cancers affecting women worldwide. One way to reduce its high mortality is a detection system that can determine whether a breast tumor is benign or malignant. Logistic Regression and Support Vector Machine (SVM) classifiers are often used to detect the disease, but neither algorithm reliably performs well on datasets with many features. This study therefore adds Backward Elimination feature selection to improve classification performance. Logistic Regression and SVM were compared with feature selection applied to breast cancer data to identify the best model. The breast cancer dataset has 30 features and two classes, Benign and Malignant. Backward Elimination reduced the 30 features to 13, which improved the performance of both classification models. The best result came from combining Backward Elimination with a linear-kernel SVM. This combination increased accuracy from 96.14% to 97.02%, precision from 98.06% to 99.49%, recall from 90.48% to 92.38%, and AUC from 0.95 to 0.96.
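The abstract does not state which elimination criterion the authors used, so the following is only a minimal sketch of one common wrapper-style variant of Backward Elimination. It starts with all features and greedily drops one at a time, guided by a score. The sketch also computes the four reported metrics (accuracy, precision, recall, AUC). The names `backward_elimination`, `binary_metrics`, `roc_auc`, and `toy_score` are hypothetical. In a real run the score would presumably be something like the cross-validated accuracy of Logistic Regression or a linear-kernel SVM trained on the candidate subset. The `positive` label (for example, Malignant) is also an assumption.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple


def backward_elimination(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    n_keep: int,
) -> Tuple[List[str], float]:
    """Greedy backward elimination: start with every feature and repeatedly
    drop the one whose removal yields the highest score, until n_keep remain."""
    if not 1 <= n_keep <= len(features):
        raise ValueError("n_keep must be between 1 and len(features)")
    current = frozenset(features)
    best = score(current)
    while len(current) > n_keep:
        # Score every subset that is exactly one feature smaller than the current one.
        candidates = [(score(current - {f}), f) for f in sorted(current)]
        # Ties on score are broken by feature name, so the result is deterministic.
        best, dropped = max(candidates)
        current = current - {dropped}
    # Report the surviving features in their original order.
    return [f for f in features if f in current], best


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Accuracy, precision and recall from the binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """AUC as the probability that a random positive outscores a random
    negative (ties count 1/2). O(n^2), which is fine for illustration."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy score: features "a" and "b" help, while "c" and "d" are noise.
WEIGHTS = {"a": 3, "b": 2}
NOISE = {"c", "d"}


def toy_score(subset: FrozenSet[str]) -> int:
    return sum(WEIGHTS.get(f, 0) for f in subset) - len(subset & NOISE)


print(backward_elimination(["a", "b", "c", "d"], toy_score, n_keep=2))
# → (['a', 'b'], 5)
print(binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))  # accuracy 0.6, precision and recall 2/3
print(roc_auc([1, 1, 1, 0, 0], [0.9, 0.8, 0.4, 0.3, 0.6]))  # 5/6
```

The toy score penalizes the noisy features, so the loop removes them first. For the 30-feature dataset, scikit-learn's `SequentialFeatureSelector(estimator, n_features_to_select=13, direction="backward")` provides a comparable cross-validated wrapper. The abstract does not say whether it matches the authors' exact procedure. For example, some studies instead eliminate features by regression p-values.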
DOI: https://doi.org/10.22146/ijccs.88926
Copyright (c) 2023 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.