ROS, SMOTE, SMOTE-ENN COMPARISON USING GNB and Adaboost Classifiers for Cervical Cancer Imbalanced Dataset

https://doi.org/10.22146/teknosains.111431

Evvin Faristasari(1*), Sirlus Andreanto Jasman Duli(2), Indri Dwi Agustin(3), Yuda Paraswistara(4), Bradika Almandin Wisesa(5), Vivin Mahat Putri(6)

(1) Politeknik Manufaktur Negeri Bangka Belitung
(2) Politeknik Manufaktur Negeri Bangka Belitung
(3) Politeknik Manufaktur Negeri Bangka Belitung
(4) Politeknik Manufaktur Negeri Bangka Belitung
(5) Politeknik Manufaktur Negeri Bangka Belitung
(6) Politeknik Manufaktur Negeri Bangka Belitung
(*) Corresponding Author

Abstract


Cervical cancer continues to pose a significant health risk to women, especially when it is diagnosed at a late stage. Early screening therefore plays an important role in slowing disease progression and increasing the likelihood of successful treatment. In recent years, machine learning has increasingly been applied to support disease identification through data classification. This study compares the performance of classification models on a cervical cancer dataset under three resampling techniques for handling class imbalance: Random Over Sampling (ROS), Synthetic Minority Over-sampling Technique (SMOTE), and SMOTE-ENN. The data were obtained from an open-source dataset and underwent several preprocessing stages, including splitting into training and testing data, checking for missing values, and imputing incomplete records. The class distribution was then analyzed to confirm the imbalance before resampling was applied. ROS duplicates minority class instances, SMOTE generates synthetic samples through interpolation, and SMOTE-ENN combines oversampling with data cleaning. All experimental scenarios were evaluated using Gaussian Naive Bayes and AdaBoost classifiers. The findings indicate that Gaussian Naive Bayes combined with ROS achieved better recall than AdaBoost. This suggests that Gaussian Naive Bayes is more sensitive in identifying positive cases, particularly after minority class representation is improved. The results also emphasize that the evaluation of machine learning models, especially in medical applications, should not rely solely on accuracy but should also consider precision and recall to obtain more reliable classification outcomes.
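As a rough illustration of the two oversampling ideas named in the abstract, the sketch below duplicates minority rows (ROS) and interpolates between a minority row and one of its nearest minority neighbours (SMOTE-style). It is not the authors' implementation: the toy data, the function names `random_over_sample` and `smote_like`, and the parameters `k` and `seed` are assumptions made for this example, and the ENN cleaning step of SMOTE-ENN is omitted.

```python
import random
from collections import Counter


def _minority_rows(X, y, minority_label):
    """Return the minority rows and how many extra rows balance a binary set."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_needed = (len(y) - len(minority)) - len(minority)
    return minority, max(n_needed, 0)


def random_over_sample(X, y, minority_label, seed=0):
    """ROS: copy randomly chosen minority rows until both classes are equal."""
    rng = random.Random(seed)
    minority, n_needed = _minority_rows(X, y, minority_label)
    extra = [list(rng.choice(minority)) for _ in range(n_needed)]
    return X + extra, y + [minority_label] * len(extra)


def smote_like(X, y, minority_label, k=2, seed=0):
    """SMOTE-style: create points on the segment between a minority row
    and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    minority, n_needed = _minority_rows(X, y, minority_label)
    synthetic = []
    for _ in range(n_needed):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        # Rank the other minority rows by squared Euclidean distance to base.
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return X + synthetic, y + [minority_label] * len(synthetic)


# Hypothetical toy data: 8 majority rows (label 0) and 3 minority rows (label 1).
X = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5],
     [1.8, 1.5], [2.2, 2.0], [1.1, 1.9], [1.9, 2.4],
     [5.0, 5.0], [5.5, 4.8], [6.0, 5.2]]
y = [0] * 8 + [1] * 3

X_ros, y_ros = random_over_sample(X, y, minority_label=1)
X_sm, y_sm = smote_like(X, y, minority_label=1)
print(Counter(y), Counter(y_ros), Counter(y_sm))
```

In a setup like the one the abstract describes, resampling of this kind would normally be applied to the training split only, so that the test split keeps the original class distribution.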


Keywords


Cervical cancer; Imbalanced data; Resampling; Disease; Gaussian Naive Bayes




Copyright (c) 2026 Evvin Faristasari et al.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


