Oversampling Method To Handling Imbalanced Datasets Problem In Binary Logistic Regression Algorithm


Windyaning Ustyannie(1*), Suprapto Suprapto(2)

(1) Prodi S2 Ilmu Komputer; FMIPA UGM, Yogyakarta
(2) Departemen Ilmu Komputer dan Elektronika, FMIPA UGM, Yogyakarta
(*) Corresponding Author


Class imbalance occurs when one class accounts for a much larger share of the data than the other, which can degrade classification accuracy. Logistic regression is one data mining method that can be used for classification. This research uses the RWO-sampling method with a random replicate approach to generate synthetic data for discrete attributes. The results show that this approach can handle the class imbalance problem: RWO-sampling with the random replicate approach achieves better accuracy than both RWO-sampling with the roulette approach and ROS. Relative to RWO-sampling with the roulette approach, RWO-sampling with the random replicate approach increases accuracy by an average of 15.55% per dataset. In comparison with the ROS method, accuracy increases by an average of 3.7% per dataset. Furthermore, in testing the underfitting problem in logistic regression, oversampling is better than no oversampling, with accuracy increasing by an average of 2.3% per dataset.
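
As a rough illustration of the ROS baseline mentioned above (read here as plain random oversampling, which is an assumption about the acronym), the following minimal sketch duplicates randomly chosen minority-class rows until a binary dataset is balanced. It is not the authors' implementation and does not cover RWO-sampling or its random replicate approach; the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until both classes have equal size.

    Assumes a binary problem (exactly two distinct labels).
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    # most_common(2) returns the majority class first, then the minority class.
    (majority, n_major), (minority, n_minor) = counts.most_common(2)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    # Draw with replacement from the minority class to close the gap.
    extra = [rng.choice(minority_rows) for _ in range(n_major - n_minor)]
    return rows + extra, labels + [minority] * len(extra)


# Hypothetical toy data: six majority (0) and two minority (1) examples.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [2.0], [2.2]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

The balanced `X_bal` and `y_bal` could then be passed to any binary logistic regression trainer. Because the added rows are exact copies, this baseline adds no new information, which is the limitation that synthetic-data methods such as RWO-sampling aim to address.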


Imbalanced Datasets; RWO-Sampling; Logistic Regression




DOI: https://doi.org/10.22146/ijccs.37415


Copyright (c) 2020 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Copyright of :
IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
ISSN 1978-1520 (print); ISSN 2460-7258 (online)
is a scientific journal of research results in Computing and Cybernetics Systems.
A publication of IndoCEISS.
Gedung S1 Ruang 416 FMIPA UGM, Sekip Utara, Yogyakarta 55281
Fax: +62274 555133
Email: ijccs.mipa@ugm.ac.id | http://jurnal.ugm.ac.id/ijccs
