Improving Software Fault Prediction Using Feature Selection and Cluster-Based Classification
Abstract
A software fault prediction model with a high balance value can help direct testing effort, reduce testing costs and resources, and improve software quality. Balance deserves particular attention because, in most software fault data sets, the distribution of the faulty and non-faulty classes is imbalanced. Balance captures the trade-off between the probability of detection (pd) and the probability of false alarm (pf). Previous researchers proposed a Cluster-Based Classification (CBC) method integrated with Entropy-Based Discretization (EBD). However, irrelevant and redundant features in the data sets can lower the balance value of a prediction model. This study proposes improving the fault prediction results of CBC by integrating feature selection. Five feature selection methods are combined with CBC: Information Gain (IG), Gain Ratio (GR), One-R (OR), Relief-F (RFF), and Symmetric Uncertainty (SU). The results show that combining CBC with IG yields the best average balance among the feature selection methods examined in this research. On five public NASA MDP data sets, the combination of IG and CBC achieves an average balance of 63.91%, whereas CBC without feature selection achieves 54.79%. IG therefore increases the average balance of CBC by 9.12 percentage points.
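The two quantities the abstract relies on, the balance measure and Information Gain feature ranking, can be sketched in Python. This is a minimal illustration, not the authors' implementation. It assumes the distance-based definition of balance that is widely used in defect prediction, balance = 1 - sqrt(pf^2 + (1 - pd)^2) / sqrt(2), since the abstract does not state the exact formula. It also assumes the features have already been discretized into nominal bins (for example by EBD, which is not implemented here), and it does not implement CBC itself. The function names, feature names, and toy data are hypothetical.

```python
import math
from collections import Counter


def balance(tp, fn, fp, tn):
    """Assumed balance definition: 1 minus the normalized Euclidean distance
    from the ideal point (pf = 0, pd = 1)."""
    pd = tp / (tp + fn)  # probability of detection: share of faulty modules caught
    pf = fp / (fp + tn)  # probability of false alarm: share of clean modules flagged
    return 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG = H(class) - H(class | feature) for one discrete (pre-discretized) feature."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Score every feature by IG and return names sorted from most to least informative."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


if __name__ == "__main__":
    # Hypothetical toy data: label 1 = faulty module, 0 = non-faulty module.
    labels = [1, 1, 0, 0, 0, 0]
    columns = {
        "loc_bin": ["high", "high", "low", "low", "low", "high"],  # informative
        "noise": ["a", "b", "a", "b", "a", "b"],                    # uninformative
    }
    order, scores = rank_features(columns, labels)
    print(order)                                      # -> ['loc_bin', 'noise']
    print(round(balance(tp=8, fn=2, fp=3, tn=17), 4))  # -> 0.8232
```

In a feature-selection pipeline of this kind, the top-ranked features would be kept and the rest dropped before the classifier is trained; how many features to keep is a separate choice that this sketch does not make.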
© Jurnal Nasional Teknik Elektro dan Teknologi Informasi, under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License.