Interpretable Machine Learning for Job Placement Prediction: A SHAP-Based Feature Analysis

Swono Sibagariang

doi:10.22146/jnteti.v14i3.20516

Swono Sibagariang, Batam State Polytechnic, Kota Batam, Kepulauan Riau 2946, Indonesia

Keywords: Machine Learning, Job Suitability Prediction, Shapley Additive Explanations (SHAP), Graduate Job Placement

Abstract

Predictive modeling supports the analysis of graduates’ job outcomes, particularly the forecasting of job placement from academic performance and coursework. This study aims to improve predictive accuracy and interpretability in job placement classification using machine learning models and SHapley Additive exPlanations (SHAP) analysis. Using a dataset of graduates’ academic records, including course grades, grade point average (GPA), and internship duration, this research evaluated six classification models: decision tree, random forest, extreme gradient boosting (XGBoost), light gradient-boosting machine (LightGBM), CatBoost, and logistic regression. Most models achieved 92% precision, 92% recall, and a 92% F1 score with 85% accuracy, while logistic regression performed best with 100% recall, a 96% F1 score, and 92% accuracy. SHAP analysis identified Administration, Computer Organization, Information Systems, Entrepreneurship, Professional Ethics, and Web Programming as the most influential features in predicting job placement. Introduction to Information Technology, Software Engineering II, and Data Mining also contributed, although with relatively lower influence. Extracurricular activities and internship experiences were likewise found to be influential, showing that both academic and nonacademic elements shape graduates’ career prospects. These findings point to the need to equip students with specific academic courses that better prepare them for the job market. They also underscore the value of interpretable machine learning models in career forecasting, which enable educational institutions to optimize curriculum design and enhance graduates’ employability. Future research should explore feature selection techniques, temporal analysis, and personalized recommendation systems to refine predictive accuracy.
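
To make the attribution idea behind SHAP concrete, the following is a minimal sketch that computes exact Shapley values by brute force for a single prediction. It is not the study's pipeline: the toy logistic model, its weights, the three feature names, and the baseline graduate are all hypothetical values chosen for illustration, and features left out of a coalition are simply set to the baseline value.

```python
from itertools import combinations
from math import exp, factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline instance.

    Features outside a coalition take their baseline value (an assumption of
    this sketch; SHAP libraries use more refined background distributions).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


# Hypothetical toy model: names, weights, and bias are illustrative only.
FEATURES = ["web_programming_grade", "gpa", "internship_months"]
WEIGHTS = [0.8, 1.2, 0.3]
BIAS = -6.0


def placement_probability(row):
    """Logistic model returning a placement probability for one feature row."""
    z = BIAS + sum(w * v for w, v in zip(WEIGHTS, row))
    return 1.0 / (1.0 + exp(-z))


graduate = [3.5, 3.2, 6.0]  # hypothetical graduate to explain
baseline = [2.5, 2.8, 3.0]  # hypothetical reference graduate

phi = shapley_values(placement_probability, graduate, baseline)
for name, value in zip(FEATURES, phi):
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the graduate's predicted probability and the baseline's, which is the additivity property that lets per-feature contributions be ranked and compared. Brute-force enumeration grows exponentially with the number of features, so it is only practical for small illustrations like this one.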
