Enhancing Soil Liquefaction Prediction: Overcoming Data Challenges in SPT-Based Machine Learning with Imputation Technique

  • Fandi Fadliansyah Department of Civil and Environmental Engineering, Universitas Gadjah Mada, Yogyakarta, INDONESIA
  • Fikri Faris Department of Civil and Environmental Engineering, Universitas Gadjah Mada, Yogyakarta, INDONESIA and Center for Disaster Mitigation and Technological Innovation (GAMA-InaTEK), Universitas Gadjah Mada, Yogyakarta, INDONESIA
  • Wahyu Wilopo Department of Geological Engineering, Universitas Gadjah Mada, Yogyakarta, INDONESIA and Center for Disaster Mitigation and Technological Innovation (GAMA-InaTEK), Universitas Gadjah Mada, Yogyakarta, INDONESIA
  • Ardiansyah Department of Computer Science, Faculty of Mathematics and Natural Sciences, Universitas Lampung, Lampung, INDONESIA
Keywords: Machine learning, Missing value imputation, Soil liquefaction, Earthquake, Standard penetration test

Abstract

Beyond the direct adverse effects of earthquakes, the loss of soil bearing capacity during liquefaction can exacerbate damage to buildings. Liquefaction involves many interacting parameters, which makes it complex to evaluate. In recent decades, machine learning has been studied as a way to handle this complexity. However, liquefaction datasets are often incomplete, and the resulting missing values complicate model development across different datasets. This study therefore assesses the capability of machine learning models to predict liquefaction when a missing value imputation technique is applied. Seismicity, soil property, and soil condition parameters were used to develop the models. Random Forest (RF), k-Nearest Neighbor (k-NN), and eXtreme Gradient Boosting (XGBoost) models were trained on standard penetration test (SPT) data, with feature selection and parameter optimization. Model performance was assessed from the confusion matrix using Overall Accuracy (OA), Precision (Prec), Recall (Rec), and F1-Score (F1), together with the Area Under the Curve (AUC). The preprocessing stage also included data normalization and outlier treatment to improve the reliability of model predictions and to ensure consistent learning behavior across variables with different scales. RF achieved the highest performance (OA = 90.71%), which is comparable to findings from previous studies. The AUC results indicate that the models deliver excellent classification performance. These findings suggest that integrating imputation and preprocessing techniques can significantly improve data-driven approaches in geotechnical earthquake engineering. In conclusion, missing value imputation is quite effective for the predictive models. Finally, this study offers a new perspective on developing machine learning models with more user-friendly software and on applying imputation techniques to handle missing data.
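To make the preprocessing and evaluation steps in the abstract concrete, the following is a minimal, dependency-free Python sketch. It is not the authors' implementation, since the study used other software and the abstract does not name the imputation or normalization method. Median imputation and min-max scaling are assumptions chosen for illustration. All function names are hypothetical, and the labels are assumed to be coded as 1 = liquefied and 0 = not liquefied. The metrics follow their standard definitions: OA, Prec, Rec, and F1 come from the confusion-matrix counts (TP, TN, FP, FN). AUC is computed from predicted scores as the probability that a randomly chosen positive case is ranked above a randomly chosen negative one, which is the Mann-Whitney formulation of the ROC AUC.

```python
from statistics import median


def impute_median(rows):
    """Hypothetical helper: fill missing entries (None) column by column with the observed median."""
    n_cols = len(rows[0])
    fill = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        fill.append(median(observed))  # raises StatisticsError if a column is entirely missing
    return [[fill[j] if row[j] is None else row[j] for j in range(n_cols)] for row in rows]


def min_max_normalize(rows):
    """Hypothetical helper: rescale every column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0 for j, v in enumerate(row)]
        for row in rows
    ]


def confusion_metrics(y_true, y_pred):
    """OA, Prec, Rec and F1 from a binary confusion matrix (1 = liquefied, 0 = not liquefied)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    oa = (tp + tn) / len(pairs)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"OA": oa, "Prec": prec, "Rec": rec, "F1": f1}


def auc_score(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy feature rows (e.g. two SPT-derived inputs) with missing values marked as None.
    X = [[10.0, None], [20.0, 0.3], [None, 0.5], [40.0, 0.1]]
    X_ready = min_max_normalize(impute_median(X))
    print(X_ready)
    print(confusion_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
    print(auc_score([1, 1, 0, 0, 1], [0.9, 0.4, 0.2, 0.6, 0.8]))
```

In a real workflow, the imputation medians and scaling bounds would normally be estimated on the training split only and then reused for the test split, so that no information from the test data leaks into model training.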

Published: 2025-11-13

How to Cite: Fadliansyah, F., Faris, F., Wilopo, W., & Ardiansyah. (2025). Enhancing Soil Liquefaction Prediction: Overcoming Data Challenges in SPT-Based Machine Learning with Imputation Technique. Journal of the Civil Engineering Forum, 12(1), 23–39. https://doi.org/10.22146/jcef.21347

Section: Articles