Text Classification of Public Complaint Validity With Deep Learning Approaches

Ignatius Wisnu Prayogo; Yulia Kurniawati; Isnina Eva Hidayati; Amelia Khairunnisa; Yova Ruldeviyani

doi:10.22146/jnteti.v15i2.24003

Ignatius Wisnu Prayogo, Information Technology Study Program, Faculty of Computer Science, Universitas Indonesia, Depok, Jawa Barat 16424, Indonesia
Yulia Kurniawati, Information Technology Study Program, Faculty of Computer Science, Universitas Indonesia, Depok, Jawa Barat 16424, Indonesia
Isnina Eva Hidayati, Information Technology Study Program, Faculty of Computer Science, Universitas Indonesia, Depok, Jawa Barat 16424, Indonesia
Amelia Khairunnisa, Information Technology Study Program, Faculty of Computer Science, Universitas Indonesia, Depok, Jawa Barat 16424, Indonesia
Yova Ruldeviyani, Information Technology Study Program, Faculty of Computer Science, Universitas Indonesia, Depok, Jawa Barat 16424, Indonesia

Keywords: Text Classification, Deep Learning, Fine-Tuning, IndoBERT, JakartaKini

Abstract

The Jakarta Kini (JAKI) mobile app recorded 173,327 public complaints in 2023, accounting for 91.37% of all public complaints to the DKI Jakarta Provincial Government. Reports submitted to the Cepat Respon Masyarakat system must be handled within the handling time set by the service level agreement (SLA). Currently, officers validate incoming reports manually, which takes more than 30 minutes per report. In the same year, 15,634 complaints were recorded as unclear or invalid. This lowered the performance of local government agencies, 13% of which did not achieve a 100% SLA in 2023. This study aimed to automate the validity classification of public complaints, distinguishing valid from invalid reports. It used a dataset of 2,000 reports and deep learning models, namely the Indonesian version of bidirectional encoder representations from transformers (IndoBERT) and multilingual BERT (mBERT), and compared their performance against traditional machine learning baselines, namely term frequency-inverse document frequency (TF-IDF) + extreme gradient boosting (XGBoost), naïve Bayes, and support vector machine (SVM), using a 5-fold cross-validation scheme. The results showed that the IndoBERT model classified reports as valid or invalid with an average accuracy of 88.8%, higher than the other models. The method reduced report validation time, with a computation time of 6 minutes for 300 reports, and can thus help government agencies achieve their SLA targets while contributing to research on the effectiveness of BERT in public complaint classification.
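
The 5-fold cross-validation scheme mentioned in the abstract can be illustrated with a minimal sketch. It covers only a naïve Bayes baseline on whitespace tokens, not IndoBERT, mBERT, XGBoost, or SVM, and it does not reproduce the study's pipeline or its results. The function names (`stratified_kfold`, `train_nb`, `predict_nb`, `cross_validate`), the stratified split, and the toy reports are all assumptions made for illustration; the study's 2,000 JAKI reports are not included.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with similar class ratios per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)  # deal each class round-robin into folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def train_nb(texts, labels):
    """Fit a multinomial naive Bayes model on lowercase whitespace tokens."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, y in zip(texts, labels):
        word_counts[y].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y in sorted(class_counts):
        score = math.log(class_counts[y] / total)
        denom = sum(word_counts[y].values()) + len(vocab)
        for w in text.lower().split():
            if w in vocab:  # words unseen during training are ignored
                score += math.log((word_counts[y][w] + 1) / denom)
        if score > best_score:
            best, best_score = y, score
    return best


def cross_validate(texts, labels, k=5, seed=0):
    """Return the per-fold accuracies of the naive Bayes baseline."""
    scores = []
    for train, test in stratified_kfold(labels, k, seed):
        model = train_nb([texts[i] for i in train], [labels[i] for i in train])
        hits = sum(predict_nb(model, texts[i]) == labels[i] for i in test)
        scores.append(hits / len(test))
    return scores


# Hypothetical toy reports, written for this sketch only.
valid = [f"pothole and broken streetlight near block {i}" for i in range(10)]
invalid = [f"test hello no details {i}" for i in range(10)]
texts = valid + invalid
labels = ["valid"] * 10 + ["invalid"] * 10

scores = cross_validate(texts, labels, k=5)
mean_accuracy = sum(scores) / len(scores)
print(f"fold accuracies: {scores}, mean: {mean_accuracy:.3f}")
```

Each report appears in exactly one test fold, and the reported figure is the mean of the five fold accuracies. The split here is stratified so that every fold keeps a similar ratio of valid to invalid reports; whether the study stratified its folds is not stated in the abstract.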
