Research and Analysis of IndoBERT Hyperparameter Tuning in Fake News Detection
Abstract
The rapid advancement of communication technology has transformed how information is shared, but it has also raised concerns about the proliferation of false information. A recent report by the Ministry of Communication and Informatics in Indonesia revealed that around 800,000 websites were involved in spreading false information, underscoring the seriousness of the problem. In response, researchers have focused on developing techniques to detect fake news. This research uses IndoBERT-base-p1 for fake news detection and aims to improve its performance by tuning the model's hyperparameters with three methods: Bayesian optimization, grid search, and random search. A comparison of the three methods showed Bayesian optimization to be the most effective. With a precision of 88.79%, recall of 94.5%, and F1-score of 91.56% for the “fake” label, it outperformed the other two tuning methods as well as the model that used the fine-tuning hyperparameter values. These findings emphasize the importance of hyperparameter tuning in improving the accuracy of fake news detection models. With its hyperparameters tuned by Bayesian optimization, the model identified instances of fake news more accurately, providing a valuable tool in the ongoing battle against disinformation in the digital realm.
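As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall. The short Python sketch below recomputes it from the two values given in the abstract. The function name `f1_score` is chosen here for illustration and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the "fake" label.
precision = 0.8879
recall = 0.9450

f1 = f1_score(precision, recall)
print(f"F1 = {f1 * 100:.2f}%")  # prints "F1 = 91.56%"
```

The recomputed value agrees with the reported F1-score of 91.56%.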
References
V.B. Kusnandar (2021) “Pengguna internet Indonesia peringkat ke-3 terbanyak di Asia,” [Online], https://databoks.katadata.co.id/datapublish/2021/10/14/pengguna-internet-indonesia-peringkat-ke-3-terbanyak-di-asia, access date: 10-Jan-2023.
M.A. Rahmat and I.S. Areni, “Hoax web detection for news in Bahasa using support vector machine,” 2019 Int. Conf. Inf. Commun. Technol. (ICOIACT), 2019, pp. 332–336, doi: 10.1109/ICOIACT46704.2019.8938425.
A. Thota, P. Tilak, S. Ahluwalia, and N. Lohia, “Fake news detection: A deep learning approach,” SMU Data Sci. Rev., vol. 1, no. 3, pp. 1–20, 2018.
R. Lumbantoruan et al., “Analysis comparison of FastText and Word2vec for detecting offensive language,” 2022 IEEE Int. Conf. Comput. Sci. Inf. Technol. (ICOSNIKOM), 2022, pp. 1–8, doi: 10.1109/ICOSNIKOM56551.2022.10034886.
I. Nadzir, S. Seftiani, and Y.S. Permana, “Hoax and misinformation in Indonesia: Insights from a nationwide survey,” ISEAS-Yusof Ishak Inst., vol. 2019, pp. 1–12, Nov. 2019.
A. Yuliani (2017) “Ada 800.000 situs penyebar hoax di Indonesia,” [Online], https://www.kominfo.go.id/content/detail/12008/ada-800000-situs-penyebar-hoax-di-indonesia/0/sorotan_media, access date: 10-Jan-2023.
R. Sultana and T. Nishino, “Fake news detection system: An implementation of BERT and boosting algorithm,” Proc. 38th Int. Conf. Comput. Their Appl., 2023, pp. 124–137, doi: 10.29007/d931.
L.H. Suadaa, I. Santoso, and A.T.B. Panjaitan, “Transfer learning of pre-trained transformers for COVID-19 hoax detection in Indonesian language,” Indones. J. Comput. Cybern. Syst. (IJCCS), vol. 15, no. 3, pp. 317–326, Jul. 2021, doi: 10.22146/ijccs.66205.
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” Proc. 2019 Conf. N. Am. Chapter Assoc. Comput. Linguist., Hum. Lang. Technol., 2019, pp. 4171–4186, doi: 10.18653/v1/N19-1423.
J. Fawaid, A. Awalina, R.Y. Krisnabayu, and N. Yudistira, “Indonesia’s fake news detection using transformer network,” Proc. 6th Int. Conf. Sustain. Inf. Eng. Technol., 2021, pp. 247–251, doi: 10.1145/3479645.3479666.
M. Guderlei and M. Aßenmacher, “Evaluating unsupervised representation learning for detecting stances of fake news,” Proc. 28th Int. Conf. Comput. Linguist., 2020, pp. 6339–6349, doi: 10.18653/v1/2020.coling-main.558.
R.R. Rajalaxmi et al., “Optimizing hyperparameters and performance analysis of LSTM model in detecting fake news on social media,” ACM Trans. Asian Low-Resour. Lang. Inf. Process., to be published, doi: 10.1145/3511897.
N. Kanagavalli, S.B. Priya, and D. Jeyakumar, “Design of hyperparameter tuned deep learning based automated fake news detection in social networking data,” 2022 6th Int. Conf. Comput. Methodol. Commun. (ICCMC), 2022, pp. 958–963, doi: 10.1109/ICCMC53470.2022.9753739.
C.W. Kencana, E.B. Setiawan, and I. Kurniawan, “Hoax detection system on Twitter using feed-forward and back-propagation neural networks classification method,” J. RESTI (Rekayasa Sist. Teknol. Inf.), vol. 4, no. 4, pp. 655–663, Aug. 2020, doi: 10.29207/resti.v4i4.2038.
M.E. Peters et al., “Deep contextualized word representations,” Proc. 2018 Conf. N. Am. Chapter Assoc. Comput. Linguist., Hum. Lang. Technol., 2018, pp. 2227–2237, doi: 10.18653/v1/N18-1202.
R.K. Kaliyar, A. Goswami, and P. Narang, “FakeBERT: Fake news detection in social media with a BERT-based deep learning approach,” Multimed. Tools Appl., vol. 80, no. 8, pp. 11765–11788, Mar. 2021, doi: 10.1007/s11042-020-10183-2.
F. Koto, A. Rahimi, J.H. Lau, and T. Baldwin, “IndoLEM and IndoBERT: A benchmark dataset and pre-trained language model for Indonesian NLP,” Proc. 28th Int. Conf. Comput. Linguist., 2020, pp. 757–770, doi: 10.18653/v1/2020.coling-main.66.
S.M. Isa, G. Nico, and M. Permana, “IndoBERT for Indonesian fake news detection,” ICIC Express Lett., vol. 16, no. 3, pp. 289–297, Mar. 2022, doi: 10.24507/icicel.16.03.289.
B. Bischl et al., “Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges,” WIREs Data Min. Knowl. Discov., vol. 13, no. 2, pp. 1–43, Mar./Apr. 2023, doi: 10.1002/widm.1484.
T. Akiba et al., “Optuna: A next-generation hyperparameter optimization framework,” Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 2019, pp. 2623–2631, doi: 10.1145/3292500.3330701.
S.A. Alasadi and W.S. Bhaya, “Review of data preprocessing techniques in data mining,” J. Eng. Appl. Sci., vol. 12, no. 16, pp. 4102–4107, Sep. 2017, doi: 10.3923/jeasci.2017.4102.4107.
© Jurnal Nasional Teknik Elektro dan Teknologi Informasi, under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License.