Detecting YouTube Clickbait with Transformer Models: A Comparative Study

https://doi.org/10.22146/ijccs.111977

Bryan Samuel(1), Theresia Ratih Dewi Saputri(2*)

(1) Universitas Ciputra
(2) Universitas Ciputra
(*) Corresponding Author

Abstract


Clickbait remains a common strategy on YouTube, where video titles are often crafted to maximize viewer engagement. Transformer-based machine learning has advanced rapidly, yet few studies specifically investigate clickbait in YouTube video titles. This gap matters because such titles have distinctive linguistic characteristics: they are shorter, more informal, and more ambiguous than news headlines or other social media texts. This study compares three Transformer models, BERT, RoBERTa, and XLNet, for clickbait detection on two benchmark datasets. Each model was fine-tuned and evaluated using standard classification metrics, with additional analyses of training and inference efficiency. All three models achieved accuracy above 95 percent. RoBERTa performed best on the Chaudhary dataset (99.84 percent), while cased BERT performed best on the Vierti dataset (96.91 percent). XLNet lagged behind in both accuracy and computational efficiency, with inference times exceeding six seconds per batch. The study demonstrates a 1.31 percent improvement in accuracy over previous SVM-based methods. It also provides a comprehensive evaluation of three Transformer architectures in the YouTube context, offering empirical guidance for more effective clickbait detection.
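The evaluation described above relies on standard classification metrics and per-batch inference timing. The following is a minimal Python sketch of both, not the authors' code. It assumes binary labels with 1 = clickbait and 0 = non-clickbait. The function names `classification_metrics`, `mean_batch_latency`, and `toy_predict` are hypothetical. In a real setup, a fine-tuned BERT, RoBERTa, or XLNet classifier would be passed as `predict_fn`. The batch size used in the study is not stated here, so the default of 32 is only a placeholder.

```python
import time
from typing import Callable, Dict, List, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = clickbait, 0 = not)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    # Confusion-matrix counts, treating clickbait (1) as the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Return 0.0 instead of dividing by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def mean_batch_latency(predict_fn: Callable[[List[str]], List[int]],
                       titles: Sequence[str], batch_size: int = 32) -> float:
    """Average wall-clock seconds per batch for a (hypothetical) predict_fn."""
    batches = [list(titles[i:i + batch_size]) for i in range(0, len(titles), batch_size)]
    if not batches:
        raise ValueError("no titles to time")
    start = time.perf_counter()
    for batch in batches:
        predict_fn(batch)  # predictions are discarded; only elapsed time matters here
    return (time.perf_counter() - start) / len(batches)


if __name__ == "__main__":
    # Toy labels only; these are not the paper's data or results.
    y_true = [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 0, 1, 1]
    print(classification_metrics(y_true, y_pred))

    # Stand-in "model": flags titles ending in "!" as clickbait.
    def toy_predict(batch: List[str]) -> List[int]:
        return [1 if t.endswith("!") else 0 for t in batch]

    titles = ["You won't believe this!", "Lecture 3: Linear algebra"] * 50
    print(f"{mean_batch_latency(toy_predict, titles, batch_size=16):.6f} s/batch")
```

Wall-clock latency depends on hardware, sequence length, and batch size. Per-batch times like those reported for XLNet are therefore comparable only across models measured under the same setup.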

Keywords


Clickbait; YouTube; Transformer; RoBERTa; Text Classification

References

D. Varshney and D. K. Vishwakarma, “A unified approach for detection of Clickbait videos on YouTube using cognitive evidences,” Appl. Intell., vol. 51, no. 7, pp. 4214–4235, 2021, doi: 10.1007/s10489-020-02057-9.

D. Fayvishenko and I. Shudrak, “Clickbait and Its Impact on Media Trust: Analytical Review,” State Reg. Ser. Soc. Commun., no. 1(61), pp. 26–32, June 2025, doi: 10.32840/cpu2219-8741/2025.1(61).4.

H. Lu, L. Ehwerhemuepha, and C. Rakovski, “A comparative study on deep learning models for text classification of unstructured medical notes with various levels of class imbalance,” BMC Med. Res. Methodol., vol. 22, p. 181, July 2022, doi: 10.1186/s12874-022-01665-y.

A. Chowanda, N. Nadia, and L. M. M. Kolbe, “Identifying clickbait in online news using deep learning,” Bull. Electr. Eng. Inform., vol. 12, no. 3, June 2023, doi: 10.11591/eei.v12i3.4444.

P. Rajapaksha, R. Farahbakhsh, and N. Crespi, “BERT, XLNet or RoBERTa: The Best Transfer Learning Model to Detect Clickbaits,” IEEE Access, vol. 9, pp. 154704–154716, 2021, doi: 10.1109/ACCESS.2021.3128742.

J. Sirusstara, N. Alexander, A. Alfarisy, S. Achmad, and R. Sutoyo, “Clickbait Headline Detection in Indonesian News Sites using Robustly Optimized BERT Pre-training Approach (RoBERTa),” in 2022 3rd International Conference on Artificial Intelligence and Data Sciences (AiDAS), Sept. 2022, pp. 1–6. doi: 10.1109/AiDAS56890.2022.9918678.

R. Kemm, “The Linguistic and Typological Features of Clickbait in Youtube Video Titles,” Soc. Commun., vol. 23, no. 1, Jan. 2022, doi: 10.2478/sc-2022-0007.

T. S. Y. Winarto, K. Wijaya, M. A. Faqih, S. Y. Prasetyo, and Y. Muliono, “Tackling Clickbait with Machine Learning: A Comparative Study of Binary Classification Models for YouTube Title,” Procedia Comput. Sci., vol. 227, pp. 282–290, Jan. 2023, doi: 10.1016/j.procs.2023.10.526.

A. Chaudhary, “Dataset of clickbait and non-clickbait titles.” Accessed: May 21, 2025. [Online]. Available: https://gist.github.com/amitness/0a2ddbcb61c34eab04bad5a17fd8c86b

A. Vierti, alessiovierti/youtube-clickbait-detector. (Aug. 20, 2023). Jupyter Notebook. Accessed: May 21, 2025. [Online]. Available: https://github.com/alessiovierti/youtube-clickbait-detector

H. Alawneh, A. Hasasneh, and M. Maree, “On the Utilization of Emoji Encoding and Data Preprocessing with a Combined CNN-LSTM Framework for Arabic Sentiment Analysis,” Modelling, vol. 5, no. 4, pp. 1469–1489, Dec. 2024, doi: 10.3390/modelling5040076.

S. Kurniawan, A. S. Pramayoga, and Y. F. Ashari, “An Ensemble-Based Approach for Detecting Clickbait in Indonesian Online Media,” J. Masy. Inform., vol. 16, no. 1, pp. 104–118, May 2025, doi: 10.14710/jmasif.16.1.73115.

S. Islam et al., “A comprehensive survey on applications of transformers for deep learning tasks,” Expert Syst. Appl., vol. 241, p. 122666, May 2024, doi: 10.1016/j.eswa.2023.122666.

F. S. Amalia and Y. Suyanto, “Offensive Language and Hate Speech Detection using BERT Model,” IJCCS Indones. J. Comput. Cybern. Syst., vol. 18, no. 4, Oct. 2024, doi: 10.22146/ijccs.99841.

I. J. David, M. U. Adehi, and P. O. Ikwuoche, “Cochran’s Q-Test on Soil Helminth Prevalence,” Biom. Lett., vol. 58, no. 2, pp. 169–185, Dec. 2021, doi: 10.2478/bile-2021-0013.

Copyright (c) 2025 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.



IJCCS (Indonesian Journal of Computing and Cybernetics Systems) is a scientific journal publishing research results in computing and cybernetics systems.
ISSN 1978-1520 (print); ISSN 2460-7258 (online)
A publication of IndoCEISS.
Gedung S1 Ruang 416 FMIPA UGM, Sekip Utara, Yogyakarta 55281
Fax: +62274 555133
Email: ijccs.mipa@ugm.ac.id | http://jurnal.ugm.ac.id/ijccs