Hate Speech Detection in Indonesian Twitter using Contextual Embedding Approach

Guntur Budi Herwanto; Annisa Maulida Ningtyas; I Gede Mujiyatna; Kurniawan Eka Nugraha; I Nyoman Prayana Trisna

doi:10.22146/ijccs.64916

Guntur Budi Herwanto (1*), Annisa Maulida Ningtyas (2), I Gede Mujiyatna (3), Kurniawan Eka Nugraha (4), I Nyoman Prayana Trisna (5)

(1) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(2) Department of Health Information and Services, Universitas Gadjah Mada Yogyakarta, Indonesia
(3) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(4) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(5) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(*) Corresponding Author

Abstract

Hate speech has grown alongside the rapid development of social media. It is often posted because the public is not aware of the difference between criticism and statements that may constitute a crime. It is therefore important to detect such sentences early, before they are posted and lead to a criminal act committed out of ignorance. In this paper, we use advances in deep neural networks to predict whether a sentence contains hate speech and an abusive tone. We demonstrate the robustness of different word and contextual embeddings in representing the semantics of hate speech words. In addition, we use a document embedding built with a recurrent neural network, with the gated recurrent unit as the main architecture, to provide a richer representation. Compared to the syntactic representation used in the previous approach, the contextual embedding in our model improves performance by a significant margin.
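The abstract does not give implementation details, so the following is only an illustrative sketch of the general idea it describes: a gated recurrent unit reads the token vectors of a sentence in order, and its final hidden state serves as the document embedding that feeds the classifier. Everything specific here is an assumption made for illustration: the class and function names, the dimensions, the random weights, the omission of bias terms, and the use of random vectors as stand-ins for real word or contextual embeddings. It is not the authors' model.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUDocumentEncoder:
    """Toy GRU (no bias terms, untrained random weights; illustration only)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One input matrix (W) and one recurrent matrix (U) per gate:
        # z = update gate, r = reset gate, h = candidate state.
        self.W = {g: rand_matrix(hidden_dim, input_dim, rng) for g in "zrh"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrh"}

    def step(self, x, h):
        z = [sigmoid(a + b)
             for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b)
             for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b)
                for a, b in zip(matvec(self.W["h"], x),
                                matvec(self.U["h"], reset_h))]
        # Blend the previous state with the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def encode(self, token_vectors):
        """Return the final hidden state as the document embedding."""
        h = [0.0] * self.hidden_dim
        for x in token_vectors:
            h = self.step(x, h)
        return h


def predict(doc_vec, head_weights):
    """One independent sigmoid output per label (hypothetical label names)."""
    return {label: sigmoid(sum(w * v for w, v in zip(ws, doc_vec)))
            for label, ws in head_weights.items()}


# Stand-in for the embeddings of a 5-token tweet (random, not real embeddings).
rng = random.Random(42)
tweet = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(5)]

encoder = GRUDocumentEncoder(input_dim=8, hidden_dim=4)
doc_vec = encoder.encode(tweet)

heads = {
    "hate_speech": [rng.uniform(-1, 1) for _ in range(4)],
    "abusive": [rng.uniform(-1, 1) for _ in range(4)],
}
probs = predict(doc_vec, heads)
print(probs)
```

In a trained system the weights would be learned from labelled tweets and the input vectors would come from a pretrained embedding model; the sketch only shows how a variable-length sequence of token vectors collapses into one fixed-size vector.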

Keywords

hate speech; natural language processing; deep neural network; contextual embedding; recurrent neural network

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Copyright of: IJCCS (Indonesian Journal of Computing and Cybernetics Systems). ISSN 1978-1520 (print); ISSN 2460-7258 (online). IJCCS is a scientific journal of research results in Computing and Cybernetics Systems.
A publication of IndoCEISS. Gedung S1 Ruang 416 FMIPA UGM, Sekip Utara, Yogyakarta 55281. Fax: +62274 555133. Email: ijccs.mipa@ugm.ac.id | http://jurnal.ugm.ac.id/ijccs
