Hate Speech Detection for Indonesia Tweets Using Word Embedding And Gated Recurrent Unit
Junanda Patihullah(1*), Edi Winarko(2)
(1) Master's Program in Computer Science (Program Studi S2 Ilmu Komputer), FMIPA UGM, Yogyakarta
(2) Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada
(*) Corresponding Author
Abstract
Social media has changed the way people express their thoughts and moods. As social media activity increases, hate speech can spread quickly and widely, which makes detecting it manually infeasible. The Gated Recurrent Unit (GRU) is a deep learning method that can learn dependencies between information at earlier time steps and the current time step. This research uses word2vec for feature extraction because it can learn semantic relationships between words. The performance of the GRU is compared with other supervised learning methods: support vector machine, naive Bayes, decision tree, and logistic regression. The best accuracy, 92.96%, is obtained by the GRU model with word2vec feature extraction. For the comparison methods, word2vec features perform worse than term frequency (TF) and TF-IDF features.
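To make the idea of "learning dependencies between earlier and current time steps" concrete, the following is a minimal sketch of a single GRU layer run over a sequence of word vectors, followed by a sigmoid output for a binary hate / not-hate decision. It is not the authors' model: the abstract does not state the architecture or hyperparameters, so the dimensions, the random untrained weights, and the names `GRUCell` and `classify` are hypothetical. It uses one common GRU formulation (update gate z, reset gate r, candidate state); some libraries swap the roles of z and 1 - z. The input vectors stand in for word2vec embeddings and are not trained embeddings.

```python
import math
import random


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(m, v):
    # Matrix (list of rows) times vector.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vs):
    # Element-wise sum of equally long vectors.
    return [sum(t) for t in zip(*vs)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Hypothetical single-layer GRU with random (untrained) weights."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # W: input-to-hidden, U: hidden-to-hidden, b: bias,
        # for the update gate "z", reset gate "r" and candidate state "h".
        self.W = {g: rand_matrix(hidden_dim, input_dim, rng) for g in "zrh"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrh"}
        self.b = {g: [0.0] * hidden_dim for g in "zrh"}

    def step(self, x, h_prev):
        # z_t = sigmoid(W_z x_t + U_z h_{t-1} + b_z): how much to update
        z = [sigmoid(a) for a in add(matvec(self.W["z"], x), matvec(self.U["z"], h_prev), self.b["z"])]
        # r_t = sigmoid(W_r x_t + U_r h_{t-1} + b_r): how much past state feeds the candidate
        r = [sigmoid(a) for a in add(matvec(self.W["r"], x), matvec(self.U["r"], h_prev), self.b["r"])]
        reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
        # candidate state: tanh(W_h x_t + U_h (r_t * h_{t-1}) + b_h)
        h_cand = [math.tanh(a) for a in add(matvec(self.W["h"], x), matvec(self.U["h"], reset_h), self.b["h"])]
        # h_t = (1 - z_t) * h_{t-1} + z_t * candidate: blend previous and new information
        return [(1 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]

    def run(self, sequence):
        # Process the tokens left to right, starting from a zero hidden state.
        h = [0.0] * self.hidden_dim
        for x in sequence:
            h = self.step(x, h)
        return h


def classify(cell, w_out, b_out, sequence):
    # Final hidden state -> probability of the "hate speech" class.
    h = cell.run(sequence)
    return sigmoid(sum(w * hi for w, hi in zip(w_out, h)) + b_out)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical 3-token tweet; each token is a 4-dimensional stand-in
    # for a word2vec vector.
    tweet = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
    cell = GRUCell(input_dim=4, hidden_dim=3, seed=0)
    w_out = [rng.uniform(-1, 1) for _ in range(3)]
    p = classify(cell, w_out, 0.0, tweet)
    print(f"P(hate speech) = {p:.3f} (untrained weights, value is not meaningful)")
```

The update gate is what lets the hidden state carry earlier information forward: when z is close to 0, the new state is almost a copy of the previous one. In a real system the weights would be learned from labeled tweets (for example with a deep learning framework) rather than drawn at random.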
DOI: https://doi.org/10.22146/ijccs.40125
Copyright (c) 2019 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.