Hate Speech Detection for Indonesia Tweets Using Word Embedding And Gated Recurrent Unit

https://doi.org/10.22146/ijccs.40125

Junanda Patihullah(1*), Edi Winarko(2)

(1) Master's Program in Computer Science, Faculty of Mathematics and Natural Sciences (FMIPA), Universitas Gadjah Mada (UGM), Yogyakarta
(2) Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada
(*) Corresponding Author

Abstract


Social media has changed how people express their thoughts and moods. As social media activity increases, hate speech can spread quickly and widely, so detecting it manually is no longer feasible. The gated recurrent unit (GRU) is a deep learning method that can learn how information from previous time steps relates to the current one. In this research, word2vec is used for feature extraction because it can learn semantic relationships between words. GRU performance is compared with other supervised methods: support vector machine, naive Bayes, decision tree, and logistic regression. The results show that the best accuracy, 92.96%, is obtained by the GRU model with word2vec feature extraction. For the comparison supervised methods, word2vec features do not perform as well as tf and tf-idf features.
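As an illustration of how a GRU carries information from earlier time steps forward, the following is a minimal sketch of a single GRU update in plain Python. It is not the authors' implementation: the parameter names (`Wz`, `Uz`, `bz`, and so on), the random initialization, and the helper functions `init_params` and `encode` are assumptions made for this example. The update follows the common convention h_t = (1 - z) * h_prev + z * h_candidate; some libraries swap the roles of z and 1 - z. The input vectors stand in for word2vec embeddings of the words in a tweet.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(matrix, vector):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def gate(W, U, b, x, h, activation):
    # activation(W x + U h + b), computed element by element.
    return [activation(a + c + d) for a, c, d in zip(matvec(W, x), matvec(U, h), b)]


def gru_step(x, h_prev, p):
    """One GRU update for input vector x and previous hidden state h_prev."""
    z = gate(p["Wz"], p["Uz"], p["bz"], x, h_prev, sigmoid)  # update gate
    r = gate(p["Wr"], p["Ur"], p["br"], x, h_prev, sigmoid)  # reset gate
    # The reset gate decides how much of the previous state feeds the candidate.
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
    candidate = gate(p["Wh"], p["Uh"], p["bh"], x, reset_h, math.tanh)
    # The update gate blends the previous state with the candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, candidate)]


def init_params(input_dim, hidden_dim, seed=0):
    """Hypothetical small random initialization, for illustration only."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for g in "zrh":
        p["W" + g] = mat(hidden_dim, input_dim)
        p["U" + g] = mat(hidden_dim, hidden_dim)
        p["b" + g] = [0.0] * hidden_dim
    return p


def encode(sequence, p, hidden_dim):
    """Run the GRU over a sequence of embedding vectors; return the final state."""
    h = [0.0] * hidden_dim
    for x in sequence:
        h = gru_step(x, h, p)
    return h
```

In a classifier of the kind the abstract describes, the final hidden state returned by `encode` would typically be passed to an output layer that predicts the hate speech label; that layer and the training procedure are omitted here.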


Keywords


Gated Recurrent Unit; Hate Speech Detection; Word2vec; RNN; Word Embedding











Copyright (c) 2019 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.



Copyright of :
IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
ISSN 1978-1520 (print); ISSN 2460-7258 (online)
is a scientific journal of research results in Computing
and Cybernetics Systems.
A publication of IndoCEISS.
Gedung S1 Ruang 416 FMIPA UGM, Sekip Utara, Yogyakarta 55281
Fax: +62274 555133
Email: ijccs.mipa@ugm.ac.id | http://jurnal.ugm.ac.id/ijccs


