Extractive Sentiment Summarization on Twitter Using Hybrid TF-IDF and Cosine Similarity

Devid Haryalesmana Wahid(1*), Azhari SN(2)

(1) Universitas Gadjah Mada
(2) Departemen Ilmu Komputer dan Elektronika, FMIPA UGM, Yogyakarta
(*) Corresponding Author


The use of Twitter by celebrities has become a new trend in impression management strategy. Mining public reactions on social media is a good way to obtain feedback, but extracting them is not a trivial matter. Reading hundreds of tweets while determining their sentiment polarity is time-consuming. An extractive sentiment summarization system is needed to address this issue. Previous research generally does not include the sentiment information contained in a tweet as a weighting factor; as a result, only general topics of discussion are extracted.

This research aims to perform extractive sentiment summarization of both positive and negative sentiment in tweets mentioning the Indonesian celebrity Agnes Monica, by combining SentiStrength, Hybrid TF-IDF, and Cosine Similarity. SentiStrength is used to obtain a sentiment strength score and to classify each tweet as positive, negative, or neutral. Positive and negative sentiment are then summarized by ranking tweets with Hybrid TF-IDF summarization, using the sentiment strength score as an additional weight, and then removing similar tweets using Cosine Similarity.
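The ranking and de-duplication steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the Hybrid TF-IDF formulation in which term frequency is counted over the whole tweet collection, document frequency is counted per tweet, and each tweet's score is normalized by its length with a lower bound. How the sentiment strength enters the score is not stated here, so the multiplication by `1 + |strength|` is an assumption, as are the function names, the `min_len` and `sim_threshold` parameters, and the sample inputs. Sentiment strengths are passed in as plain integers rather than computed by SentiStrength.

```python
import math
from collections import Counter


def hybrid_tfidf_scores(tweets, min_len=3):
    """Score each tweet: TF over the whole collection, IDF per tweet."""
    docs = [t.lower().split() for t in tweets]
    total_words = sum(len(d) for d in docs)
    if total_words == 0:
        return [0.0 for _ in docs]
    tf = Counter(w for d in docs for w in d)        # count in all tweets
    df = Counter(w for d in docs for w in set(d))   # tweets containing w
    n = len(docs)

    def weight(w):
        return (tf[w] / total_words) * math.log2(n / df[w])

    # Normalize by tweet length, bounded below by min_len (assumed value).
    return [sum(weight(w) for w in d) / max(min_len, len(d)) for d in docs]


def cosine(a, b):
    """Cosine similarity between two tweets using raw term counts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def summarize(tweets, strengths, k=2, sim_threshold=0.5):
    """Pick up to k top-ranked tweets, skipping near-duplicates."""
    base = hybrid_tfidf_scores(tweets)
    # Assumption: sentiment strength acts as a multiplicative weight.
    scored = [(s * (1 + abs(st)), t)
              for s, st, t in zip(base, strengths, tweets)]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    summary = []
    for _, tweet in ranked:
        if all(cosine(tweet, kept) < sim_threshold for kept in summary):
            summary.append(tweet)
        if len(summary) == k:
            break
    return summary


# Hypothetical positive tweets with made-up sentiment strengths.
positive = ["agnes concert was amazing",
            "agnes concert was amazing",
            "love the new album"]
print(summarize(positive, [4, 4, 3], k=3))
```

In this sketch the same procedure would be run once on the tweets classified as positive and once on those classified as negative, giving one summary per polarity. The repeated tweet in the sample input is dropped because its cosine similarity to an already selected tweet exceeds the threshold.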

The test results show that the combination of SentiStrength, Hybrid TF-IDF, and Cosine Similarity performs better than Hybrid TF-IDF alone, giving an average of 60% accuracy and 62% F-measure. This is due to the addition of the sentiment score as a weighting factor in sentiment summarization.


extractive sentiment summarization, sentiment analysis, classification, automatic text summarization, SentiStrength, Hybrid TF-IDF








Copyright (c) 2016 IJCCS - Indonesian Journal of Computing and Cybernetics Systems

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Copyright of:
IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
ISSN 1978-1520 (print); ISSN 2460-7258 (online)
A scientific journal of research results in Computing and Cybernetics Systems.
A publication of IndoCEISS.
Gedung S1 Ruang 416 FMIPA UGM, Sekip Utara, Yogyakarta 55281
Fax: +62274 555133
