Modification of Stemming Algorithm Using A Non Deterministic Approach To Indonesian Text

https://doi.org/10.22146/ijccs.49072

Wafda Rifai(1*), Edi Winarko(2)

(1) Master Program of Computer Science, FMIPA UGM, Yogyakarta
(2) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(*) Corresponding Author

Abstract


Natural Language Processing (NLP) is a branch of Artificial Intelligence that focuses on language processing. One stage of NLP is preprocessing, which prepares data before it is processed further. Preprocessing involves many kinds of processes, one of which is stemming. Stemming is the process of finding the root word of a given word. Errors in determining root words can lead to misinformation. In addition, stemming does not always yield a single root word, because some Indonesian words can be read either as a root word or as an affixed word, e.g. the word “beruang” (“bear” as a root word, or “having money” as ber- + uang).

To handle these problems, this study proposes a more accurate stemmer that employs a non-deterministic algorithm, which can return more than one candidate word. All rules are checked, and the resulting words are kept in a candidate list. If several candidates are found, one of them is chosen as the result.
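The candidate-list idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionary `ROOT_WORDS`, the affix lists, and the function names are all hypothetical, and the final selection step (longest candidate wins) is only a placeholder, since the abstract does not say how the single result is chosen.

```python
# Toy root-word dictionary (an assumption; a real stemmer would load a full lexicon).
ROOT_WORDS = {"beruang", "uang", "ruang", "makan"}

# Small, illustrative subset of Indonesian affixes (not the paper's rule set).
PREFIXES = ["ber", "be", "me", "di", "ter", "pe"]
SUFFIXES = ["kan", "an", "i", "lah", "nya"]


def strip_options(word, affixes, from_start):
    """Return the word itself plus every form obtained by removing one affix."""
    forms = [word]
    for affix in affixes:
        if from_start and word.startswith(affix):
            forms.append(word[len(affix):])
        elif not from_start and word.endswith(affix):
            forms.append(word[:-len(affix)])
    return forms


def stem_candidates(word):
    """Check every rule and collect all dictionary roots, without stopping early."""
    candidates = []
    for without_suffix in strip_options(word, SUFFIXES, from_start=False):
        for form in strip_options(without_suffix, PREFIXES, from_start=True):
            if form in ROOT_WORDS and form not in candidates:
                candidates.append(form)
    return candidates


def choose(candidates, word):
    """Placeholder selection: prefer the longest candidate, else keep the input."""
    return max(candidates, key=len) if candidates else word


print(stem_candidates("beruang"))  # ['beruang', 'uang', 'ruang']
print(stem_candidates("makanan"))  # ['makan']
```

A deterministic stemmer would stop at the first matching rule; here every rule is tried, so “beruang” yields three candidates and the choice among them is deferred to a separate step.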

The stemmer was tested on 15,934 words and achieved an accuracy of 93%. The stemmer can therefore be used to detect words with more than one possible root word.


Keywords


stemming; non-deterministic; accuracy






Copyright (c) 2019 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.



Copyright of :
IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
ISSN 1978-1520 (print); ISSN 2460-7258 (online)
is a scientific journal the results of Computing
and Cybernetics Systems
A publication of IndoCEISS.
Gedung S1 Ruang 416 FMIPA UGM, Sekip Utara, Yogyakarta 55281
Fax: +62274 555133
email:ijccs.mipa@ugm.ac.id | http://jurnal.ugm.ac.id/ijccs


