Improving the Porter Stemmer Algorithm for Indonesian Based on a Morphological Method by Applying Two-Level Morphology and Prefix-Suffix Combination Rules

  • Putu Bagus Susastra Wiguna Universitas Gadjah Mada
  • Bimo Sunarfri Hantono Universitas Gadjah Mada
Keywords: Stemmer, two-level morphology, prefix and suffix combination

Abstract

Stemmers are used in document-processing tasks such as information retrieval, question answering, spell checking, machine translation, document clustering, and document classification. Stemmers based on word morphology have two known shortcomings: incorrect prefix removal on root words beginning with the letters “k”, “t”, “s”, and “p”, and incorrect suffix removal, especially for the suffixes “-kan” and “-an”. To address these problems, this research proposes a stemmer that applies two-level morphology to root words beginning with “k”, “t”, “s”, or “p”, and uses prefix and suffix combination rules to decide which suffix may be removed from a word. For example, the prefix “di-” should only be paired with the suffix “-kan” and should not be paired with the suffix “-an”. In the experiments, the proposed stemmer reached an accuracy of 95.5%, compared with 82.5% for the earlier stemmer based on word morphology.
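
To make the two ideas in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: it replaces a true two-level morphology transducer with a simple dictionary-checked restoration of the root-initial letter, and it assumes a tiny illustrative lexicon. The names `ROOTS`, `NASAL_PREFIXES`, `DISALLOWED_PAIRS`, `suffix_candidates`, and `stem` are hypothetical. Only the rule that “di-” must not be paired with “-an” comes from the abstract; the nasal-prefix table reflects the general Indonesian pattern in which “meN-” drops an initial “k”, “t”, “s”, or “p” of the root.

```python
# Hypothetical mini-lexicon; a real stemmer would load a full root-word dictionary.
ROOTS = {"tulis", "pukul", "kirim", "sapu", "makan", "beri"}

# Surface form of the "meN-" prefix -> root-initial letter it may have replaced.
# Longer surface forms come first so "meny"/"meng" are tried before "men".
NASAL_PREFIXES = [("meny", "s"), ("meng", "k"), ("men", "t"), ("mem", "p")]

# Prefix-suffix pairs that must not be stripped together.
# Only ("di", "an") is taken from the abstract; a full rule table is assumed to be larger.
DISALLOWED_PAIRS = {("di", "an")}

SUFFIXES = ("kan", "an")  # try the longer suffix first


def suffix_candidates(prefix, rest):
    """Return `rest` plus every suffix-stripped form allowed for this prefix."""
    candidates = [rest]
    for suffix in SUFFIXES:
        if rest.endswith(suffix) and (prefix, suffix) not in DISALLOWED_PAIRS:
            candidates.append(rest[: -len(suffix)])
    return candidates


def stem(word):
    """Return the first candidate found in ROOTS, or the word unchanged."""
    if word in ROOTS:
        return word

    # Each analysis is (prefix label, remainder after removing the prefix).
    analyses = [("", word)]
    if word.startswith("di"):
        analyses.append(("di", word[2:]))
    for surface, dropped in NASAL_PREFIXES:
        if word.startswith(surface):
            rest = word[len(surface):]
            analyses.append(("me", dropped + rest))  # restore k/t/s/p
            analyses.append(("me", rest))            # root had no dropped letter

    for prefix, rest in analyses:
        for candidate in suffix_candidates(prefix, rest):
            if candidate in ROOTS:
                return candidate
    return word


if __name__ == "__main__":
    for w in ["menulis", "memukul", "mengirimkan", "menyapu", "diberikan", "dimakan"]:
        print(w, "->", stem(w))
```

With the prefix “di-”, `suffix_candidates` never proposes an “-an” removal, so “berikan” can only be reduced to “beri” via “-kan”. The dictionary check is what stops “dimakan” from being over-stemmed past “makan”.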

How to Cite
Putu Bagus Susastra Wiguna, & Bimo Sunarfri Hantono. (1). Peningkatan Algoritma Porter Stemmer Bahasa Indonesia berdasarkan Metode Morfologi dengan Mengaplikasikan 2 Tingkat Morfologi dan Aturan Kombinasi Awalan dan Akhiran. Jurnal Nasional Teknik Elektro Dan Teknologi Informasi, 2(2), 1-6. Retrieved from https://journal.ugm.ac.id/v3/JNTETI/article/view/3137