Selection of the Best K-Gram Value on Modified Rabin-Karp Algorithm

Wahyu Hidayat(1*), Ema Utami(2), Andi Sunyoto(3)

(1) Magister Teknik Informatika, Universitas Amikom Yogyakarta, Yogyakarta
(2) Magister Teknik Informatika, Universitas Amikom Yogyakarta, Yogyakarta
(3) Magister Teknik Informatika, Universitas Amikom Yogyakarta, Yogyakarta
(*) Corresponding Author


The Rabin-Karp algorithm detects text similarity using hashing. Related studies have modified its hashing process, but none has investigated which value of k performs best in the K-Gram step. This study tests several K-Gram values to determine the best one. In the stemming stage, the Nazief & Adriani algorithm is used to reduce words to their base forms. The analysis used the public Ukara Enhanced dataset obtained from Kaggle, which contains 12,215 records. The student essay answers numbered 258 in group A and 305 in group B, and each answer was compared with the answers of the other members of the same group. The results show that k = 3 performs best: it yields the highest number of results interpreted as 1-14% (little degree of similarity) and 15-50% (medium level of similarity), whereas for k = 5, 7, and 9 the most frequent interpretation is 0%-0.99% (documents are different). However, when the compared student essay answers are 100% similar (exactly the same), the value of k in the K-Gram step does not affect the result.
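To make the role of k concrete, the following is a minimal Python sketch of K-Gram hashing with a textbook Rabin-Karp rolling hash, scored with the Dice similarity coefficient. It is an illustration under stated assumptions, not the authors' modified algorithm: the function names, the base and modulus, the use of sets of unique hashes, and the two sample strings are all hypothetical, and Nazief & Adriani stemming is omitted.

```python
def rolling_hashes(text: str, k: int, base: int = 256,
                   mod: int = (1 << 61) - 1) -> set[int]:
    """Return the set of rolling-hash values of every k-gram in text.

    Assumption: text is lowercased and all whitespace is removed first.
    Stemming (Nazief & Adriani in the paper) is NOT performed here.
    """
    s = "".join(text.lower().split())
    if k <= 0 or len(s) < k:
        return set()
    high = pow(base, k - 1, mod)      # weight of the character leaving the window
    h = 0
    for ch in s[:k]:                  # hash of the first k-gram
        h = (h * base + ord(ch)) % mod
    hashes = {h}
    for i in range(k, len(s)):        # slide the window one character at a time
        h = ((h - ord(s[i - k]) * high) * base + ord(s[i])) % mod
        hashes.add(h)
    return hashes


def dice_similarity(a: str, b: str, k: int) -> float:
    """Dice coefficient, in percent, over the two sets of k-gram hashes."""
    ha, hb = rolling_hashes(a, k), rolling_hashes(b, k)
    if not ha and not hb:
        return 0.0
    return 100.0 * 2 * len(ha & hb) / (len(ha) + len(hb))


# Hypothetical sample answers, not taken from the Ukara Enhanced dataset.
answer_1 = "membaca buku"
answer_2 = "membaca koran"
for k in (3, 7):
    print(f"k={k}: {dice_similarity(answer_1, answer_2, k):.1f}%")
# prints "k=3: 52.6%" then "k=7: 18.2%"
print(dice_similarity(answer_1, answer_1, 3), dice_similarity(answer_1, answer_1, 7))
# prints "100.0 100.0"
```

In this toy case a smaller k leaves more k-grams shared between two partially overlapping answers, so the score is higher, while identical answers score 100% for any k. This is consistent with the pattern the abstract reports, though the sketch itself proves nothing about the dataset.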


Similarity; Nazief & Adriani Stemming Algorithm; Rabin-Karp Algorithm; Dice Similarity Coefficient




Copyright (c) 2022 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
ISSN 1978-1520 (print); ISSN 2460-7258 (online)
A scientific journal of computing and cybernetics systems, published by IndoCEISS.
Gedung S1 Ruang 416 FMIPA UGM, Sekip Utara, Yogyakarta 55281
Fax: +62274 555133
