The MapReduce Model on Cascading Platform for Frequent Itemset Mining

Nur Rokhman; Amelia Nursanti

doi:10.22146/ijccs.34102

Nur Rokhman (1*), Amelia Nursanti (2)

(1) Department of Electronics and Computer Science, FMIPA UGM, Yogyakarta
(2) Computer Science Study Program FMIPA UGM
(*) Corresponding Author

Abstract

The implementation of parallel algorithms has recently become an active research topic. Parallelism is well suited to large-scale data processing. MapReduce is one of the parallel and distributed programming models. Implementing parallel programs, however, is difficult. Cascading provides a simpler scheme on top of the Hadoop system, which implements the MapReduce model.

Frequent itemsets are the sets of objects that appear most often in a dataset. Frequent Itemset Mining (FIM) requires complex computation, and it becomes a complicated problem when applied to large-scale data.
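To make the definition concrete, the following is a minimal single-machine sketch, not the method used in the paper. It assumes that a dataset is a list of transactions and that an itemset counts as frequent when it occurs in at least `min_support` transactions. The function name `frequent_itemsets`, the `max_size` limit, and the sample data are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count itemsets of up to max_size items and keep the frequent ones.

    Hypothetical helper for illustration: brute-force enumeration,
    which is only practical for small inputs.
    """
    counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each itemset has one canonical form.
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    # Keep only itemsets that reach the support threshold.
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Made-up sample transactions (not taken from the Amazon dataset).
transactions = [
    ["book", "dvd"],
    ["book", "dvd", "cd"],
    ["book", "cd"],
    ["dvd"],
]
result = frequent_itemsets(transactions, min_support=2)
```

Here the pair `("book", "dvd")` appears in two transactions and is kept, while `("cd", "dvd")` appears in only one and is dropped. The number of candidate itemsets grows quickly with transaction size, which is one reason the problem becomes hard at scale.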

This paper discusses the implementation of the MapReduce model on Cascading for FIM. The experiment uses the Amazon product co-purchasing network metadata dataset. The experiment shows that the simple mechanism of Cascading can be used to solve the FIM problem. It achieves a time complexity of O(n), which is more efficient than the nonparallel approach, whose complexity is O(n²/m).
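The MapReduce formulation of this kind of counting can be sketched in plain Python. This is a sequential simulation under stated assumptions, not the authors' Cascading pipeline and not the Hadoop API: the names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are hypothetical, and the sketch only counts item pairs.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction):
    """Mapper: emit (pair, 1) for every item pair in one transaction."""
    items = sorted(set(transaction))
    for pair in combinations(items, 2):
        yield pair, 1


def shuffle(mapped):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_phase(key, values, min_support):
    """Reducer: sum the counts for one pair and keep it if frequent."""
    total = sum(values)
    if total >= min_support:
        yield key, total


def run_job(transactions, min_support):
    """Run map, shuffle, and reduce sequentially in one process."""
    mapped = (kv for t in transactions for kv in map_phase(t))
    grouped = shuffle(mapped)
    return dict(
        kv
        for key, values in grouped.items()
        for kv in reduce_phase(key, values, min_support)
    )


# Made-up baskets for illustration only.
baskets = [["A", "B"], ["A", "B", "C"], ["B", "C"], ["A", "C", "B"]]
frequent_pairs = run_job(baskets, min_support=3)
```

Because each call to `map_phase` depends on a single transaction, a real framework could run the mappers on separate input splits in parallel; the simulation above keeps everything in one process for readability.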

Keywords

Frequent Itemset Mining; MapReduce; Cascading

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Copyright of: IJCCS (Indonesian Journal of Computing and Cybernetics Systems), ISSN 1978-1520 (print); ISSN 2460-7258 (online), is a scientific journal of the results of Computing and Cybernetics Systems.
A publication of IndoCEISS. Gedung S1 Ruang 416 FMIPA UGM, Sekip Utara, Yogyakarta 55281. Fax: +62274 555133; email: ijccs.mipa@ugm.ac.id | http://jurnal.ugm.ac.id/ijccs
