
Top-Down and Bottom-Up Approach for Mining Multilevel Association Rules From Concept Hierarchical Data in Distributed Environment


Affiliations
1 Department of Information Technology, A.D. Patel Institute of Technology, India
     



Hierarchical data mining in a distributed environment is imperative for big data analysis. Multilevel association rules can provide more substantial information than single-level rules, and they also reveal hierarchical knowledge in the dataset. Nowadays, numerous e-commerce and social networking sites generate vast amounts of structured and semi-structured data in the form of sales data, tweets, text mails, web usage data and so on. The data generated from such sources is so large that it becomes very difficult to process and analyze using conventional approaches. This paper overcomes the computing limitation of a single node by distributing the task across a multi-node cluster. The performance of the system is compared based on the minimum support threshold at different levels of the concept hierarchy and by varying the dataset size. In this paper, the transactional dataset is created from a huge sales dataset using the Hadoop MapReduce framework. Then, two distributed multilevel frequent pattern mining algorithms, MR-MLAB (MapReduce based Multilevel Apriori using Bottom-up approach) and MR-MLAT (MapReduce based Multilevel Apriori using Top-down approach), are implemented to find interesting level-crossing frequent itemsets for each level of the concept hierarchy. The hierarchical redundancy in multilevel association rules affects the quality of the market basket analysis. Hence, to improve the performance of the system, this hierarchical redundancy has to be removed. Finally, the time efficiency of the proposed algorithms is compared with the existing Traditional Multilevel Apriori (TMLA) algorithm. The proposed algorithms with the MapReduce framework are found to be more efficient than the traditional algorithms.
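
As a rough illustration of support counting at different levels of a concept hierarchy, the sketch below generalizes each transaction to a chosen level and counts itemsets in a map step followed by a reduce step. It is a minimal single-process sketch, not the MR-MLAB or MR-MLAT algorithm from the paper: the hierarchy, the transactions, and the names `HIERARCHY`, `generalize`, `map_phase`, `reduce_phase`, and `frequent_itemsets` are all hypothetical, and it enumerates itemsets by brute force instead of using Apriori candidate pruning.

```python
from collections import Counter
from itertools import combinations

# Hypothetical concept hierarchy: each leaf item lists its ancestors from
# level 1 (most general) down to level 3 (the item itself).
HIERARCHY = {
    "skim_milk": ["food", "milk", "skim_milk"],
    "whole_milk": ["food", "milk", "whole_milk"],
    "white_bread": ["food", "bread", "white_bread"],
    "wheat_bread": ["food", "bread", "wheat_bread"],
}

# Hypothetical transactional dataset of leaf-level items.
TRANSACTIONS = [
    {"skim_milk", "white_bread"},
    {"whole_milk", "wheat_bread"},
    {"skim_milk", "wheat_bread"},
    {"whole_milk"},
]


def generalize(transaction, level):
    """Replace each leaf item by its ancestor at the given 1-based level."""
    return frozenset(HIERARCHY[item][level - 1] for item in transaction)


def map_phase(transactions, level, k):
    """Emit an (itemset, 1) pair for every k-itemset of each generalized transaction."""
    for transaction in transactions:
        items = sorted(generalize(transaction, level))
        for itemset in combinations(items, k):
            yield itemset, 1


def reduce_phase(pairs, min_count):
    """Sum the counts per itemset and keep those meeting the minimum support count."""
    counts = Counter()
    for itemset, one in pairs:
        counts[itemset] += one
    return {itemset: c for itemset, c in counts.items() if c >= min_count}


def frequent_itemsets(transactions, level, k, min_count):
    """Frequent k-itemsets at one hierarchy level, given that level's own threshold."""
    return reduce_phase(map_phase(transactions, level, k), min_count)


if __name__ == "__main__":
    # A pattern can be frequent at a general level and infrequent at a deeper one.
    print(frequent_itemsets(TRANSACTIONS, level=2, k=2, min_count=2))
    print(frequent_itemsets(TRANSACTIONS, level=3, k=2, min_count=2))
```

With this toy data, the pair of categories at level 2 meets a minimum count of 2 while no pair of leaf items at level 3 does, which is one reason a multilevel setup may use a different minimum support threshold at each level.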

Keywords

Distributed Frequent Pattern Mining, Multi-Level Association Rule, MapReduce, Level Crossing Rules



Authors

Dinesh J. Prajapati
Department of Information Technology, A.D. Patel Institute of Technology, India

