
CIP- Efficient Method for Mining Frequent Itemsets From Data Streams Using Landmark Window Model


Affiliations
1 Department of Computer Applications, St. Xavier’s Catholic College of Engineering, Chunkankadai - 03., India
 

Continuous stream applications such as network monitoring, retail market data analysis, and stock market prediction require “frequent patterns” to be detected recurrently. The literature shows that several pattern mining solutions have been developed over the years. Many challenges remain, however, because continuous, unbounded, and ordered data are generated rapidly in real time. Hence, extracting frequent patterns from recent data improves the analysis of stream data. In this article, a new landmark window model, CIP (candidate indexing and pruning), is proposed for mining such datasets. CIP mines over the entire history of a data stream, which improves accuracy. This article also proposes the candidate indexed sub (CIS)-tree scheme to extract the essential information from each incoming transaction of a data stream. The proposal is compared with the existing “improved data stream mining” (ISDM) algorithm for maximal frequent itemsets. Extensive experimental analyses show that the proposed CIP outperforms the popular ISDM in terms of accuracy and time complexity on high-speed data streams. This article also presents a case study in which the proposed approach is applied to an application called “web prefetching”.

Keywords

Data Streams, Frequent Itemsets, Pruning, Frequent Patterns, Web Prefetching.
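
To make the landmark window idea concrete, the following is a minimal Python sketch of frequent itemset counting in which every transaction since the landmark (here, the start of the stream) contributes to the counts. It is not the CIP algorithm or the CIS-tree, whose details the abstract does not give; the class name `LandmarkItemsetCounter`, the `max_size` cap, and the sample stream are hypothetical choices made only for illustration.

```python
from collections import Counter
from itertools import combinations


class LandmarkItemsetCounter:
    """Naive landmark-window counter (illustrative only, not CIP).

    All transactions seen since the landmark are counted; nothing expires.
    """

    def __init__(self, min_support, max_size=2):
        self.min_support = min_support  # relative support threshold in (0, 1]
        self.max_size = max_size        # largest itemset size enumerated (assumption)
        self.counts = Counter()         # itemset (sorted tuple) -> occurrence count
        self.n_transactions = 0         # transactions seen since the landmark

    def add(self, transaction):
        """Update counts with one incoming transaction."""
        items = sorted(set(transaction))
        self.n_transactions += 1
        for k in range(1, self.max_size + 1):
            for itemset in combinations(items, k):
                self.counts[itemset] += 1

    def frequent(self):
        """Return itemsets whose support over the whole history meets the threshold."""
        if self.n_transactions == 0:
            return {}
        threshold = self.min_support * self.n_transactions
        return {s: c for s, c in self.counts.items() if c >= threshold}


# Hypothetical four-transaction stream.
stream = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "d"],
]
counter = LandmarkItemsetCounter(min_support=0.5)
for t in stream:
    counter.add(t)
result = counter.frequent()
```

Enumerating every subset of each transaction grows exponentially with `max_size`, which is the kind of cost that candidate indexing and pruning schemes are meant to reduce; the sketch makes no attempt to do so.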



Authors

F. Ramesh Dhanaseelan
Department of Computer Applications, St. Xavier’s Catholic College of Engineering, Chunkankadai - 03., India
M. JeyaSutha
Department of Computer Applications, St. Xavier’s Catholic College of Engineering, Chunkankadai - 03., India

