A Weight-Order-Based Lattice Algorithm for Mining Maximal Weighted Frequent Patterns over a Data Stream Sliding Window


Bibliographic Details
Main Authors: Tzung-je Chiou, 邱宗哲
Other Authors: Ye-In Chang
Format: Others
Language: en_US
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/t2y2cb
Description
Summary: Master's === National Sun Yat-sen University (國立中山大學) === Department of Computer Science and Engineering (資訊工程學系研究所) === 103 === Weighted frequent pattern mining in data streams is important in real-world settings such as supermarkets. Mining weighted maximal frequent patterns is an equally important problem. A weighted maximal frequent pattern is a pattern whose weighted support is larger than the threshold and which is not a subset of any other such pattern. However, many previous Apriori-like algorithms cannot be used for weighted frequent pattern mining, because even though a subset X of a pattern Y is not a weighted frequent pattern, Y itself may be one. In addition, because data streams are continuous, high-speed, unbounded, and real-time, the data can be scanned only once, so earlier algorithms designed for traditional databases are not suitable for data streams. Furthermore, many applications are interested only in recent data, and the sliding window model is the one that deals with the most recent part of the stream. To mine weighted maximal frequent patterns under the sliding window model, Ryu et al. proposed the WMFP-SW algorithm, which uses an FP-tree to mine the weighted maximal frequent patterns and uses the maximal weight to prune patterns. However, it takes a long time to mine the patterns, because the WMFP-SW algorithm always has to reconstruct the FP-tree whenever a new transaction arrives. Moreover, the WMFP-SW algorithm may miss a case. To solve these problems, this thesis proposes the Weighted-Order-Based Lattice algorithm, based on the sliding window model. We use a lattice structure to store the information of the transactions. The lattice stores the relationship between each child node and its parent node, and each node records an itemset and its count.
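The remark above, that a pattern Y can be weighted frequent even when its subset X is not, can be illustrated with a small sketch. The summary does not define weighted support, so the code assumes a common definition from the weighted-pattern literature: the average weight of the pattern's items multiplied by the number of window transactions containing the pattern. The function name `weighted_support`, the item weights, the window contents, and the threshold are all hypothetical.

```python
from typing import FrozenSet, Mapping, Sequence


def weighted_support(
    pattern: FrozenSet[str],
    window: Sequence[FrozenSet[str]],
    weights: Mapping[str, float],
) -> float:
    """Assumed definition: average item weight times support count."""
    # Count the transactions in the window that contain the whole pattern.
    count = sum(1 for transaction in window if pattern <= transaction)
    # Average the weights of the items that make up the pattern.
    avg_weight = sum(weights[item] for item in pattern) / len(pattern)
    return avg_weight * count


# Hypothetical data: a low-weight item "a" and a high-weight item "b".
weights = {"a": 0.25, "b": 0.75}
window = [frozenset("ab"), frozenset("ab"), frozenset("ab")]
min_wsup = 1.0  # hypothetical weighted-support threshold

x = frozenset("a")   # subset X
y = frozenset("ab")  # superset Y

x_is_frequent = weighted_support(x, window, weights) > min_wsup
y_is_frequent = weighted_support(y, window, weights) > min_wsup
```

Under these assumptions X = {a} falls below the threshold while Y = {a, b} exceeds it, because adding the heavier item raises the average weight without lowering the count. This is why Apriori-style pruning, which discards every superset of an infrequent itemset, cannot be applied directly.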
When a new transaction arrives, we consider five relations: (1) equivalent, (2) subset, (3) intersection, (4) empty set, and (5) superset. With these five relations, we can add new transactions and update the support efficiently. Moreover, we use a global maximal weight pruning strategy and a local maximal weight pruning strategy to avoid generating invalid candidate patterns. Our performance study, on both real and synthetic data, shows that the Weighted-Order-Based Lattice algorithm provides better performance than the WMFP-SW algorithm in both cases.
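The five relations listed above can be sketched as a classification of a new transaction against the itemset stored in an existing lattice node. This is only an illustration, not the thesis's procedure: the summary does not say from which side "subset" and "superset" are read, so the code assumes they describe the new transaction relative to the node's itemset. The function name `classify_relation` is hypothetical.

```python
from typing import FrozenSet


def classify_relation(new_items: FrozenSet[str], node_items: FrozenSet[str]) -> str:
    """Name the relation of a new transaction to a node's itemset.

    Assumption: "subset" and "superset" are read from the point of
    view of the new transaction.
    """
    if new_items == node_items:
        return "equivalent"
    if new_items < node_items:   # proper subset of the node's itemset
        return "subset"
    if new_items > node_items:   # proper superset of the node's itemset
        return "superset"
    if new_items & node_items:   # some shared items, neither contains the other
        return "intersection"
    return "empty set"           # no items in common


node = frozenset("ab")
relations = [
    classify_relation(frozenset(items), node)
    for items in ("ab", "a", "abc", "ac", "cd")
]
```

The order of the checks matters: the containment tests come before the overlap test, since any subset or superset with shared items would otherwise be reported as an intersection.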