Summary: | Master's === National Sun Yat-sen University === Department of Computer Science and Engineering (Graduate Program) === 104 === Mining high utility patterns has recently found a wide range of applications, such as retail market analysis and stock market prediction. High utility pattern mining is an important research area in data mining because it can take into account the different utility value and the non-binary quantity of each item. Utility mining was defined to address the limitations of traditional association rule mining. The frequent patterns found by traditional association rule mining reveal nothing about an item's impact beyond its presence or absence. The problem is that frequent patterns may contribute only a small amount of profit, whereas infrequent patterns may contribute a great deal of profit. Therefore, traditional association rule mining needs to be extended to handle this kind of task. By considering that each item has its own price or value, utility pattern mining collects and analyzes data to identify the items that contribute the most. To perform high utility pattern mining, Yun et al. proposed the Maximum Utility Growth algorithm (MU-Growth) and the Maximum Item Quantity tree (MIQ-Tree). The MU-Growth algorithm uses an overestimated-utility method and constructs the MIQ-Tree data structure. However, the MIQ-Tree has to be reconstructed by extracting every transaction path in the tree, so mining the whole tree takes a long time when it contains many transactions. Moreover, the MU-Growth algorithm requires two database scans and often generates many candidates. To solve these problems, in this thesis we propose the Inverted-File-High-Utility-Pattern-Tree data structure (IFHUP-Tree) and the Inverted-File Pattern Growth algorithm, which identify high utility patterns from the IFHUP-Tree without generating candidates. 
Our algorithm considers two types of relations: (1) Transaction-Intersection and (2) Non-Intersection. We first handle the Transaction-Intersection case and then the Non-Intersection case. The IFHUP-Tree algorithm avoids the reconstruction procedure and generates patterns with a non-overestimated method, yielding the actual combinations of high utility patterns. It requires only one database scan and one data structure to identify the actual high utility patterns without generating any candidates. Our performance study shows that the IFHUP-Tree algorithm is more efficient than the MU-Growth algorithm on both real and synthetic datasets.
|
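The notion of utility used in the summary above can be illustrated with a small sketch. It is not the IFHUP-Tree or the Inverted-File Pattern Growth algorithm, whose details the summary does not give. It only assumes that an itemset's utility is the sum of quantity times unit profit over the transactions that contain every item of the itemset, and it finds those transactions by intersecting per-item transaction lists in a simple inverted file. The data, the names `unit_profit`, `transactions`, `build_inverted_file`, `itemset_utility` and `high_utility_itemsets`, and the brute-force enumeration are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: unit profit per item, and item quantities per transaction.
unit_profit = {"A": 1, "B": 10, "C": 3}
transactions = {
    1: {"A": 2, "C": 1},
    2: {"A": 1, "B": 1, "C": 2},
    3: {"A": 3, "C": 1},
    4: {"B": 2},
}


def build_inverted_file(transactions):
    """Map each item to {transaction id: quantity} in one pass over the data."""
    inverted = {}
    for tid, items in transactions.items():
        for item, qty in items.items():
            inverted.setdefault(item, {})[tid] = qty
    return inverted


def itemset_utility(itemset, inverted, unit_profit):
    """Return (exact utility, support) of an itemset.

    The transactions containing every item are found by intersecting
    the per-item transaction id sets; no overestimate is involved.
    """
    tid_sets = [set(inverted.get(item, {})) for item in itemset]
    shared = set.intersection(*tid_sets) if tid_sets else set()
    utility = sum(
        inverted[item][tid] * unit_profit[item]
        for tid in shared
        for item in itemset
    )
    return utility, len(shared)


def high_utility_itemsets(transactions, unit_profit, min_utility):
    """Brute-force enumeration of all itemsets; suitable for toy inputs only."""
    inverted = build_inverted_file(transactions)
    items = sorted(inverted)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility, support = itemset_utility(itemset, inverted, unit_profit)
            if utility >= min_utility:
                result[itemset] = (utility, support)
    return result
```

On this toy data, item A appears in three transactions but has a utility of only 6, whereas item B appears in two transactions and has a utility of 30. This is the gap between frequency and profit that motivates utility mining.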