Updating Frequent Itemsets Without Rescanning the Original Database in the Incremental Environment


Bibliographic Details
Main Authors: Pai-Yu Lin, 林柏佑
Other Authors: Bi-Ru Dai
Format: Others
Language: en_US
Published: 2009
Online Access: http://ndltd.ncl.edu.tw/handle/25629406614318367928
Description
Summary: Master's thesis === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 97 === Information is important and valuable today, and it helps people make decisions, such as those about market strategy. People depend on useful information more and more. For this reason, many methods have been invented to extract useful information from huge volumes of data, and data mining technology has grown rapidly in recent years. Many helpful algorithms and applications have been proposed, and researchers are still working to develop more efficient algorithms. Frequent pattern mining plays an important role in the data mining community because it is usually a fundamental step in various mining tasks. However, maintaining frequent patterns in an incremental database is very expensive. In addition, the status of a pattern changes over time: a frequent pattern may become infrequent, and vice versa. To find all frequent patterns exactly, most algorithms have to scan the original database completely whenever an update occurs. In this work, we propose two new algorithms, iTM and ECEM, which mine frequent itemsets in the incremental environment without rescanning the whole database. These algorithms use a compressed structure and quickly project the transaction dataset into it. Because the structure has a good compression ratio, we are able to preserve the frequencies of all items. Furthermore, these algorithms do not need to rescan the database when the user-defined threshold changes. We also design several experiments on various transaction databases to verify the performance of our algorithms. The results demonstrate that our algorithms extract the exact frequent itemsets from the transaction database at low cost, and we obtain similar results on huge databases as well.
In this study, our algorithms reduce the cost of the scanning step and guarantee that the response time is acceptable.
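
The general idea of handling both database updates and threshold changes without rescanning can be illustrated with a minimal Python sketch. It is not iTM or ECEM, whose internals this summary does not describe, and it makes no attempt at compression. It assumes a plain vertical layout (each item mapped to the ids of the transactions that contain it) and an absolute count threshold; the class name `IncrementalItemsetStore` and its methods are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


class IncrementalItemsetStore:
    """Illustrative vertical store (item -> transaction ids); not iTM or ECEM."""

    def __init__(self):
        self.tidsets = defaultdict(set)  # item -> ids of transactions containing it
        self.n_transactions = 0

    def add_transactions(self, transactions):
        """Fold new transactions into the store; earlier ones are never re-read."""
        for items in transactions:
            tid = self.n_transactions
            for item in set(items):
                self.tidsets[item].add(tid)
            self.n_transactions += 1

    def frequent_itemsets(self, min_count):
        """Return {itemset: count} for every itemset with count >= min_count.

        Assumes min_count >= 1. Counts of all items are kept, so any
        threshold can be answered from the stored id sets alone.
        """
        result = {}
        # Level 1: frequent single items.
        level = {
            frozenset([item]): tids
            for item, tids in self.tidsets.items()
            if len(tids) >= min_count
        }
        while level:
            for itemset, tids in level.items():
                result[itemset] = len(tids)
            next_level = {}
            # Join pairs of frequent k-itemsets that differ in exactly one item.
            for (a, ta), (b, tb) in combinations(level.items(), 2):
                candidate = a | b
                if len(candidate) != len(a) + 1 or candidate in next_level:
                    continue
                tids = ta & tb  # exactly the transactions containing the candidate
                if len(tids) >= min_count:
                    next_level[candidate] = tids
            level = next_level
        return result


store = IncrementalItemsetStore()
store.add_transactions([["a", "b", "c"], ["a", "b"], ["a", "c"]])
before_update = store.frequent_itemsets(min_count=2)

# An update arrives: only the new transactions are processed.
store.add_transactions([["b", "c"], ["a", "b", "c"]])
after_update = store.frequent_itemsets(min_count=3)

# The threshold changes: answered from the stored id sets, with no rescan.
lower_threshold = store.frequent_itemsets(min_count=2)
```

In this sketch the itemset {b, c} is infrequent before the update and frequent after it, which mirrors the status change the summary mentions. Storing full id sets costs memory in proportion to the database size, which is the kind of cost a compressed structure is meant to reduce.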