Updating Frequent Itemsets Without Rescanning the Original Database in the Incremental Environment

Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 97 === Information is important and valuable today, and it helps people make decisions, for example about market strategy. People depend on useful information more and more, and many methods have therefore been invented to extract useful information from huge da...


Bibliographic Details
Main Authors: Pai-Yu Lin, 林柏佑
Other Authors: Bi-Ru Dai
Format: Others
Language: en_US
Published: 2009
Online Access: http://ndltd.ncl.edu.tw/handle/25629406614318367928
id ndltd-TW-097NTUS5392054
record_format oai_dc
spelling ndltd-TW-097NTUS5392054 2016-05-02T04:11:39Z http://ndltd.ncl.edu.tw/handle/25629406614318367928 Updating Frequent Itemsets Without Rescanning the Original Database in the Incremental Environment 以不需重新存取資料庫的方式來有效探勘動態資料庫中的頻繁項目集 Pai-Yu Lin 林柏佑 Master's National Taiwan University of Science and Technology Department of Computer Science and Information Engineering 97 Bi-Ru Dai 戴碧如 2009 thesis 61 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 97 === Information is important and valuable today, and it helps people make decisions, for example about market strategy. People depend on useful information more and more, and many methods have therefore been invented to extract useful information from huge volumes of data. As a result, data mining technology has grown rapidly in recent years: many helpful algorithms and applications have been proposed, and researchers are still working to develop more efficient ones. Frequent pattern mining plays an important role in the data mining community because it is usually a fundamental step in various mining tasks. However, maintaining frequent patterns in an incremental database is very expensive. In addition, the status of a pattern changes over time: a frequent pattern may become infrequent, and vice versa. To find all frequent patterns exactly, most algorithms have to scan the original database completely whenever an update occurs. In this work, we propose two new algorithms, iTM and ECEM, which mine frequent itemsets in the incremental environment without rescanning the whole database. Both algorithms use a compressed structure and quickly project the transaction dataset into it. Because the structure has a good compression ratio, we are able to preserve the frequencies of all items. Furthermore, the algorithms do not need to rescan the database when the user-defined threshold changes. We design several experiments on various transaction databases to verify the performance of our algorithms. The results demonstrate that our algorithms extract the exact frequent itemsets from the transaction database at low cost, and similar results are obtained on huge databases. In this study, our algorithms reduce the cost of the scanning step and guarantee that the response time is acceptable.
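The abstract does not describe the internals of iTM or ECEM, so the following is not either algorithm. It is only a minimal Python sketch of the general idea the abstract states: keep a summary that covers every item, not just the currently frequent ones, so that new transactions can be folded in and the threshold can be changed without rereading the original data. The class name `IncrementalItemsetIndex`, its methods, and the choice of an item-to-transaction-ID index with Eclat-style set intersection are all assumptions made for illustration; the index stands in for the thesis's compressed structure, and the sketch assumes items are mutually comparable (for example, all strings) and that the support threshold is an absolute count.

```python
from collections import defaultdict


class IncrementalItemsetIndex:
    """Hypothetical summary structure: maps each item to the set of
    transaction IDs that contain it (a 'vertical' layout)."""

    def __init__(self):
        self.tidsets = defaultdict(set)
        self.n_transactions = 0

    def add_transactions(self, transactions):
        """Fold in new transactions only; earlier ones are never reread."""
        for items in transactions:
            tid = self.n_transactions
            for item in set(items):
                self.tidsets[item].add(tid)
            self.n_transactions += 1

    def frequent_itemsets(self, min_support):
        """Return {itemset tuple: support count} for every itemset whose
        absolute support is at least min_support, using only the index."""
        result = {}
        # Every item is kept in the index, so any threshold can be applied.
        frequent = sorted(
            ((item, tids) for item, tids in self.tidsets.items()
             if len(tids) >= min_support),
            key=lambda pair: pair[0],
        )

        def extend(prefix, prefix_tids, candidates):
            # Depth-first extension: intersect tid-sets to count support.
            for i, (item, tids) in enumerate(candidates):
                new_tids = tids if prefix_tids is None else prefix_tids & tids
                if len(new_tids) >= min_support:
                    itemset = prefix + (item,)
                    result[itemset] = len(new_tids)
                    extend(itemset, new_tids, candidates[i + 1:])

        extend((), None, frequent)
        return result


index = IncrementalItemsetIndex()
index.add_transactions([["a", "b", "c"], ["a", "b"], ["a", "c"]])
before = index.frequent_itemsets(2)   # ("b", "c") is not frequent yet

index.add_transactions([["b", "c"], ["b", "c"]])  # incremental update
after = index.frequent_itemsets(2)    # ("b", "c") has become frequent
stricter = index.frequent_itemsets(3)  # new threshold, no rescan
```

In this toy run the itemset `("b", "c")` changes status after the update, and raising the threshold to 3 is answered from the index alone. The trade-off of such a design is memory: the index grows with the data, which is presumably why the thesis emphasizes the compression ratio of its structure.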
author2 Bi-Ru Dai
author_facet Bi-Ru Dai
Pai-Yu Lin
林柏佑
author Pai-Yu Lin
林柏佑
spellingShingle Pai-Yu Lin
林柏佑
Updating Frequent Itemsets Without Rescanning the Original Database in the Incremental Environment
author_sort Pai-Yu Lin
title Updating Frequent Itemsets Without Rescanning the Original Database in the Incremental Environment
publishDate 2009
url http://ndltd.ncl.edu.tw/handle/25629406614318367928