Mining Fault-Tolerant Frequent Patterns in Large Databases

碩士 === 國立交通大學 === 資訊工程系 === 90 === In view of real world data may be interfered with noise which leads data to contain faults. The data mining methods proposed previously may not be applicable. Besides, we may hope that the knowledge discovered is more general and can be applied to find m...

Full description

Bibliographic Details
Main Authors: Sheng-Shun Wang, 王聖舜
Other Authors: Suh-Yin Lee
Format: Others
Language:en_US
Published: 2002
Online Access:http://ndltd.ncl.edu.tw/handle/84272332857502794908
Description
Summary:碩士 === 國立交通大學 === 資訊工程系 === 90 === In view of real world data may be interfered with noise which leads data to contain faults. The data mining methods proposed previously may not be applicable. Besides, we may hope that the knowledge discovered is more general and can be applied to find more interesting information. Hence, FT-Aprori was proposed for fault-tolerant data mining to discover information over large real-world data. However, FT-Apriori which generates and tests candidates based on Apriori property is not so efficient. In this paper, we develop memory-based algorithm FTP-mine which is based on the concept of pattern growth to mine fault-tolerant frequent patterns efficiently. In FTP-mine, the table, STable, is designed to count the item support and FT-support of the k-length patterns which have the same prefix of length k-1 by comparing transaction once. As to mining in a large database which is too large to fit in memory, FTP-mine also can be adopted by means of database partition. In addition, since there might exist a large number of fault tolerant frequent patterns and some may be contained in others, we also focus on the finding of maximal FT-frequent patterns by extending the FTP-mine algorithm. Our study shows that FTP-mine has higher performance than FT-Apriori in all kinds of parameter settings, such as various supports, tolerance, and scalability. The empirical evaluations show that the proposed method has good linear scalability and outperforms FT-Apriori in the discovery of FT-frequent pattern.